Universität Stuttgart

Fakultät Informatik

Fakultät Informatik Institut für Parallele und Verteilte 
Höchstleistungsrechner Universität Stuttgart Breitwiesenstraße 20 - 22 
D-70565 Stuttgart

Clock Hierarchies: An Abstraction for Grouping and Controlling Media 
Streams

Kurt Rothermel, Tobias Helbig

Clock Hierarchies:

An Abstraction for Grouping and

Controlling Media Streams

Kurt Rothermel, Tobias Helbig

CR-Klassifikation: C.2.4, D.2.2, H.5.1

Fakultätsbericht 2/1994 Technical Report April 1994

Abstract

Synchronization plays an important role in multimedia systems at various 
levels of abstraction. In this paper, we propose a set of powerful 
abstractions for controlling and synchronizing continuous media streams 
in distributed environments. The proposed abstractions are based on a 
very general computation model, which allows media streams to be 
processed (i.e. produced, consumed or transformed) by arbitrarily 
structured networks of linked components. Further, compound components 
can be composed from existing ones to provide higher levels of 
abstractions.

The clock abstraction is provided to control individual media streams, 
i.e. streams can be started, paused or scaled by issuing the appropriate 
clock operations. Clock hierarchies are used to hierarchically group 
related streams, where each clock in the hierarchy identifies and 
controls a certain (sub)group of streams. Control and synchronization 
requirements can be expressed in a uniform manner by associating group 
members with control or sync attributes. An important property of the 
concept of clock hierarchies is that it can be combined in a natural way 
with component nesting.

1 INTRODUCTION 2

1 INTRODUCTION

Powerful programming abstractions are a prerequisite for an effective 
and efficient application development. Application-specific abstractions 
are typically provided by development platforms, often referred to as 
middle ware. In the context of multimedia, those platforms close the gap 
between the operating system and the communication system on the one 
hand and the specific needs of distributed multimedia applications on 
the other hand. The CINEMA (Configurable INtEgrated Multimedia 
Architecture) system [RBH94], which is under development at the 
University of Stuttgart, is a platform providing system services for the 
configuration of distributed multimedia applications and the 
communication and synchronization of multimedia information in 
distributed environments.

Multimedia synchronization can be considered at different levels of 
abstraction [MeES93]. In this paper, we will focus on the control and 
synchronization of  groups of continuous media streams, such as digital 
video and audio streams. Media streams themselves may be regarded at 
different abstraction levels. At the transport level, a stream usually 
originates at a single source and ends at one or more sinks. Further, 
sinks and sources are adjacent in the sense that each sink of the stream 
consumes the data produced by the stream's source. Therefore, at the 
transport level end-to-end relationships are defined between adjacent 
entities connected by a transport connection. If streams are considered 
at the application level instead, sources and sinks need not be adjacent 
at all. In general, a stream may be processed by a network of linked 
components, it may have multiple sources as well as multiple sinks, and 
each path leading from a source to a sink may involve several 
intermediate components. Consequently, at the application level an 
end-to-end relationship may cover any number of intermediate components 
as well as several transport connections at a lower level of 
abstraction.

This paper proposes programming abstractions for grouping, controlling 
and synchronizing application-level streams in distributed environments. 
Media clocks provide the basic abstraction for controlling the flow of 
media streams, i.e. by issuing clock operations the controlled streams 
can be started, paused or scaled as required. Related streams can be 
hierarchically grouped by building up so-called clock hierarchies, where 
each clock controls either an individual stream or a group of streams. 
Within clock hierarchies, two types of relationships can be defined for 
the members of a stream group, a control or sync relationship. If the 
control relationship is specified, the members of the groups are 
controlled collectively without synchronization of the streams. If the 
sync relationship is defined instead, the members of this group are 
processed (e.g. played out) synchronously.

2 RELATED WORK 3

A great advantage of the concept of clock hierarchies is that it can be 
combined with component nesting in a natural way. In order to provide 
higher levels of abstractions, more complex components, so-called 
compound components, can be composed from existing ones. The internal 
processing of a compound component is controlled and synchronized by 
means of included clock hierarchies, which are an integral part of the 
compound components. It is important to stress that this paper describes 
programming abstraction rather than the protocols and mechanisms 
implementing them. For a description of the underlying protocols we 
refer to a companion paper [RoHe94].

The remainder of the paper is structured as follows. The next section 
gives a brief overview of related work, and then the computation model 
the proposed abstractions are based upon is described in Sec. 3. We 
introduce the concept of a media clock in Sec. 4 and show how it can be 
used to control individual streams. The concept of a clock hierarchy, 
which provides the means for controlling and synchronizing groups of 
media streams, is presented in Sec. 5. While this section mainly 
considers clocks attached to sink components, Sec. 6 motivates and 
treats clocks attached to sources. In Sec. 7, we discuss how the 
proposed abstractions can be applied in the context of component 
nesting. Finally, we conclude with a brief summary.

2 RELATED WORK

As stated above, streams can be considered at different levels of 
abstraction. Abstractions for grouping and controlling transport-level 
streams are provided by the orchestration service [CCGH92]. This service 
allows for grouping streams and coordinating the flow of (flat) groups 
of streams. In particular, the streams of a group can be started and 
stopped collectively, while the flow rate of the streams is regulated 
individually. The orchestration service itself does not guarantee stream 
synchronization but offers a general regulation mechanism that can be 
used at higher layers to implement different synchronization policies.

Various abstractions for controlling groups of application-level streams 
have been proposed in the literature. Some of these proposals apply to 
non-distributed environments only (e.g. Quick- Time [Appl91] or IBM's 
Multimedia presentation manager [IBM92]), and others are tailored to 
specific configurations (e.g. ACME [AnHo91] and Tactus [DNNR92]). ACME, 
for example, is an extension of a network window system supporting 
streams of digital audio and video data. The clients of the ACME server 
use the abstraction of a logical time system to control and synchronize 
the output of a (flat) group of ropes.

3 COMPUTATION MODEL 4

The Multimedia System Services proposed by the Interactive Multimedia 
Association (IMA) [IMA93] are based on a very general computation model 
and provide a rich set of abstractions for grouping and controlling 
media streams. The purpose of these services is to provide an 
environment in which a heterogeneous set of multimedia computing 
platforms cooperates to support distributed, interactive multimedia 
applications dealing with synchronized, time-based media. In this 
environment, the abstraction of a group is used to group related media 
streams. Group objects, which may include other group objects, provide 
an interface for controlling the streams belonging to this group. This 
means that an entire group of streams can be started, paused or scaled 
by issuing single operation at the group interface. However, the streams 
of a group are not synchronized. In the current proposal, stream 
synchronization is not yet integrated in the group mechanism. Moreover, 
component nesting is not supported.

3 COMPUTATION MODEL

In this section, we briefly sketch the computation model of CINEMA (for 
more details see [RBH94]). The major concepts of this model are media 
streams, components, ports, links, sessions and clocks.

A continuous media stream is defined to be a sequence of data units, 
each of which is associated with a media time stamp (e.g. see [Herr91]). 
Components are active entities that process continuous media streams in 
various ways. We distinguish between source components, which produce 
media streams, sink components, which consume media streams, and 
intermediate components, which act as both producers and consumers. 
Components are associated with typed ports. While a producer writes 
stream data to its output ports, a consumer reads stream items from its 
input ports.

Applications are configured by defining links between input and output 
ports of components. An example configuration consisting of one 
intermediate component and three source and sink components is shown in 
Fig. 1. This configuration mechanism has proven to be powerful and hence 
can be found in various other architectures as well (e.g. Conic project 
[MKS89], IMA [IMA93], Quicktime [Appl91], SUMO project [CBRS93]).

While link objects are applied to define the topology of applications, 
sessions are the abstraction for resource allocation. Media streams can 
be processed and communicated only after the corresponding sessions have 
been established. A session may comprise multiple sink and source 
components and any number of intermediate components. QoS of a session 
is specified

4 MEDIA CLOCKS 5

at the session end points, i.e. at the sink components. For controlling 
the flow of media streams an extra abstraction, so-called media clocks, 
is provided. Media clocks are used to start, pause, or scale media 
streams.

CINEMA supports the nesting of components. In other words, basic 
components can be composed to build more complex components, called 
compound components. Compound components may again be constituents of 
other components, i.e. arbitrary levels of nesting are possible. 
Compound components provide the means for building higher levels of 
abstraction on the basis of existing components.

The computation model described above is rather general and has various 
similarities with other architectures (e.g. IMA [IMA93] or SUMO 
[CBRS93]). Therefore, the concepts presented in the remainder of the 
paper are not only relevant in the CINEMA context but are applicable in 
a rather broad scope.

4 MEDIA CLOCKS

The temporal dimension of continuous media streams is defined by 
so-called media time systems. The media time system associated with a 
stream is the temporal framework to determine the media time of the 
stream's data units. In CINEMA, media time systems are provided by media 
clocks (or clocks for short). A clock C is defined as follows:

C ::= ( R, M, T, S )

The clock attributes have the following meaning:

Figure 1 : An Example Application

Component

Link Session

4 MEDIA CLOCKS 6

- R determines the ratio between real time and media time: R time units 
in media time correspond to 1 second in real time.

- M is the start value of the clock in media time, i.e. the value of the 
clock at the first clock tick.

- T is the start time of the clock in real time, i.e. the real time of 
the first clock tick.

- S determines the speed of the clock: S*R time units in media time 
correspond to one second in real time. Consequently, media time 
progresses in normal speed if S equals 1. A speed greater than 1 causes 
the clock to move faster, a speed less than 1 causes it to progress 
slower, and a negative speed causes it to move backwards.

It should be noted that the temporal dimension of stored media is 
inherently bound, i.e. there exists a lower and upper bound given by the 
media time of the first and last stream data unit. In other words, media 
time for a given stream is only defined in a certain interval. The 
mechanisms required to ensure that clock values stay within the defined 
time range are beyond the scope of this paper.

Media time systems are a general concept to dimension media time in 
arbitrary ways. For the following example, assume a (stored) video 
stream with a rate of 25 data units per second. If ratio R = 25, media 
time corresponds to a frame sequence number, e.g. if M = 5, then stream 
processing is started with the 5th frame in the stream provided the 
lower bound of its temporal dimension is 1. If media time is counted in 
milliseconds instead, R is set to 1000. In each case, ratio R defines 
the ?normal? speed of media time, while attribute S can be used to speed 
up or slow down the progress of media time.

A clock relates media time to real time as shown in Fig. 2. Therefore, 
after a clock has been started, media time (m) can be derived from real 
time (t):

Clocks are the basic abstraction for controlling the flow of media 
streams. As will be seen below, clock objects provide methods for 
starting, pausing, or scaling streams. In CINEMA, clocks may be attached 
to source and sink components, but never to intermediate ones. As in 
CINEMA stream processing can take place in arbitrary networks of 
interconnected components, a stream may consist of a number of 
substreams and may originate form multiple sources and

m M S R t T ?
( ) ?
+
=


4 MEDIA CLOCKS 7

end at multiple sinks. Those complex streams are controlled by 
manipulating the clocks at the sinks and sources.

While clocks at source components are optional, they are mandatory at 
sink components. A clock attached to a sink component controls the 
temporal progress of all data streams processed (e.g. played out) by 
this component. This is expressed more precisely by the so-called clock 
condition: a data unit having media time m is processed at real time t 
only if the controlling clock is ticking and its value equals m at time 
t. Conceptually, this means that the presentation of a stream is 
started, paused or scaled when the controlling clock is started, halted 
or the clock speed is changed, respectively. The semantics of clocks at 
source components will be introduced later.

As pointed out above, media time progresses relative to real time. In 
CINEMA, real time is taken either from a local system clock, a global 
clock (e.g. see NTP [Mill89]) or is derived from the temporal behavior 
of a given output device. Clearly, a media clock based on the timing of 
an output device advances in conformance with the device's natural rate. 
Those clocks are called master clocks.

Below, the most important clock operations for controlling streams are 
listed. A clock may enter two states, ticking (and thus advancing) or 
silent (and thus not advancing). The only clock operations that cause 
state transitions are Start and Halt. The former moves the clock from 
silent to ticking, while the latter causes the reverse state transition.

Start(M) This operation starts the clock at media time M. By starting 
the clock the controlled stream(s) are started. (Clock attribute T is 
set to the real time at which the clock is actually started).

Halt(M) This operation halts the clock when it reaches clock value M, 
i.e. the

Figure 2 : Mapping Media Time To Real Time

media time

real time 1 sec

1/R M

T


4 MEDIA CLOCKS 8

stream(s) controlled by this clock are paused. A halted clock can be 
started again by operation Start.

Prepare(M) This operation prepares the event of starting the clock at 
media time M. After Prepare has been performed, the clock can be started 
immediately when Start is issued. To achieve this, Prepare preloads the 
buffers along the communication paths of the controlled stream(s). If 
this operation is not invoked, preloading is done implicitly as part of 
Start.

Clear() This operation clears the internal buffers associated with the 
controlled stream(s).

Scale(M,S) The default value of the clock speed equals 1. This operation 
changes the speed of the clock to S when media time M is reached, i.e. 
it scales the stream(s) controlled by the clock.

Lock(O) This operation locks the clock for propagated operations of type 
O. This operation is only applied in the context of clock hierarchies.

Unlock(O) This operation unlocks the clock for propagated operations of 
type O.

In the simple scenario shown in Fig. 3, clock C controls the 
presentation of a 25 frames/sec video stream.

1 C.Start(15) 2 C.Scale(3000,2) 3 C.Halt(5000)

If we assume that clock attribute R equals 25, then play out is started 
with frame 15, the play out rate is doubled when the presentation 
reaches frame 3000, and the presentation is halted after frame 5000 has 
been played out.

Figure 3 : Controlling a Video Stream

C
video stream

video source video sink

5 CLOCK HIERARCHIES 9

5 CLOCK HIERARCHIES

In this section, we will introduce the notion of a clock hierarchy, 
which is the basic abstraction for grouping media streams, controlling 
groups of streams, and stream synchronization. The principle idea of 
this concept has been introduced in [RoDe92].

Related media streams may be grouped by linking clocks in a hierarchical 
fashion. Remember that a clock attached to a component controls all 
streams processed by this component. A number of streams can be grouped 
by linking their controlling clocks to a common clock, which then 
controls the entire group. Stream groups can be grouped again to groups 
at a higher level simply by linking their controlling clocks to the same 
clock. In the example given in Fig. 4, clock C6 controls streams S1 and 
S2, while C7 controls S4 and S5. C8 controls the subgroups represented 
by C6 and C7 as well as stream S3, and thus all streams in the given 
scenario can be started, halted or scaled collectively by means of this 
clock. Since CINEMA supports arbitrarily structured clock hierarchies, 
any type of hierarchical grouping of media streams is possible.

A clock operation issued at a clock not only affects this clock but the 
entire (sub)hierarchy of this clock. Conceptually, an operation called 
at a clock is propagated in a root-to-leaf direction through the clock's 
(sub)hierarchy, where it is performed at every clock in this hierarchy. 
That is, an operation invoked at a clock is not only performed at this 
clock but also at every descendant clock in the hierarchy. In general, 
clock operations can be issued at every level of the clock hierarchy. If 
operation Start is issued at C6 in the example depicted in Fig. 4, this 
operation is propagated to C1 and C2, which causes streams S1 and S2 to 
be started. All streams

Figure 4 : Grouping Streams

C1
S1

C2
S2

C3
S3

C4
S4

C5
S5

C6

C8

C7


5 CLOCK HIERARCHIES 10

in the depicted scenario are started if Start is invoked at C8 instead. 
As will be seen in Sec. 7, propagation is a prerequisite for component 
nesting. Compound components may contain clock subhierarchies which are 
invisible for the component's outside world.

In some scenarios, it is desirable to lock clock subhierarchies in order 
to prevent propagation. For that purpose, clocks may be locked and 
unlocked. If a clock is locked, propagation of operations issued at 
ancestor clocks does not take place in the clock's (sub)hierarchy. Note 
that only operations propagated from a locked clock's ancestors are 
locked out, while all operations issued at the locked clock itself or 
one of its descendant clocks are performed and affect the hierarchy in 
the usual way. Propagation is enabled again only when the clock is 
unlocked.

Locking is done in an operation-specific manner. Each lock is associated 
with a certain type of clock operation and only locks out operations of 
this type. In other words, a lock defines an operation-specific filter.

Locking is especially useful in those scenarios, where multiple users 
are involved in the same application and hence clock hierarchies 
typically cover several user domains. Here, locking provides a means to 
shield clock subhierarchies located in a given user domain from 
propagated clock operations originated in some other user domain. 
Consequently, by locking clocks a user can dynamically control which 
types of propagated operations may influence the data streams in his or 
her domain.

Clocks may be linked in two different ways: a link may establish either 
a control or a synchronization relationship between two clocks. A 
control relationship between two clocks enables the propagation of clock 
operations without synchronizing the two clocks. Typically, control 
relationships are defined in settings, where groups of streams are to be 
controlled collectively and a rather loose temporal coupling of the 
grouped streams is sufficient. A synchronization relationship goes a 
step further. In addition to propagation, it ensures that the involved 
clocks progress in a synchronized manner.

In CINEMA, stream synchronization is specified by means of sync 
relationships between clocks. From the clock condition introduced in the 
previous section directly follows that two streams are synchronized if 
their controlling clocks are synchronized. In the example shown in Fig. 
4, streams S1 and S2 are played out synchronously if C1 and C2 are 
synchronized. This synchronization requirement can be specified by a 
sync relationship between C1 and C6 as well as one between C2 and C6. An 
alternative way to express the same is to define a sync relationship 
directly between C1 and C2.

5 CLOCK HIERARCHIES 11

Clocks provide individual media time systems, which may relate to each 
other in various ways. Clock synchronization and propagation of clock 
operations (as will be seen below) is done on the basis of so-called 
reference points. A reference point defines the temporal relationship of 
two media time systems. More precisely, reference point [C1 : P1, C2 : 
P2] defines that media time P1 in C1's time system corresponds to media 
time P2 in C2's time system, which means that P1 and P2 relate to the 
same point in real time (see Fig. 5). Given this reference point, media 
time can be transformed from one to the other time system as follows:

After having introduced the basic principles, we can now take a closer 
look at clock hierarchies. A clock hierarchy is a directed tree 
structure, where the nodes are clocks and the edges represent control or 
sync relationships between clocks. The same hierarchy may contain 
control as well sync edges. Each edge is associated with the following 
attributes:

- Type of the edge, which is either control or sync.

- A reference point which defines the temporal relationship between the 
clocks linked by this edge.

- A delay attribute that specifies how long an operation propagated 
along this edge is to be delayed. In the example shown in Fig. 4 streams 
S4 and S5 are started 3 seconds later than the other streams if the 
Start operation is delayed by 3 seconds while propagated from C8 to C7. 
Obviously, the provision of this delay attribute enhances the 
flexibility of our scheme substantially. For the sake of simplicity, we 
will assume a zero delay in the following examples.

Figure 5 : Transforming Media Time

m2 m1 P1 ?
( ) R2
R1
------ P2 +
?
=

C1

C2

P1

P2

[C1: P1, C2 : P2]

5 CLOCK HIERARCHIES 12

Before describing the semantics of control and sync edges more 
precisely, we have to introduce function Trans(Ci, Cj, mi). In a given 
clock hierarchy, this function transforms media time mi from Ci's to 
Cj's time system according to the equation above.

5.1  Control Relationship

The semantics of a control edge that is directed from a clock, say C1, 
to another clock, say C2, and is associated with a reference point RP is 
defined by the following rules:

1. Each clock operation issued at C1 is propagated to C2's subhierarchy 
provided C2 is unlocked1.

2. Whenever a clock operation is propagated, its media time arguments 
are automatically transformed from C1's to C2's media time according to 
reference point RP. That is, argument m in a propagated operation is 
transformed to Trans(C1, C2, m)

3. Each clock operation can be issued at any clock in the control 
hierarchy.

4. AStart operation issued at clock C2 may be performed immediately 
independent of C1's value or state (ticking or silent). That is, 
C2.Start(m) starts C2 immediately with initial clock value m.

It is important to point out that control hierarchies only allow for a 
very loose coupling of streams. Although a control hierarchy includes 
reference points defining the temporal relationship between streams, 
this information is not used to keep the controlled streams 
synchronized. Reference point information is considered only when clock 
operations are propagated. In particular, it is used to automatically 
transform operation arguments from one to another media time system. For 
example, when a group of streams is started, the stream's media start 
times conform to the reference points defined in the corresponding 
control hierarchy. However, after a hierarchy has been started, its 
clocks may drift out of synchronization in an uncontrolled manner. 
Clocks in a hierarchy may drift, for example, if they are based on 
different physical time systems (e.g. system clocks or device-internal 
clocks). Moreover, in control hierarchies, each clock may be manipulated 
without considering the state and value of the parent clock. For

1No guarantee is given that a clock operation and its propagated ones 
are performed at the same point in real time. However, they are 
performed at ?approximately? the same time; what this means in practice 
mainly depends on the underlying implementation of the control 
mechanism.

5 CLOCK HIERARCHIES 13

example, two different subhierarchies of the same hierarchy may be 
scaled in different ways, or clocks in the hierarchy may be halted and 
continued at any later time with arbitrary start values.

Due to the fact of potentially drifting clocks, for operations Halt and 
Scale different semantics are conceivable. If, for example, C.Halt(Now) 
is performed, then all streams controlled by the clocks in C's 
subhierarchy are halted immediately because Now - per definition - 
corresponds to the current time in each media time system. If C.Halt(30) 
is specified instead, the different clocks may reach the equivalent of 
30 in their media time systems at different points in real time. One 
reasonable semantic of the operation is to pause all streams when the 
first clock reaches the given halting time. Due to space limitations, a 
detailed discussion of this subject is out of the scope of this paper.

5.2  Sync Relationship

The semantics of a sync edge that is directed from a clock, say C1, to 
another clock, say C2, and that is associated with a reference point RP 
is defined by the following rules:

1. Each clock operation issued at C1 is propagated to C2's subhierarchy 
provided C2 is unlocked.

2. If C2 is ticking, both clocks C1 and C2 are progressing in a 
synchronized manner, where the sync relationship is defined by reference 
point RP. More precisely: Assume that ISet denotes the set of real time 
intervals during which C2 is in the ticking state. Then C1 and C2 are 
defined to be synchronized1 if

" I ? ISet " t ? I : C2(t) = m2 fi C1(t ) = Trans(C2, C1, m2), where 
C(t) denotes the value of C at time t.

3. Except Scale each clock operation can be issued at every clock in the 
sync hierarchy. Scale can be issued at the root clock only, i.e. only 
the entire hierarchy can be scaled.

4. Operation Start can be issued at C2 only if C1 is in the ticking 
state. In order to ensure clock synchronization, the start of C2 has to 
be synchronized with the progress of C1's

1Of course, in practice, clocks can and need not be synchronized 
exactly. Thus, in CINEMA an upperbound for a tolerable skew can be 
specified at each sync edge.

5 CLOCK HIERARCHIES 14

media time: C2.Start(m) is delayed until C1's clock value equals 
Trans(C2 ,C1, m). An alternative way of starting C2 is to specify start 
time Now in operation Start. In this case C2 is started immediately, say 
at real time t, with clock value Trans(C1 , C2, m1), where m1 is C1's 
clock value at time t.

5. A sync hierarchy may contain at most one master clock, which must be 
the root of the hierarchy.

Sync hierarchies are a general and very powerful concept to specify 
arbitrary synchronization requirements between media streams. The 
structure of the sync hierarchy specifies which streams have to be 
synchronized, while the reference points in the hierarchy define how 
streams have to synchronized, i.e. how the temporal dimension of the 
streams relate to each other.

The system guarantees that all streams controlled by the clocks in the 
sync hierarchy are processed (e.g. played out) in a synchronous manner. 
Processing is started by issuing operation Start at the root clock of 
the sync hierarchy. A locked subhierarchy can be started later by 
issuing Start at the subhierarchy's root clock. The start of this 
subhierarchy is performed in conformance with the temporal constraints 
specified by the entire sync hierarchy. The same holds if a subhierarchy 
is halted and started once again at a later point in time. As will be 
seen later, sync (and control) hierarchies may dynamically grow and 
shrink even if clocks are ticking. This feature together with the 
capability of locking, halting and starting individual subhierarchies is 
very important in interactive applications, especially in those, where 
multiple users with their individual needs participate in the same 
(CSCW) application.

5.3  Example

Fig. 6 shows a simple tele cooperation scenario with two users. Subject 
to the cooperation is an experiment shown on video V2. We assume that 
there exist extra speech channels that allow the users to talk to each 
other. The two users commonly view V2 and discuss the experiment while 
they follow the presentation. To ensure that both users see the same 
information at the same time, V2 must be played out synchronously at 
both user sites. Besides V2, user 1 views video V1, which shows the same 
experiment from a different perspective. Consequently, V1 and V2 are to 
be synchronized. User 2 additionally views video V3, which shows a 
similar experiment. Since the two experiments roughly correspond to each 
other in their temporal dimension, V1

5 CLOCK HIERARCHIES 15

and V3 are grouped by a control relationship.We assume that media time 
500 in V3 corresponds to media time 5 in V2.

The presentation of all video streams can be started by issuing Start at 
clock C5. Moreover, this clock can be used to collectively scale, pause 
and restart the entire configuration. User 1 may pause V1 or V2 by 
halting C1 or C2, respectively. Halted clocks may be continued in a 
synchronized fashion, i.e. after restart of C2, for example, the 
presentation of V2 is not only synchronized with V1 but also with V2's 
presentation at the site of user 2.

Since C3 and C4 are linked with a control edge, V3 can be scaled, paused 
and restarted at any position independent of V1's and V2's state of the 
presentation. So, the presentation V3 can be adjusted manually as 
needed. At user site 2, halting C3 implies pausing V3's presentation. If 
this is to be avoided, C4 has to be locked for Halt operations. Note 
that Scale operations issued at C5, for instance, are then still 
propagated to C4.

If another user desires to join the scenario while cooperation already 
takes place between user 1 and user 2, the clock hierarchy has to be 
extended dynamically. Assume that the new user needs to view V2 only. 
After the corresponding session has been established, the clock, say C6, 
controlling V2's presentation at the site of the new user is linked by 
means of a sync edge to clock C5. When the new user is ready to 
participate in the cooperation, it issues C6.Start(- Now) to start V2's 
presentation synchronous to the ongoing presentations at the other 
sites.

Figure 6 : A Simple Telecooperation Scenario

C1

C2

C3

C4

V1

V2

V3

sync [5,5]

sync [5,5] sync [5,5]

cntr[5,500]

User1

User2

C5


6 CLOCKS AT SOURCE COMPONENTS 16

When a user desires to leave the scenario, the clocks controlling his or 
her streams have to be removed from the clock hierarchy.

6 CLOCKS AT SOURCE COMPONENTS

So far, we only considered clocks attached to sink components. The mixer 
scenario illustrated in Fig. 7 gives the motivation for having clocks 
attached to source components also. In general, substreams S1, S2 and S3 
may have individual start values. For example, if three different 
subsequences of a stored video clip are to be mixed together, the start 
values differ from sequence to sequence. However, with a clock at the 
sink component only, it is impossible to specify individual start values 
for multiple sources. The solution to this problem is obvious, a clock 
is attached to each source component, which then can be started with an 
individual start value.

As mentioned earlier, clocks at source components are optional. In a 
configuration without source clocks, start of processing of source 
components is implicitly triggered by starting the corresponding sink 
clock, where the start value is determined at the sink clock. However, 
as soon as a clock is attached to a source component, processing must be 
enabled explicitly by starting the attached clock. It is important to 
point out, that Start issued at a source clock only enables start of 
processing rather than starting the clock immediately. The time when the 
clock is actually started is mainly determined by the underlying control 
and communication protocols.

Figure 7 : Clocks at Source Components

C1

C4
C2
cntr [20,80]

C3

sinks

source

cntr [0,20]

S1

S2

S3


6 CLOCKS AT SOURCE COMPONENTS 17

Like sink clocks, source clocks may be nodes in clock hierarchies. In 
contrast to sink clocks, however, source clocks may never be involved in 
a sync relationship. This is due to the fact that synchronizing a source 
clock with some other clock makes no sense with regard to stream 
synchronization or even is impossible in various cases.

In the scenario of Fig. 7, clock C4 is the root of the control 
hierarchy. When Start is issued at C4, this operation is propagated to 
clocks C1 and C2. During propagation, the specified start value is 
transformed according to the reference points associated with the 
control edges. If the start value specified at C4 is 0, clocks C1 and C2 
are started with values 20 and 80, respectively. Since clock C3 is not 
part of the control hierarchy, it has to be started explicitly. For 
example, it may be started later when the application decides to add S3.

The scenario in Fig. 8 combines sync and control edges. Assume that 
streams S1, S2 and S3 are (stored) video streams with a rate of 25 
frames/sec and that clock attribute R = 25 for each clock. Further 
assume that media times 20, 80 and 100 of S1, S2 and S3, respectively, 
correspond to the same point in real time. The depicted configuration 
mixes S1 and S2 and synchronizes the output of the mixer with S3. The 
entire configuration is controlled by clock C3, i.e. the whole 
processing can be started, paused, scaled by issuing the corresponding 
operations at clock C3.

1 C3.Start(0) 2 C3.Scale(2000,-2) 3 C3.Halt(0)

Figure 8 : A Scenario with Control and Sync Edges

C1

C3

C4
sync [0,100] C2
cntr [20,80]

cntr [0,20]

S1

S2

S4

S3


7 NESTING OF COMPONENTS 18

The presentation is started at media time 0, which corresponds to start 
values 20, 80 and 100 at C1, C2 and C4, respectively. After 2000 frames 
have been played out, the presentation is continued in reverse order and 
double speed.

Looking at the above scenario we can indicate two points in the 
configuration, where stream synchronization is required. Not only S3 and 
S4 have to be synchronized but also S1 and S2 as mixing must be done on 
the basis of matching media times. That is, the media time systems of S1 
and S2 must be related to the time system of S4, and the mixer must 
preserve this relationship in order to enable synchronization of S3 and 
S4. Depending on the type of intermediate component, the time systems of 
its input and output streams may or may not be related. This subject is 
discussed in more detail in a longer version of this paper.

7 NESTING OF COMPONENTS

In many areas, nesting has turned out to be a very powerful concept for 
building higher levels of abstractions. As mentioned earlier, in CINEMA 
more complex components can be composed from other components just by 
linking input and output ports. Compound components again may be used to 
build other compound components on even higher levels of abstraction, 
i.e. arbitrary nesting levels are supported.

In the context of synchronization, nesting means that clock hierarchies 
may be defined within compound components and thus remain invisible for 
the components' outside world. A clock hierarchy of a compound component 
is defined at the time the component is composed and specifies 
synchronization and control relationships between the streams processed 
by this component. In particular, the internal clock hierarchy of a 
component specifies sync and control edges between the clocks defined 
within this component. In addition to the clocks attached to its 
internal components, a compound component may also contain unattached 
clocks.

A compound component may contain one or more clock hierarchies. (Note 
that a hierarchy can consist of a single clock only.) The roots of the 
internal clock hierarchies are exported and thus become visible to the 
component's outside world. The exported clocks are attached to the 
component and are used to control the component's stream processing, 
i.e. they are used to start, pause or scale the streams processed by the 
component. Of course, exported clocks may again be involved in clock 
hierarchies at higher levels of abstraction.

8 SUMMARY 19

The compound component shown in Fig. 9 provides the abstraction of a 
television set, capable of playing out a video stream and two audio 
streams in a synchronized fashion. The shown component contains two 
basic components, a video decompression component (D) and sink component 
implementing a video output window (W). In addition, it includes another 
compound component, which consists of two filter components (F) and two 
speaker components (S). The nested compound component provides the 
abstraction of an audio output device, whose operation is controlled by 
clock C2. The TV component exports clock C1, which is used to start, 
pause or scale the audio-visual output.

In summary, compound components may contain arbitrary complex clock 
hierarchies, which are invisible from the user's point of view. The 
operations issued at an exported clock are propagated through the clock 
hierarchy and thereby control the internal processing of the exporting 
component.

8 SUMMARY

The abstractions proposed in this paper provide for controlling and 
synchronizing groups of continuous media streams. Clock hierarchies can 
be used to specify nested groups of streams, where each clock in the 
hierarchy identifies and controls a certain (sub)group of streams. By 
means of control and sync edges in clock hierarchies, an application can 
specify its individual control and synchronization needs in an uniform 
way. The capability of locking subhierarchies as well as the possibility 
of dynamically growing and shrinking clock hierarchies are important 
features in the context of interactive applications, especially in those 
supporting collaborative

Figure 9 : Nested Components

C2

C1
Video

Audio 1

Audio 2

D

F

F

W

S

S

sync

sync

9 REFERENCES 20

work. Clock hierarchies in conjunction with component nesting provide a 
powerful means for the simple composition of complex components at 
higher levels of abstraction. As the computation model underlying the 
proposed abstractions is very general and has various similarities with 
others, the results reported in this paper are applicable in a rather 
broad scope.

The reported work has been conducted in the context of the CINEMA 
project. The implementation of CINEMA is in progress, and an early 
version of a synchronization manager, supporting a limited set of 
configurations, is already operational. Future work will be to complete 
this implementation.

Finally, we need to gain more practical experience with the proposed 
abstractions. Although the abstractions have been applied to model a 
great variety of application scenarios, we need to conduct extensive 
experimentation with applications in the field to verify the practical 
value of the work reported here.

9 REFERENCES

[Appl91] Apple Computer Inc., Cupertino, CA, USA. Quick Time Developer's 
Guide, 1991.

[AnHo91] Anderson, D.P.; Homsy, G.: A continuous media i/o server and 
its synchronization mechanism. In: IEEE Computer, 24 (10), pp. 51 -57, 
October 1991.

[CBRS93] Coulson, G.; Blair, G.S.; Robin, Ph.; Shepherd, D.: Extending 
the Chorus Micro-Kernel To Support Continuous Media Applications. In: 
Proc. 4th International Workshop on Network and Operating System Support 
for Digital Audio and Video, November 1993.

[CCGH92] Campbell, A.; Coulson, Garciß, F.; Hutchinson, D.: A Continuous 
Media Transport and Orchestration Service. In: Proc. SIGCOMM `92, August 
1992.

[DNNR92] Dannenberg, R.B.; Neuendorffer, T.; Newcomer, J.M.; Rubine, D.: 
Tactus: Toolkit-Level Support for Synchronized Interactive Multimedia. 
In: 3rd International Workshop on Network and Operating System Support 
for Digital Auido and Video, November 1992.

[Herr91] Herrtwich, R.G.: Time Capsules: An Abstraction for Access to 
Continuous Media Data. In: The Journal of Real-Time Systems, Kluwer 
Acadamic Publishers, 1991, pp. 355 - 376.

[IBM92] IBM Corporation: Multimedia Presentation Manager Programming 
Reference and Programming Guide 1.0, IBM Form: S41G-2919-00 and 
S41G-2920-00,

9 REFERENCES 21

March 1992.

[IMA93] IMA Multimedia System Services, Version 1.0 (contributors: 
Hewlett-Packard Comanpy, International Business Machines Corporation and 
SunSoft Inc.), available via ftp from ibminet.awdpa.ibm.com, July 1992.

[MeES93] Meyer, Th.; Effelsberg, W.; Steinmetz, R.: A Taxonomy on 
Multimedia Synchronization. In: 4th International Workshop on Future 
Trends of Distributed Computing Systems, September 1993.

[Mill89] Mills, D.L.; Internet Time Synchronization: The Network Time 
Protocol. Internet Requests for Comments No. 1129 PiFC/1129, 1989.

[MKS89] Magee, J.; Kramer, J.; Sloman, M.: Constructing Distributed 
Systems in Conic. In: IEEE Transactions on Software Engineering, Vol. 
15, No. 6, June 1989.

[RBH94] Rothermel, K.; Barth, I.; Helbig, T.: CINEMA: An Architecture 
for Configurable Distributed Multimedia Applications. Technical Report 
3/94, University of Stuttgart, April 1994.

[RoDe92] Rothermel, K.; Dermler, G.: Synchronization in Joint-Viewing 
Environments. In: Proc. 3rd International Workshop on Network and 
Operating System Support for Digital Audio and Video, November 1992.

[RoHe94] Rothermel, K.; Helbig, T.: Protocols for Synchronizing 
Application-Level Streams. Technical Report, University of Stuttgart, 
1994 (in preparation).