Representing Time in Multimedia-Systems

Thomas Wahl, Kurt Rothermel
University of Stuttgart, IPVR
Breitwiesenstr. 20 - 22, FRG-70565 Stuttgart
phone: FRG 711 7816 385
wahl@informatik.uni-stuttgart.de

Keywords
multimedia, synchronization, presentation, temporal model, specification, multimedia documents

Abstract
As multimedia systems deal with a variety of temporally interrelated media items, synchronization is an important issue in those systems. One part of synchronization is the representation of temporal information. In contrast to traditional computing tasks, multimedia imposes new requirements on the representation of time. Specifically, a fine-grained and flexible temporal model is required. Therefore, a number of temporal models have been suggested by various authors. However, no temporal model has been agreed on for multimedia so far. This paper evaluates and classifies a selection of the most common existing models, applying fundamental statements of time theory and temporal logic. Learning from the deficits of the existing models, a new temporal model based on interval operators is proposed for multimedia systems.

1. Introduction

Multimedia systems integrate a variety of media with different temporal characteristics, e.g. time-dependent media, such as video, audio or animation, and time-independent media, such as text, graphics and images [Stei90]. In monomedia environments, all media show the same basic temporal behavior, so time does not need any particular attention. With the emergence of multimedia systems, the various temporal interrelations between media items become more and more important. Assuring the correct temporal appearance of the media items is called synchronization.

The issue of synchronization is twofold. First, the temporal appearance, including the interrelations of presentation items, has to be specified. The temporal specification has to be represented for review by the user, for presentation planning by the system and for storage. Secondly, the multimedia system has to guarantee the temporal constraints when presenting the media items. This is done by providing sufficient resources and real-time processing [BDH+93]. This paper focuses on the first issue, the representation of time in multimedia environments.

The representation of time has been examined in the context of parallel computing. Several temporal models have been developed, e.g. CSP [Hoar78], [Hoar85] and path expressions [CaHa74]. When applying these models to multimedia, it turns out that time is very coarse-grained in them. To elaborate this, consider a coarse-grained temporal model allowing processes to be `sequential' or `parallel'. Let us examine two multimedia scenarios to determine the interrelations of the presentation items. The first scene consists of a video that is presented simultaneously with a corresponding audio. The second scene comprises a video that fades over to a subsequent video. Both scenes describe parallel actions because both include temporal intervals during which two actions are active. These types of parallelism cannot be distinguished in the given temporal model. However, for multimedia, they should be distinguished because a video-audio presentation that is merely overlapping does not satisfy the user. The reason for this is that multimedia data should not be presented ahead of time. In parallel computing, data are processed as soon as possible. In contrast, video data that are available ahead of time should not be presented before the corresponding audio data are ready. To guarantee that multimedia data are processed just on time [Stei92], a fine-grained model of temporal relations including various types of parallelism is necessary.
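The limitation of the coarse-grained model can be illustrated with a minimal sketch. The function names and the concrete start and end times (in seconds) are hypothetical and chosen only for illustration; presentation actions are assumed to be (begin, end) pairs.

```python
def coarse(x, y):
    """Coarse-grained model: two actions are either sequential or parallel."""
    (bx, ex), (by, ey) = x, y
    return "sequential" if ex <= by or ey <= bx else "parallel"


def fine(x, y):
    """Finer view: additionally compare the start points and the end points."""
    (bx, ex), (by, ey) = x, y
    starts = "same start" if bx == by else "different start"
    ends = "same end" if ex == ey else "different end"
    return f"{coarse(x, y)}, {starts}, {ends}"


# Scene 1: a video with its audio track (hypothetical times).
video, audio = (0, 60), (0, 60)
# Scene 2: a video fading over to a subsequent video (hypothetical times).
video1, video2 = (0, 60), (55, 115)

print(coarse(video, audio), "|", coarse(video1, video2))
print(fine(video, audio), "|", fine(video1, video2))
```

Under these assumptions the coarse model reports both scenes as parallel, whereas comparing the end-points separates the synchronous video-audio scene from the fading scene.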
A second requirement addresses the flexibility of temporal models, which is needed when not all events are known in advance. Typically, when specifying a multimedia presentation, not all events are known before the presentation is started. Asynchronous events caused by the system or by user interaction often result in a rescheduling of presentation items. E.g. a student might pause a video lecture, look up a definition in a database and then take a note for his term paper. All these actions are highly nondeterministic and cannot be predicted by the supporting multimedia system. Therefore, many temporal relations are unspecified or only partially restricted. To express unspecified or partial relations, a multimedia time model has to be very flexible. Temporal models with totally ordered events generally do not satisfy this criterion.

Finally, the multimedia user needs intuitive abstractions of temporal relations to ease the authoring of multimedia presentations. Therefore, high-level temporal relations are needed [BuZe93]. E.g. for a synchronous presentation of a video v and its audio a, we would like to specify `v synchronous a' instead of specifying all the details, such as `a and v start together, are displayed with the same constant speed and end at the same time'.

Since earlier temporal models do not meet the specific requirements of multimedia, several models have been proposed, and it has been discussed which of the models is appropriate for multimedia. However, this question cannot be answered in general because simple multimedia environments need only weak temporal models whereas sophisticated systems require more complex models. To find the appropriate temporal model, we would like to know the expressive power of the existing temporal models.

Before assessing the temporal models, it is very helpful to understand the two basic temporal frameworks given in Section 2 and the temporal characteristics of multimedia presentations described in Section 3.
Then, we describe, classify and evaluate the most important existing temporal models in Section 4. It will turn out that those models are very limited in their expressive power. Therefore, we introduce a new, powerful temporal model based on interval operators in Section 5.

2. Basic Temporal Frameworks

Before examining multimedia time models, a basic understanding of the fundamental temporal frameworks is necessary. Depending on their elementary units, two basic classes of time models can be distinguished [vBee92]. In the first class, time is expressed by means of points in a one-dimensional time space [ViKa86] whereas, in the second class, intervals are the atomic units of the time space [Alle83]. This section introduces the basic models, their elementary units and the relations between them.

2.1 Point-Based Framework

In point-based temporal models, the elementary units are events, which are points in a time space. Given two events in history, only three relations can hold between them. An event can be before (<), simultaneous to (=) or after (>) a second event. The relations <, =, > are called the basic point relations (basic PRs).

In contrast to relations in the past, relations between future events might be indefinite. For example, we may know that an event e1 cannot occur after an event e2. This means that e1 is before or simultaneous to e2. This is denoted as e1 < e2 ∨ e1 = e2 or as e1 {<,=} e2. Note that e1 is before or simultaneous to e2, and it is not known which of the relations will become true. Typically, indefinite relations are represented as disjunctions of basic PRs. Since there are 3 basic PRs, 2³ = 8 disjunctions exist, each representing an indefinite relation. Each of the 8 indefinite relations has an associated symbolic notation. For example, instead of e1 {<,=} e2, we use e1 ≤ e2. The 8 indefinite relations are: ∅, ≤, <, =, >, ≥, ≠, ?, where `?' is the full set of basic PRs {<, =, >}, ∅ is the empty set {} and the others are self-explanatory. In this paper, we identify the basic relations <, =, > with the indefinite relations {<}, {=} and {>}. Therefore, the basic PRs are a subset of the indefinite PRs.

2.2 Interval-Based Framework

Intervals are the basic units of a time model class suggested by [Alle83], [Bruc72]. There are 13 basic interval relations (basic IRs). Table 1 summarizes the basic interval relations showing the name, the symbol, the inverse and an example for each relation. In this context, x and y represent intervals. Also, a point notation exists for each IR. It is given in the fourth column with Bx denoting the beginning and Ex the end of the interval x.

Table 1 (columns: relation, symbol, inverse, conjunctions of point-relations, example, class)

We also identify the basic IRs with their corresponding indefinite IRs such that the basic IRs are a subset of the indefinite IRs. Table 2 compares the two frameworks in terms of the number of possible relations.

2.3 Translations between Representations

As we will show in Section 4, some temporal models proposed for multimedia are point-based, others are interval-based or hybrid. To compare temporal models of different frameworks, we need to translate temporal specifications from one framework to the other. In doing so, we can benefit from essential results proved in temporal logics. This section presents some important statements from temporal theory [Rich89].

Generally, temporal intervals describe the duration of a media item in a presentation environment. So, we use the relations that a temporal model can represent between two intervals to evaluate its expressive power. In a point-based framework, some relations between two intervals are represented as conjunctions of PRs between the four end-points of the two intervals. Four relations between the four end-points of a pair of intervals can be specified (Figure 1).
By labelling the end-point relations with basic or indefinite PRs, we can find out how many IRs are representable in a point-based framework. Table 3 shows the number of consistent IRs that can be expressed by conjunctions of the given PR set. E.g., conjunctions of the basic PRs <, = and > create exactly the basic interval relations. If the larger PR base <, =, >, ? is used, 29 consistent interval relations can be represented. Although the basic PRs generate all basic IRs, an equivalent statement for indefinite relations does not hold. The full set of indefinite PRs generates only 187 consistent IRs (Table 3). [...] and ? are especially important in multimedia environments.

3. Characteristics of Multimedia Time Models

Some temporal characteristics observed in multimedia systems are inherent to processing media items. Taking into account the temporal behavior of the media items, specific temporal models tailored to multimedia applications can be defined, avoiding complex universal models. However, before adjusting temporal models to multimedia, it is helpful to know which relations are relevant in multimedia applications.

A point-based model should provide a representation for all PRs that have to be specified when composing a multimedia presentation. So, it is interesting to know which PRs occur in multimedia. Obviously, the basic PRs <, =, > occur because presentation events might be before, simultaneous to or after other events. To evaluate the indefinite PRs, we have to consider the fact that small inaccuracies are tolerated in multimedia. E.g. in a video-audio presentation, the audience does not notice the skew introduced if the audio is presented too early or too late by some milliseconds [Stei92], [LiKo92], [RoDe92], [BDF+92]. So, we do not need to specify the temporal behavior at exactly one point in time; rather, it is sufficient to specify the temporal behavior close to each point in time.
This implies that there is no perceptible difference in the presentation whether somebody specifies e1 < e2 or e1 ≤ e2 for two events e1 and e2. This holds because the audience cannot distinguish whether e1 is simultaneous to e2 or e1 is 1 millisecond before e2. Therefore, it is sufficient to be able to express only one of the relations < or ≤. In this paper, we operate with the relations < and >, and do not need the relations ≤ and ≥. Analogously, the relation ≠ differs from the ?-relation only in one point in time. Since there is no perceptible difference between the two relations, we do not need the relation ≠ if we have the relation ?.

point relation base          number of consistent disjunctive IRs
<, =, >                      13
<, =, >, ?                   29
≤, <, =, >, ≥, ?             82
≤, <, =, >, ≥, ≠, ?          187
Table 3: Disjunctive IRs generated by point relations

Figure 1: Computing the number of consistent IRs

Observe that we need the relation ? if any basic PR can hold between two events. This indefinite relation often occurs during the specification and planning process when not all events are known yet. Generally, the ?-relation is responsible for the flexibility of a temporal model because it includes all possible basic PRs.

To summarize, the relations <, =, > and ? are the most important relations in multimedia environments. Powerful point-based temporal models should be able to express at least this set of relations. According to Section 2.3, the PRs <, =, > and ? generate 29 interval relations. In Annex A, Table 5 enumerates the 29 IRs, and gives a point, an interval and an operator representation for each IR. The operator representation will be explained in Section 5.

A commonly applied temporal model is the time-line, by which only the 13 basic IRs are representable.
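The counts of Table 3 can be reproduced by brute force. The following sketch is not taken from the paper; it assumes that each of the four end-point relations (Bx?By, Bx?Ey, Ex?By, Ex?Ey) is labelled with one PR of the given base, and that the generated IR is the set of basic IRs consistent with all four labels. The ASCII names `<=`, `>=` and `!=` stand for ≤, ≥ and ≠, and all function names are hypothetical.

```python
from itertools import product

# Indefinite point relations as sets of basic PRs (ASCII names are an assumption).
PR = {
    "<": {"<"}, "=": {"="}, ">": {">"},
    "<=": {"<", "="}, ">=": {"=", ">"}, "!=": {"<", ">"},
    "?": {"<", "=", ">"},
}


def compare(p, q):
    """Basic PR that holds between two time points."""
    return "<" if p < q else "=" if p == q else ">"


def basic_ir_signatures():
    """End-point signatures (Bx?By, Bx?Ey, Ex?By, Ex?Ey) of all basic IRs.

    Four integer positions suffice to realise every ordering of the
    end-points of two intervals with Bx < Ex and By < Ey.
    """
    sigs = set()
    for bx, ex, by, ey in product(range(4), repeat=4):
        if bx < ex and by < ey:
            sigs.add((compare(bx, by), compare(bx, ey),
                      compare(ex, by), compare(ex, ey)))
    return sigs


def count_generated_irs(base):
    """Count the distinct non-empty IRs obtained by labelling the four
    end-point relations with PRs taken from `base`."""
    sigs = basic_ir_signatures()
    generated = set()
    for labels in product(base, repeat=4):
        ir = frozenset(s for s in sigs
                       if all(s[k] in PR[labels[k]] for k in range(4)))
        if ir:  # inconsistent labellings yield the empty relation
            generated.add(ir)
    return len(generated)


for base in (["<", "=", ">"],
             ["<", "=", ">", "?"],
             ["<=", "<", "=", ">", ">=", "?"],
             ["<=", "<", "=", ">", ">=", "!=", "?"]):
    print(base, count_generated_irs(base))
```

Under these assumptions the four bases yield 13, 29, 82 and 187 relations, which agrees with Table 3.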
Some authors [HyTi92], [Hoep91], [LiGh90] assure that their temporal models are at least as powerful as the time-line by showing that the 13 basic IRs are expressible within the model. However, they did not determine the power of the temporal model, i.e. show how many and which types of relations can be represented in the model.

Temporal specifications that are restricted to the 13 basic IRs are often over-constrained. Indefinite IRs are needed to avoid this problem. It is observed that indefinite IRs occur frequently in multimedia systems. For example, if we do not care about the end of the presentation components x and y, we issue a `cobegin' for x and y. The result might be that x ends before, after or simultaneously with y. Note that this cannot be expressed by a single basic IR because then the relation between the end-points of x and y would be known. We conclude that multimedia needs indefinite IRs.

As was shown in the previous section, some indefinite IRs cannot be represented as conjunctions of PRs. This fact is a major handicap of point-based systems because disjunctions of conjunctions of PRs cannot be represented by most point-based systems. One of these indefinite IRs is `mutual exclusion', which is needed when limited resources are shared. For example, if there is only one loudspeaker, then two audio sequences should not be presented simultaneously (Figure 2). Therefore, we would like to specify that the audio sequences are not parallel. This is expressed by the indefinite IR {<, m, mi, >}. Represented by PRs, a disjunction is needed: Ex ≤ By ∨ Ey ≤ Bx. Consequently, `mutual exclusion' cannot be represented in point-based systems that do not allow disjunctions.

Figure 2: Mutual exclusion

4. Evaluation of Multimedia Time Models

In the context of multimedia, various temporal models have been proposed by many authors. The temporal models are hard to compare because they are based on fundamentally different approaches to time modelling. This section analyzes the expressive power of the most important temporal models. Specifically, the number of indefinite IRs that can be represented in each model is determined, and each model is classified as mainly point-based or interval-based. The latter question is not always easy to answer because some temporal models use intervals as their basic units but their relations address at most one end-point of each interval. Essentially, those models have the same characteristics as point-based approaches.

4.1 Time-Line

The time-line model is applied by [BHL91], [Gibb91], [Appl91], [Drap93] and in HyTime [HyTi92]. In the time-line model, all events are aligned on a time axis (time-line) as shown in Figure 3. Since events are the atomic units, the time-line model is point-based. All events are totally ordered on the time-line. So, exactly one of the PRs <, =, > holds between any pair of events. As all events are totally ordered, it is impossible to leave the relation between any two events undefined. This means that the relation `?' cannot be expressed in the time-line model. This lack of flexibility is a major disadvantage of the time-line model. With <, = and > being the only possible PRs in the time-line model, we can conclude that the 13 basic IRs are the only IRs that are expressible in the time-line model.

4.2 Temporal Point Nets

[BuZe92], [BuZe93] use a point net to represent time specifications (Figure 4). Relations address events, establishing temporal equalities (=) and temporal inequalities (<, >). Although [BuZe92] does not mention it, a fourth relation (?) can be specified, meaning that the relation between two time points is not restricted.
The ?-relation adds a flexibility to the model that cannot be found in the time-line model. Using the PRs <, =, > and ?, 29 IRs can be represented including the 13 basic IRs. [BuZe92] also defines a relation construct `before by at least d' where d is a delay parameter describing the temporal distance between two events. For d = 0, the point relations ≤ and ≥ can be specified. Then, the PRs <, ≤, =, ≥, >, ? are representable in the point net model, generating 82 IRs.

Figure 3: Time line model (video, audio, animation and text on a common time axis)

4.3 Timed Petri-Nets

A timed Petri net model is proposed by [LiGh90] and [Hoep91]. The Petri net of [Hoep91] is a mapping of the path notation onto Petri nets and will be analyzed together with the path notation in Section 4.4. In this section, we essentially follow the Petri net definition of [LiGh90]. There, intervals are represented by places and relations by transitions. In order to avoid ambiguities, we need the additional assumption that Petri nets in this context are conflict-free. The basic units of the model are intervals. Therefore, this model is classified as interval-based although transitions refer only to end-points of intervals.

Figure 4: Temporal point net (video, audio, animation and text related by `before' and `simultaneous')

Figure 5: Petri nets (begin-begin, end-end, end-begin and begin-end relations)

The relation `?' is specified if two places are not connected by any transition. As shown in Figure 5, <, =, > can be modelled by a transition in conjunction with a delay place d. The delay place represents an idle time d ⊆ ℝ₀⁺. If d is in ℝ⁺, the corresponding relations are < and >. The relation = is modelled if d = {0}. In this case, the place can be omitted as is done in Figure 5. If d is unrestricted in ℝ₀⁺, then ≤ or ≥ is expressed. In Petri nets, the PRs ≤, <, =, >, ≥ can be represented.
Since Figure 5 assures that any combination of interval end-points can be connected by a relation, the Petri net model is as powerful as the point net model. This means that 82 IRs can be expressed although [LiGh90] described only the 13 basic IRs.

4.4 Path expressions

Path expressions were introduced by [CaHa74] for procedure level synchronization and adapted by [Hoep91] for multimedia presentation systems. Path expressions include three operators to represent temporal relations: sequence, parallel-first and parallel-last. The basic units of path expressions are intervals. However, all three operators express only IRs that can be described by a single PR.

The sequence operator models a relation between the end-point of the first and the beginning of the second interval. The IRs that can be expressed by the sequence operator are {m} and {mi}. Using a delay interval [Hoep91], it is also possible to represent {<} and {>}.

For this classification, the operators parallel-first and parallel-last are identical because the attributes first and last give reference points for subsequent operators, which do not have any impact on our relation analysis. The parallel operators establish a relation between the start-points of two intervals. Three indefinite IRs are expressible by the parallel operators: {s, =, si}, {di, o, fi, m, <} and {>, mi, oi, f, d}.

To summarize, path expressions are only able to represent 7 IRs: 4 basic IRs {m}, {mi}, {<}, {>} and 3 non-basic indefinite IRs {s, =, si}, {di, o, fi, m, <}, {>, mi, oi, f, d}.

Figure 6: Path expressions (sequence, parallel-first, parallel-last)

4.5 MHEG

MHEG (Multimedia Hypermedia Expert Group) [MHEG92] [KrCa92] [Mark91] is a standardization group working on a new standard for multimedia objects. MHEG uses a time model similar to the path expression model. Additionally, MHEG allows the temporal relation between two intervals represented as multimedia objects to be left unspecified.
Therefore, MHEG has 8 possible IRs, one more than the original path expression model.

Figure 7: MHEG time model (sequence, parallel)

4.6 Resume of the Evaluation

Table 4 summarizes the multimedia time models, their basic types and the corresponding IRs that can be represented. When assessing the temporal models, it is not only important how many relations are expressible in a specific model but also which relations are representable. As we showed in Section 3, not all relations are equally important. Specifically, the 29 IRs generated by the PRs <, =, > and ?, which include the 13 basic IRs, are very important. So, Table 4 also shows how many basic and how many of the 29 relevant IRs can be expressed in each of the temporal models.

It can be observed that none of the examined temporal models exceeds the expressive power of the point-based framework, not even those models that operate on intervals. All temporal relations in the examined models can be denoted within the PR system ≤, <, =, >, ≥, ?. To provide the full expressive power of the interval-based framework for multimedia, we will develop an interval operator system in the next section.

The more relations a temporal model is able to represent, the more general it is and the fewer prerequisites have to be met when it is applied. However, in some multimedia environments, only a limited number of relations can occur. Then, only a simple temporal model is needed. So, when choosing a temporal model for multimedia, the context and the restrictions have to be respected.

                                       number of interval relations
time model         type               total   basic   representable by the PRs <, =, >, ?
time-line          point-based        13      13      13
point nets         point-based        82      13      29
petri nets         interval-based     82      13      29
path expressions   interval-based     7       4       7
MHEG               interval-based     8       4       8
Table 4: Summary: Multimedia time models

It seems that there is no universal temporal model for all multimedia applications.
There are simple models for simple environments and more universal models for complex environments. The question of the most suitable temporal model for multimedia has become especially important because a standardized temporal model is needed for exchanging and storing multimedia information. With the emerging standards HyTime and MHEG, it has been discussed which of their temporal models is more general. Concluding from our analysis, MHEG models fewer relations but has more flexibility due to the ?-relation, whereas HyTime, using the time-line model, has more possibilities. But neither MHEG nor HyTime is a superset of the other. The time models of HyTime and MHEG are not comparable.

5. An Interval-Based Time Representation

Since all real presentation actions (video, audio, text, etc.) have a non-zero, finite duration, it seems natural to model multimedia actions as intervals. Also, point-based systems have some inherent disadvantages that are due to the limitations of the point-based time model. To overcome the disadvantages of point-based systems, we will systematically develop an interval-based model in this section.

5.1 Modelling Presentation Actions

Before developing this model, we have to introduce the notion of a presentation action. Any multimedia presentation is composed of single media items. The process of presenting a single media item is called a presentation action. Any action can be characterized by two significant end-points, the beginning and the ending, and the duration d, which describes the time required for presenting a media item.

The duration d has a specific fixed value for any real presentation. However, in the process of planning a presentation, the final duration might not be known. Therefore, the duration is described as a subset of the non-negative real numbers ℝ₀⁺ [KeLo91] indicating the potential values of the duration. So, the duration can be a single real number, a range within the real numbers or totally unrestricted in ℝ₀⁺. E.g. the duration of a 90-minute video that might be interrupted by a user interaction is written as [0 min, 90 min] ⊂ ℝ₀⁺ because the real duration is 90 minutes or less depending on the user interaction. Otherwise, the duration is denoted as [90 min, 90 min] = {90 min} ⊂ ℝ₀⁺, meaning that the duration cannot be modified and has a fixed value.

A delay is a time span which passes without presenting any audio-visual output, and thus it is distinct from a presentation action with a perceivable output. On the other hand, its temporal characteristics are similar to those of presentation actions. So, a delay can be described as a subset of the non-negative real numbers ℝ₀⁺.

Note that, in this paper, it suffices to characterize a presentation action only by its temporal behavior. Other attributes, including those specifying the location, the quality or associated media of a presentation, are not subject of our investigation.

5.2 Primitive Interval-Based Models

For specifying temporal interrelations between media actions, two extreme approaches can be considered. In the first approach, disjunctions of the 13 basic IRs are used as a method to specify interval relations. E.g., a `cobegin' of presentation actions can be denoted as a disjunction of `starts', `equals' and `starts inverse' {s, =, si}. The obvious drawback of the approach is the high number of IRs required to represent a single PR: up to 11 IRs are needed to represent a single PR [Rich89] (Table 5). Of course, this is not acceptable as a user interface because users need single and intuitive relations. Consequently, we require that at least the 29 IRs relevant to multimedia, i.e. those generated by the PRs <, =, > and ?, should each be represented by a single relation operator.

The other extreme is a model based on a totally generic operator.
The operator can be derived from Figure 9 as: genericIR(d1, d2, d3, d4), where di, i ∈ {1,..,4}, is the delay for each of the possible end-point relations. The delay can be any subset of the real numbers. In this model, the delay may have negative values to indicate which of the corresponding time points is the first one. The drawback of this approach is the huge number of inconsistent specifications that can be created by this operator. Moreover, consistency checking would be as expensive as in a point-based temporal model.

Figure 8: Expressing `cobegin' by disjunction of basic IRs: s, =, si

Figure 9: Totally generic operator: genericIR(d1, d2, d3, d4)

5.3 Enhanced Interval-Based Model

Though very flexible, neither of the above models is applicable as they do not represent temporal relations intuitively. Therefore, we define an alternative model by using the IRs generated by <, =, > and ? (Table 5). Constructing an operator for each of the relations, 29 operators are needed. Such a number would confuse the user of a presentation system. Fortunately, the number of operators can be reduced by exploiting regularities between the IRs. Then, several IRs can be combined into one operator. Using the regularities, the number of operators can be reduced from the original 29 to 10. Figure 10 shows the generic pattern for each of the operators. Formal definitions can be derived from the patterns. For example, the operator x before(d1) y is defined by Ex + d1 = By, i.e. the beginning of the interval y is d1 time units after the end of the interval x.

The first regularity is that some relations are inverse to each other. E.g., `x meets y' is the inverse of `y meets x'. So, we can use the operator before(d1) to specify both relations: x before(0) y for `x meets y' and x before⁻¹(0) y for `y meets x'. In graphical notations, the inverse is expressed by an inverted edge.

The second regularity is that some relations differ only by an offset from others. E.g., `x meets y' and `x < y' are distinct only in so far as there is a non-zero time span between x and y in the case of `x < y' and a zero time span in the case of `x meets y'. IRs that differ only in offsets are combined into the same operator. Then, the IRs can be distinguished by the delay parameter d1 of the operators. In the given example, we specify x before(0) y for `x meets y' and x before(+) y for `x < y'. As we introduced in 5.1, the delay parameter may be any subset of ℝ₀⁺. We use the notation `0' if the delay is zero, `+' if the delay has a positive value, and `*' if the delay is positive or zero.

Figure 10: Basic IR patterns and their generic operators: before(d1); while(d1,d2); overlaps(d1,d2,d3), di ≠ {0}; cross(d1,d2), di ≠ {0}; cobegin(d1); coend(d1); beforeendof(d1), di ≠ {0}; delayed(d1,d2), di ≠ {0}; startin(d1,d2), di ≠ {0}; endin(d1,d2), di ≠ {0}

To avoid having several specification methods for the same IR, we require di ≠ {0} for some of the operators in Figure 10. Then, the 10 operators are a complete set to specify any of the 29 IRs generated by <, =, > and ?. An interval operator specification for each of the 29 IRs is given in Table 5.

The construction of the interval operators yields different types of operators taking 1, 2 or 3 delay parameters. The 1-parameter operators are before, cobegin, beforeendof and coend. Operators with 2 parameters are while, delayed, startin, endin and cross. Finally, overlaps is an operator that takes 3 parameters. A delay or a duration parameter is fixed if only one value is admitted, e.g. a full length video that cannot be interrupted has a fixed duration of 90 min = [90 min, 90 min].
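The definition of before(d1) and the delay notation `0', `+' and `*' can be sketched as follows. This is only an illustration under assumptions that are not part of the paper: presentation actions are given as concrete (begin, end) pairs, a delay set is modelled as a range of real numbers, and the names Delay, ZERO, PLUS, STAR and before_inv are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Delay:
    """A set of admissible delays: the range from lo to hi.

    lo itself is excluded if lo_open is True (assumed representation).
    """
    lo: float
    hi: float
    lo_open: bool = False

    def admits(self, value: float) -> bool:
        above = value > self.lo if self.lo_open else value >= self.lo
        return above and value <= self.hi


ZERO = Delay(0.0, 0.0)                      # `0': the delay is zero
PLUS = Delay(0.0, math.inf, lo_open=True)   # `+': the delay is positive
STAR = Delay(0.0, math.inf)                 # `*': the delay is positive or zero


def before(x, y, d1):
    """x before(d1) y holds if Ex + d = By for some delay d admitted by d1."""
    (_, ex), (by, _) = x, y
    return d1.admits(by - ex)


def before_inv(x, y, d1):
    """Inverse operator: x before^-1(d1) y is defined as y before(d1) x."""
    return before(y, x, d1)


x, y, z = (0, 5), (5, 9), (7, 9)
print(before(x, y, ZERO))      # x meets y
print(before(x, z, PLUS))      # x < z
print(before_inv(y, x, ZERO))  # y is met by x
```

With these definitions, before(0) corresponds to `meets', before(+) to `<', and before(*) admits both; the remaining nine operators would need analogous definitions derived from their patterns in Figure 10.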
When specifying an interval relation, one has to specify the duration of the two presentation actions and up to 3 delay parameters. Specifying 3 fixed values for the delay or the duration totally determines the final presentation sequence. Therefore, at most 3 fixed delay or duration parameters are allowed to avoid overconstrained specifications. E.g., if we specify the interval relation for two fixed length presentation actions, we can only use an interval operator taking 1 parameter. In the case of one fixed length action, we can use only 1- or 2-parameter operators. Only if both actions have a variable length are we allowed to use all operators.

To elaborate the restriction, consider the following example. A user specifies fading from one video to a subsequent video. If we have 2 videos and want to display the full natural length of both videos, which have a specific fixed duration, we might specify beforeendof(d1) where d1 describes the time span during which both videos are displayed. The presentation planner will find a consistent scenario in any case. However, the videos do not overlap if the duration of one of the videos is shorter than the time span d1.

Figure 11: Specifying fading: overlaps(d1,d2,d3) with d1 = 4 min, d2 = 1 min, d3 = 3 min; cross(d1,d2) with d1 = 8 min, d2 = 1 min; beforeendof(d1) with d1 = 1 min (durations shown: 5 min, 5 min, 4 min, [0, 60 min])

If the lengths of the 2 videos are variable, e.g. we need only parts of the videos for composing a video clip sequence, we might use the overlaps(d1, d2, d3) operator to specify fading. Then d1 represents the time during which the first video is displayed but not the second, d2 is the time during which both videos are active and d3 describes the postspan of the second video. In case one video has a fixed duration and the other is variable, the cross(d1, d2) operator is used.
For the cross operator, d1 indicates the total duration of the presentation and d2 the overlapping time.

The 10 interval operators form a complete set to represent the 29 relations generated by <, =, > and ?. This does not imply, however, that all operators are needed to define a complete temporal model for a multimedia environment. Sometimes only a selection of the operators is necessary. For example, if the duration of all media items is known in advance and fixed, the temporal model may be restricted to the operators taking at most 1 delay parameter. Note that the requirements `known in advance' and `fixed duration' are very strict and prohibit any kind of interaction or flexibility. With the emerging interactive multimedia systems, a larger subset of the interval operators is expected to be needed, because interactive media items introduce a large number of unpredictable durations.

5.4 Expressing `Mutual Exclusion'

Using disjunctions of interval operators, all 2¹³ - 1 satisfiable indefinite IRs can be generated. For example, to specify that two multimedia actions should not be presented in parallel, we specify `before(+), before⁻¹(+)', meaning either `x is before y' or `y is before x'. A disjunction is necessary to specify this case. Since disjunctions cannot be specified in point-based systems, this case cannot be implemented by these systems. Using the interval operators, the disjunction can be represented in a graph (Figure 12). In point-based models, a graphical notation of this problem is less transparent or not possible at all.

5.5 Examples

We will look at two multimedia presentation scenes to show the differences between the time-line, the point relation net and the interval operators. The first scene starts with a simultaneous presentation of a slide and some background music. Then, the user can terminate the slide interactively and continue with the next slide. The user may also stop the background music at any time. Using interval operators, this scene is specified easily (Figure 13).
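One way to write down the slide show is sketched below in Python. The tuple format, the names SLIDE_SHOW and start_times, and the use of cobegin with delay 0 for the simultaneous start of music and first slide are assumptions made for illustration. The point of the sketch is that the specification contains no absolute time values: start times are derived only once the interactive end times are known.

```python
# Hypothetical specification of the slide show as (x, operator, delay, y) tuples.
SLIDE_SHOW = [
    ("music", "cobegin", 0, "slide1"),   # assumption: music and first slide start together
    ("slide1", "before", 0, "slide2"),
    ("slide2", "before", 0, "slide3"),
    ("slide3", "before", 0, "slide4"),
]


def start_times(spec, end_times, first="music"):
    """Derive start times from the operators once the interactive end times are known."""
    starts = {first: 0.0}
    for x, op, delay, y in spec:
        if op == "cobegin":      # y begins `delay` after the beginning of x
            starts[y] = starts[x] + delay
        elif op == "before":     # y begins `delay` after the end of x
            starts[y] = end_times[x] + delay
        else:
            raise ValueError(f"operator not handled in this sketch: {op}")
    return starts


# End times as they might be reported at run time when the user dismisses each slide.
user_end_times = {"slide1": 12.5, "slide2": 20.0, "slide3": 31.0}
print(start_times(SLIDE_SHOW, user_end_times))
```

The same specification remains valid for any other sequence of user actions; only the dictionary of observed end times changes.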
In the time-line model, this scene is not representable because the end-points of the slides and the music are determined interactively. This means that the end-points are not known ahead of time. However, we need the end-point of the previous slide to specify the beginning of the next slide. We would have to pick a point on the time-line although we do not know when this point in time will be. This specification problem of the time-line model is caused by its lack of flexibility: the time-line requires a total specification of all temporal relations between media items and does not admit any indeterminism. Consequently, the time-line model is not appropriate for partial specifications or interactive media environments.

Figure 12: Expressing `not parallel' as a disjunction of before and before⁻¹

The second scene is a video clip sequence. A short video-audio clip is followed by a subsequent video-audio clip, and the transition between the two is done by fading. Moreover, no more than two videos should be active at the same time, e.g. the system does not allow fading between three videos at once. The specification of this complex scene is done quickly and fairly intuitively by interval operators (Figure 14). Using the point net representation, this scene becomes quite complex because a large number of point relations is needed. Additionally, the scene is hard to represent in a graphical notation.

Point nets use only very basic relations, so a large number of relations has to be specified in complex scenarios. Interval operators have the advantage that they provide richer relations, which allow the specification of complex presentations with a few powerful statements. Interval operators are more similar to natural languages, which also use rich temporal relations such as `while', `during' and `overlapping'.
For complex scenarios, the interval operators are more appropriate because, first, the operators represent high-level temporal relations and, secondly, the interval-based framework is more powerful.

Figure 13: Slide show scenario, shown with interval operators (music, slide1 to slide4, before(0)) and on a time line

Figure 14: Video clip scenario, shown with interval operators (video1, audio1 to video n, audio n, using while, before and overlaps) and as a point relation net

6. Discussion

In multimedia systems, synchronization is an important issue composed of the subtasks of representing temporal information and satisfying temporal constraints during execution. This paper examined the representation of time for multimedia. After introducing the two temporal frameworks, point-based and interval-based, we showed that the point relations <, =, > and ? are needed in a multimedia environment. Analogously, the important relations in interval-based frameworks are the 29 IRs generated by the four PRs <, =, > and ?. Then, we determined the expressive power of existing approaches to time modelling in multimedia. None of these models exceeds the set of relations that is expressible in a point-based framework.

Learning from the shortcomings of the existing approaches, a set of interval-based operators was developed. The interval operators represent high-level expressions of temporal relations. Since they were derived from the relevant 29 IRs, the interval operators cover the most essential set of interval relations. The proposed set can also be constructed from the PR set ≤, <, =, >, ≥, ?. Then, 82 IRs are representable by a single interval operator. Further, the interval operators are able to represent all 2¹³ - 1 indefinite satisfiable IRs as disjunctions of operators, as was shown for the mutual exclusion in Figure 12.
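The count of indefinite relations and the mutual exclusion case can be checked with a short Python sketch. It assumes the 13 basic interval relations named in Table 5 (<, m, o, s, d, f, =, fi, di, si, oi, mi, >) and concrete intervals given as (begin, end) pairs with begin < end. The names basic_relation, satisfies and NOT_PARALLEL are hypothetical.

```python
BASIC_RELATIONS = ("<", "m", "o", "s", "d", "f", "=", "fi", "di", "si", "oi", "mi", ">")


def basic_relation(x, y):
    """Classify two concrete intervals (begin, end) into one of the 13 basic relations."""
    (bx, ex), (by, ey) = x, y
    if ex < by:
        return "<"
    if ex == by:
        return "m"
    if bx > ey:
        return ">"
    if bx == ey:
        return "mi"
    if bx == by and ex == ey:
        return "="
    if bx == by:
        return "s" if ex < ey else "si"
    if ex == ey:
        return "f" if bx > by else "fi"
    if bx > by and ex < ey:
        return "d"
    if bx < by and ex > ey:
        return "di"
    return "o" if bx < by else "oi"


def satisfies(x, y, disjunction):
    """True if the relation between x and y is one member of the given disjunction."""
    return basic_relation(x, y) in disjunction


# `before(+), before⁻¹(+)': either x is before y or y is before x.
NOT_PARALLEL = {"<", ">"}

# Every non-empty subset of the 13 basic relations is one indefinite relation.
print(2 ** len(BASIC_RELATIONS) - 1)
print(satisfies((0, 5), (6, 9), NOT_PARALLEL))   # disjoint intervals
print(satisfies((0, 5), (3, 9), NOT_PARALLEL))   # overlapping intervals
```

A disjunction is modelled here simply as a set of basic relations, which is why the number of possible disjunctions is the number of non-empty subsets.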
The expressive power of the interval operators covers the entire interval relation space, which includes the expressive power of the point-based framework. Therefore, a large number of temporal relations are representable by interval operators, i.e. the interval operators provide a fine-grained model of temporal relations. Moreover, the interval operators guarantee a high level of flexibility because they were developed respecting the ?-relation, which is responsible for the degree of flexibility.

Finally, the interval-based framework reduces the number of possible inconsistencies. Looking at Figure 1, there are 3⁴ = 81 possibilities to specify a relation between two intervals using the PRs <, =, > and ?. However, it has been shown that only 29 of those represent a consistent scenario. The interval operators were developed such that the 62 inconsistent scenarios representable in point-based models cannot be specified by the interval operators. The interval-based operators therefore significantly simplify consistency checking. This is important because extensive consistency checking may substantially affect the performance of a multimedia system, and those systems are subject to real-time constraints.

Since the interval operators provide a high level of flexibility, modelling interaction can be added easily. Studies of integrating an enhanced interaction model are in progress.

Annex A

Table 5 summarizes the 29 interval relations that are generated by the point relations <, =, > and ?. Each interval relation is represented as a conjunction of point relations (first column), as a disjunction of basic interval relations (second column), or as an interval operator (last column).
point notation interval notation operator notation >, di, oi, mi, si, fi, =, f, s, o, d, m, < BxBy >, oi, mi, f, d cobegin-1(+) BxEy > before-1(+) ExBy >, di, oi, mi, si, fi, =, f, s, o, d beforeendof-1(+) ExEy >, di, oi, mi, si coend-1(+) BxBy oi, f, d startin-1(+,+) Ex>By BxBy BxBy s, o, d endin-1(+,+) ExBy d while(+,+) ExBy f while(+,0) Ex=Ey Bx=By = while(0,0) Ex=Ey BxEy BxEy Bx>By >, oi, mi delayed(+,+) Ex>Ey Bx=By si while-1(0,+) Ex>Ey BxEy BxBy oi overlaps-1(+,+,+) ExBy Bx