Modelling Interaction with HyTime

Stefan Wirag, Kurt Rothermel, Thomas Wahl
University of Stuttgart, IPVR, Breitwiesenstr. 20-22, D-70565 Stuttgart
wirag@informatik.uni-stuttgart.de

Abstract

Interactive multimedia presentations are an essential issue in many advanced multimedia application tools. Before multimedia data can be presented, media items, interaction types and synchronization constraints have to be specified in a multimedia document. This paper identifies and classifies the temporal interaction types in multimedia systems and shows their impact on the specification process and the supporting system. We then describe how to specify these interaction types using the standardized multimedia document language HyTime. The HyTime mechanisms are demonstrated by examples, followed by a discussion of the advantages and limits of each technique.

1 Introduction

With the emerging multimedia technologies, more and more application tools are being developed to process and present multimedia data. Generally, multimedia data are stored as multimedia or hypermedia documents using a proprietary format [Appl91], [BuZe93], [BHL91], [LiGh90]. As a result, tools of different vendors or developer groups cannot exchange their documents without a format conversion. Converting document formats is expensive, if possible at all. A general document standard for multimedia would alleviate this problem. HyTime [HyTi92] defines a standardized language for specifying the essentials of multimedia documents, such as addressing document parts or defining temporal constraints. When developing our multimedia presentation system TIEMPO [WaRo94], we considered HyTime as a document model. The architecture of HyTime is described by [Gold91] and [NKN91].
Further, [Erfl94] examined how to specify synchronization constraints in HyTime, but excluded the question of how to specify interaction. (TIEMPO stands for Temporal Integrated Model to Present Multimedia-Objects; it is supported by a grant of the Deutsche Forschungsgemeinschaft, DFG.) As multimedia application tools increasingly support interaction, this question becomes a crucial issue in multimedia documents. Specifically, those interaction types that affect the synchronization constraints are critical, because the predefined presentation schedules specified in HyTime might have to be modified when an interaction occurs. Although HyTime does not provide any direct mechanism to express interaction, it is an encompassing standard and offers generic mechanisms that can be used to model interaction. Therefore, this paper examines these mechanisms of HyTime, shows how interaction might be expressed, and describes specification techniques that go beyond the approaches of [KRRK93] and [BRR94].

In section 2, we identify the interaction types affecting HyTime schedules and describe the issues they raise for specification and system support. Section 3 introduces the essentials of HyTime. HyTime mechanisms to support interaction are presented and discussed in section 4, followed by section 5, which describes how to model the interaction types in HyTime. Finally, we summarize the results.

2 Interaction

Various methods exist to interact with a system during a multimedia presentation. A user might resize a presentation window or control the volume of an audio channel. Some interaction types affect the temporal layout of a multimedia presentation: pushing the pause button, for example, delays future events, while the fast-forward button makes future events occur earlier than originally scheduled. Interaction types that affect the temporal layout of a presentation are critical, because most multimedia presentations include a schedule of all events within the presentation.
Thus, interaction might result in several changes to the presentation schedule: events have to be rescheduled, new events are added, or other events are no longer valid and have to be removed from the schedule. In this section, we identify the interaction types with a temporal impact. Then, the issues of specifying and executing multimedia presentations are discussed with respect to these interaction types.

2.1 Interaction types

Table 1 summarizes the interaction types with a temporal impact on multimedia presentations. The interaction types are described by their name, symbol and their impact on the presentation speed. The symbol represents the default presentation trace by a grey arrow and the modification of the trace by a black arrow. Multimedia applications can be classified according to the interaction types that are offered when presenting documents. The fourth column of table 1 indicates which interaction type is included in which class; the interaction types of a lower class may also be included in the higher classes. A first class of multimedia presentation systems does not provide any interaction during the presentation. But a start mechanism is still needed to produce any perceivable output; therefore, start is an essential interaction type of any presentation system. In a second class, the presentation speed can be varied by interaction types such as faster, slower, pause, continue or stop. However, the direction of the traversal through the multimedia document remains forward during the entire presentation. For this reason, this class of systems is called linear directed. More advanced presentation systems also allow the user to reverse the presentation direction or to jump to another part within the document, so the sequence of the presented events is no longer predefined. But the default presentation of a document is still linear, i.e. all events are totally ordered. Therefore, this class is called linear undirected.
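The presentation-speed column of Table 1 can be summarized as a small dispatch function. This is an illustrative sketch only, not part of HyTime: the increment applied by faster and slower is left abstract in the table, so the step size DELTA below is an assumption, and the undefined speed after stop is modelled here as None.

```python
import math

DELTA = 0.25  # assumed speed increment; Table 1 leaves the step abstract


def new_speed(interaction, v_before, v_default=1.0, v_before_pause=None):
    """Return the new presentation speed after an interaction (Table 1).

    `continue` needs the speed remembered at pause time, passed in as
    v_before_pause; `stop` leaves the speed undefined (None here)."""
    sign = math.copysign(1.0, v_before) if v_before != 0 else 1.0
    if interaction in ("start", "selection"):
        return v_default
    if interaction == "stop":
        return None
    if interaction == "pause":
        return 0.0
    if interaction == "continue":
        return v_before_pause
    if interaction == "faster":
        return sign * (abs(v_before) + DELTA)
    if interaction == "slower":
        return sign * (abs(v_before) - DELTA)
    if interaction == "reverse":
        return -v_before
    if interaction == "jump":
        return v_before  # a jump changes the position, not the speed
    raise ValueError(interaction)
```

Note how faster and slower change the magnitude of the speed but keep its sign, so they work the same while playing forwards or backwards.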
The most comprehensive class of systems additionally provides the selection interaction, by which the next media item can be chosen from a list of items. The path through a multimedia document with selection interaction is no longer predefined; a variety of paths are possible. So, selection is a non-linear interaction type.

   interaction type   presentation speed                          class
   start              v_new = v_default                           basic
   stop               not defined                                 linear directed
   pause              v_new = 0                                   linear directed
   continue           v_new = v_before_pause                      linear directed
   faster             v_new = sign(v_before) * (|v_before| + d)   linear directed
   slower             v_new = sign(v_before) * (|v_before| - d)   linear directed
   reverse            v_new = -v_before                           linear undirected
   jump               v_new = v_before                            linear undirected
   selection          v_new = v_default                           non-linear

Table 1: Classification of interaction types (d denotes a speed increment; the symbol column of the original table depicts the default presentation trace as a grey arrow and its modification as a black arrow)

2.2 Specification of interactive documents and system support

The basic class of presentations, without any interaction except the start command, can be specified using real-time synchronization constraints, since all events except the start are known in advance and predictable. Also, once the presentation is started, the supporting system can meet all synchronization constraints by prefetching all necessary presentation data. The non-basic classes of presentations are not specifiable by real-time constraints, because the duration of a presentation segment might vary due to interactions such as pause, slower, faster, etc. Therefore, the concept of virtual [HyTi92] or logical time [Lamp78], [AnHo91], [RoHe94] was introduced. The presentation data is considered a totally ordered sequence of information units, and the logical time is defined by this sequence. A logical time unit can be given in frames, samples, bits, bytes or simply an abstract unit. Synchronization constraints are then specified in terms of logical time units. However, before such a document can be rendered, its logical time has to be mapped to real time.
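This mapping can be illustrated by a minimal sketch (a hypothetical helper, not part of HyTime): the presentation trace is modelled as a list of (real-time duration, speed) segments produced by interactions, and the logical position reached after t real-time units is the speed-weighted sum of the elapsed segments.

```python
def logical_position(segments, t):
    """Map a real-time instant t to a logical-time position.

    segments: list of (real_duration, speed) pairs describing the
    presentation trace, e.g. speed 0 while paused, a negative speed
    while reversing."""
    pos = 0.0
    for duration, speed in segments:
        step = min(t, duration)   # time actually spent in this segment
        pos += step * speed       # logical units covered at this speed
        t -= step
        if t <= 0:
            break
    return pos
```

For example, a trace that plays for 5 units, pauses for 3 and then resumes is still at logical position 5 at real time 8, while a reverse at real time 5 moves the position backwards again.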
Figure 1 shows how the mapping is done in the basic, the linear directed and the linear undirected class. Each point in real time is assigned the multimedia data that is rendered at that time.

[Figure 1: Mapping logical time to real time. Three traces from the start of the document are shown: basic interaction, linear directed interaction (pause, continue) and linear undirected interaction (reverse, jump).]

Executing a presentation of the linear directed class can be implemented fairly easily, as the supporting system knows at any time what presentation data might be rendered next. This holds because the rendering direction of a linear directed document is always forward. Implementing the linear undirected class is more sophisticated, because several presentation data units might be rendered next, depending on the interaction events that occur. In the case of the reverse interaction, the system might present either the previous data unit or the next data unit, depending on whether the reverse button was pressed or not. In the case of a jump, the number of possible data units to be rendered next is theoretically infinite. The most complex class, a non-linear presentation, cannot be specified on a single logical time line, as it is not known which selection will be chosen by a user. Figure 2, for example, shows a scenario in which a first talk is followed by a selected video and then by a second talk. Depending on the duration of the selected video, the sequence numbers of the logical units of the second talk differ. So there is no unique document time, and synchronization constraints cannot be aligned on a single logical time axis. When implementing selection, prefetching of presentation data by the supporting system is not trivial, because several data units are candidates to be presented next, depending on the number of options offered by the selection; this number can vary from a few to a theoretically infinite number of choices.
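The non-linear case can be made concrete with the unit numbering of Figure 2, under our reading of the figure (an assumption): talk1 occupies logical units 1 to 7, each selectable video starts at unit 8, and talk2 begins at the unit directly after the selected video ends.

```python
def talk2_start(talk1_end, video_units):
    """Logical start unit of talk2 when the selected video begins
    right after talk1 and occupies `video_units` logical units."""
    return talk1_end + video_units + 1


# One start unit per possible selection: there is no unique document
# time, so talk2 cannot be placed on a single logical time axis.
starts = {name: talk2_start(7, units)
          for name, units in {"video1": 5, "video2": 8, "video3": 10}.items()}
```

The three different start units computed here are exactly why synchronization constraints for talk2 cannot be expressed as one fixed position.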
3 HyTime

Any platform for hypermedia applications might have its own proprietary method of representing documents. Thus, it is difficult or even impossible to interchange documents created by different applications.

[Figure 2: Non-linear interaction. A first talk (talk1) is followed by one of three selectable videos (video1, video2, video3) and then by a second talk (talk2); the logical time units assigned to talk2 differ depending on the selected video.]

Therefore, HyTime (Hypermedia/Time-based Structuring Language) was developed as an international standard [HyTi92] for the structured representation of hypermedia documents for integrated open hypermedia applications. It is an SGML (Standard Generalized Markup Language) application and is interchanged using ASN.1 for OSI compatibility. A hypermedia document is a set of documents and other information objects connected by links. When the definition of a document type is created, content and rendering instructions are distinguished. HyTime standardizes those facilities dealing with the addressing of portions of hypermedia documents and their component multimedia information objects, including the linking, alignment and synchronization of document items. HyTime does not standardize the data content notation, the encoding of the information objects or the applications processing them. The HyTime standard does not impose any particular implementation architecture, and it is possible to integrate HyTime processing in application programs if desired. The HyTime architecture is modular, and only the required facilities need to be implemented. The HyTime standard consists of the following modules: the base module specifies the basic facilities, the location address module specifies the addressing facilities, the hyperlink module specifies the hyperlink facilities, and the finite coordinate space module deals with the position of objects in space and time and their modification. Figure 3 shows the relations of the modules.
A brief description of the HyTime features that are useful for the specification of interaction is given in the following sections.

[Figure 3: HyTime modules [NKN91]: base module, location address module, hyperlink module, finite coordinate space module, event projection module, object modification module.]

3.1 Document structure

The architecture of an SGML document is expressed in its Document Type Definition (DTD). The syntax is expressed as a set of elements, each with its own generic identifier, a set of attributes and a content model, which determines the data types to be used in the element. The HyTime standard defines element types called architectural forms (AFs), identifiable by the attribute HyTime. By including the attribute HyTime and conforming to the model of a particular HyTime architectural form, document authors can create derived element types with specific semantics. Additionally, attributes can be inserted containing information according to these semantics. Using AFs and derived element types, document authors can create DTDs which incorporate only those semantics of HyTime that are needed.

3.2 Control flow

HyTime documents are interpreted by a HyTime-engine. When a HyTime document is ready to be processed, the application calls the HyTime-engine, which in turn calls the SGML parser. The parser notifies the HyTime-engine about anything important. The HyTime-engine performs address resolution, linking, alignment and synchronization and passes the entire output of the document back to the application controlling the presentation. The flow through a hyperdocument is controlled by an application program and can be modified by scripts that are embedded within the hyperdocument. The application calls programs which interpret the scripts.

3.3 Temporal relations

In HyTime, an object is a piece of information of any type. An object may consist of data such as video, audio, graphical objects or text.
To position objects in space and time, HyTime uses a finite coordinate space (FCS). An FCS is described by axes; it establishes a specific measurement domain with a reference unit defined for each axis. Objects in a HyTime finite coordinate space occur as the content of events. An event is a conceptual frame for an object. Each event has a dimension specification that represents its position and extent on the coordinate axes of the FCS. Elements of the type dimref allow events to be positioned relative to other events. An application might associate synchronization constraints with these relations. If the determination of the dimension specification of events requires complex computations, marker functions can be applied. Such a function computes the position or extent of an event on one axis. Events are organized in event schedules. An FCS may contain any number of event schedules, and each event schedule may contain any number of events. The following example contains the temporal specification of a scenario where an event (event 2) starts 10 seconds after another event (event 1) has started. Further, event 2 has the same length as event 1. We use a HyFunk element to specify the relative positioning of event 2. HyFunk is a HyTime-defined marker function type that can be used to express simple relations between event extents. The first three lines of the example define this function. The elements %1 and %2 in the definition represent parameters that are passed to the function when it is called. Then, the extent lists which define the position and extent of the events in the FCS are specified. The position and extent of event 1 are specified directly. In the extent list of event 2, the defined marker function is applied to position the event relative to event 1; the reference to the extent list of event 1 and the delay are the parameters of the function call.
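In conventional notation, the relation that the marker function expresses is simply start2 = start1 + delay and duration2 = duration1. A one-function sketch of this relative positioning (illustrative only, outside HyTime):

```python
def position_after(ref_extent, delay):
    """Position a dependent event relative to a reference event.

    ref_extent: (start, duration) of the reference event. The dependent
    event's start is shifted by `delay`; its duration is copied over
    unchanged (the role played by the dimref in the example)."""
    start, duration = ref_extent
    return (start + delay, duration)
```

With the extent values of the example (event 1 at position 30 with extent 210, delay 10 seconds), event 2 is placed at position 40 with the same extent.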
The duration of event 2 is specified by a dimref element which extracts the duration of event 1. Finally, the finite coordinate space with the event schedule (evsched) containing the two events is specified. (Further examples are found in [Erfl94].) The core of the HyFunk definition is the expression

   @sum(@first(%1) %2)

which adds the delay (%2) to the first value (@first) of the referenced extent list (%1); the extent list of event 1 contains the values 30 and 210.

The event projection module of HyTime provides the facility to project events from one FCS to another FCS. Thus, event projection might be used to extract a specific part of an object or to modify the presentation speed of an object. The projection is performed by a projector, which can be defined in a notation unknown to the HyTime-engine; in this case, the HyTime-engine asks the application to determine the location and extent of the projected events. Simple projection types, such as projection by a constant ratio, can be expressed with marker functions. Projectors are organized in schedules called batons. A batrule element must be used to express the relation between a baton, unprojected event schedules and projected event schedules. All event parts in the related unprojected schedules that are within the specified projector scope of the projector are published to the projected event schedules. Additionally, projected event schedules can contain events which are not derived from projections.

3.4 Links

The hyperlink module of HyTime provides various link types. Links can be used to describe relations between any kind of objects. HyTime knows two major link types: contextual links (clinks) describe a relation between two objects; one link-end is the content of the link element and the other link-end is an arbitrary object. Independent links (ilinks) represent a more general form of link; they can have any number of link-ends, and roles can be defined that assign semantics to the anchors.
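The two link types can be sketched with a small, purely illustrative data model (class and role names hypothetical, not HyTime syntax): an ilink carries any number of role-tagged anchors, and a clink is the two-ended special case.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Ilink:
    """Independent link: any number of link-ends, each tagged with a
    role that assigns semantics to the anchor."""
    ends: List[Tuple[str, str]] = field(default_factory=list)  # (role, anchor)

    def anchors(self, role: str) -> List[str]:
        """All anchors that play the given role in this link."""
        return [anchor for r, anchor in self.ends if r == role]


def clink(source: str, target: str) -> Ilink:
    """Contextual link modelled as the two-ended special case."""
    return Ilink([("source", source), ("target", target)])
```

For instance, an ilink might relate one button anchor to several schedule anchors, which is the shape exploited by the schedule-link approach of section 4.1.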
4 HyTime techniques to specify interaction

Documents with interaction abilities contain temporal relations which have to be resolved during rendition. HyTime gives little support for specifying relations which cannot be bound to well-known time points. Thus, if interaction is to be integrated, HyTime extensions are needed. In this section, we describe some approaches to integrating interaction in HyTime.

4.1 Schedule-link approach

In HyTime, events, which describe the occurrence of objects in an abstract manner, are organized in schedules. Such a schedule determines the presentation for a temporal interval. This interval is normally determined by the start instant of the first event and the end instant of the last event in the schedule. In documents with interaction facilities, multiple alternative renditions are possible. In HyTime, such alternatives can be specified by providing a schedule for each particular rendition. Each time an interaction occurs, the rendition of the current schedule is aborted, and the rendition continues in the schedule that represents the interaction effect. Enabling interaction in HyTime and a mechanism to switch schedules on interaction are prerequisites of this approach. HyTime does not deal with user input like mouse clicks or key presses. Nevertheless, possibilities to interact must be offered to the user, such as buttons, keys or slide-bars. Input facilities which have to be displayed on the screen might be specified as HyTime events; we call such events interactive events. Element types for interactive events with defined semantics can be derived from the AF event. An element type for a simple label-button event carries a label-attribute that determines the text appearing within the button, a linkends-attribute that expresses relations to following event schedules, and exspec-attributes that describe the position and extent of the button event.
The HyTime-attribute specifies that this element type is derived from the AF event, and the id-attribute identifies an element of this type. The element type has no content. Hyperlinks are used to define an action which has to be performed when the user operates an interactive event. These hyperlinks relate an interactive element with a schedule that contains the interaction effect. For this purpose, a link element type with special semantics might be derived from the HyTime clink-AF. In such a link, the attribute trigger defines the condition which must become true on the interactive event so that the application traverses the link and continues processing in the referenced event schedule. By defining links with different trigger conditions relating different schedules, multiple user interactions can be defined for the same interactive event. In [KRRK93], a DTD for a slide show is described that allows the user to move to the next slide interactively. In this DTD, links have similar semantics to the link defined above. Figure 4 shows the link connections of the example. Each slide schedule contains a button event and a slide event. The button event is linked to the subsequent schedule; the link is traversed to find the following slide schedule when the button is pressed. Generally, all interaction types introduced in section 2 can be specified by links. For complex interaction forms such as faster and reverse, additional information is needed to position the events. This knowledge must be present in the application if the rendition module is not used. Further, the definition of alternative renditions by different schedules is not applicable for infinite sets of interaction effects. For example, if the presentation speed of a media item can be manipulated by a slider, any speed within a certain range is acceptable; specifying such behavior requires additional mechanisms.
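The core of the schedule-link approach, link traversal on a trigger condition, can be sketched as follows (a hypothetical data model, not HyTime syntax: each interactive event carries a list of (trigger, target schedule) pairs derived from its links):

```python
def next_schedule(links, event_input, current):
    """Traverse the first link whose trigger condition matches the
    input raised by the interactive event; if no trigger matches,
    rendition continues in the current schedule."""
    for trigger, target in links:
        if trigger == event_input:
            return target
    return current
```

In the slide-show setting of [KRRK93], for example, `next_schedule([("pressed", "slide2_sched")], "pressed", "slide1_sched")` would select the subsequent slide schedule when the button is pressed.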
[Figure 4: Generic interaction example. A sequence of slide schedules, each containing a button event and a slide event; butnlink elements connect each button to the subsequent slide schedule.]

4.2 Integration of scripting languages

Interaction requires additional processing descriptions within HyTime. Therefore, scripting languages such as HyperTalk might be integrated to describe the actions to be executed on interaction. It is possible to define element types for scripts which can be added to any DTD by creating new document elements in the appropriate places [BRR94]. Such script elements are treated as media objects which cannot be interpreted by the HyTime-engine and are therefore passed to the processing application for interpretation. In the example of [BRR94], an element type page may contain script objects. Its attribute HyScript expresses that the AF is not a HyTime AF, and the attribute script_type identifies the scripting language, so multiple scripting languages can be integrated. The content of the AF is the script itself and is not parsed by the HyTime-engine. Generally, all interaction effects can be defined using scripts. For example, the application can maintain and control temporal relations apart from the HyTime-engine by including the temporal information within scripts. However, this may lead to consistency problems, because HyTime also provides its own mechanism for specifying synchronization constraints.

4.3 Projection approach

In the existing examples, only simple interaction types are considered, e.g. start and stop. Our goal is to extend the existing approaches in order to specify the interaction forms introduced in section 2. We developed a method to specify interaction using as many facilities of HyTime as possible.
Analyzing the effect of the interaction types introduced in section 2 with respect to their specification, the following can be observed: the basic interaction start, the linear directed interaction stop and the non-linear interaction selection change the set of objects currently presented. For new objects that appear as the result of an interaction, or for the remainder of objects that is rendered differently as a result of the interaction, events have to be specified that are positioned at the current rendition instant of the time axis. For interactions affecting events that represent continuous media items, the context of the media item has to be preserved. These effects require that interactive documents contain temporal relations which are resolved during rendition. Therefore, context information is necessary to relate the events to the remaining or new objects. Context information is the current rendition point on the time axis of an FCS. A derived element type of the AF evsched can be created that causes the application to collect the context information; the application stores this information internally. The projection facility of HyTime is used to specify the necessary mapping of logical time to real time when positioning events after interactions. The effect of projectors can be described using marker functions. Therefore, a derived marker function type has to be defined which contains scripts that define how to apply the collected context information. To compute the value of a marker represented by such a marker function, the HyTime-engine calls the application; thus, the context information can be used during the execution of the marker function. References identifying context information are passed as input arguments to marker functions. To apply this method, late computation of event extents and schedules must be supported, because the needed context information only becomes available during rendition.
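The context collection can be sketched as follows (class and method names hypothetical): the derived evsched element type causes the application to track the current rendition point per axis, and context-sensitive marker functions later consult this record.

```python
class RenditionContext:
    """Context information collected while rendering an event schedule:
    the current rendition point on each axis of the FCS."""

    def __init__(self):
        self._position = {}

    def advance(self, axis, point):
        # called by the application as rendition proceeds
        self._position[axis] = point

    def current(self, axis):
        # queried when the HyTime-engine asks the application to
        # evaluate a context-sensitive marker function
        return self._position[axis]
```

Because `current` is only meaningful once rendition has reached the interaction point, event extents that depend on it must be computed late, as noted above.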
To demonstrate the method, we present an example. Figure 5 shows a scenario with a continuous object where it is possible to skip a part of the object by pressing a button. Because the interaction time point is not known before run-time, the position of the event presenting the remainder of the object is context-sensitive. In the document instance, a derived context-sensitive marker function type (ConFunk) contains a script whose core is

   -- return (current position on the axis %1) + %2 --

i.e. the position of the projected remainder event is computed as the current rendition point on the referenced axis (%1) plus an offset (%2). The rest of the example (the values 0, -100 and 300, a HyOp element and an extent list with the id "upexall") belongs to the extent and projection specifications of the unprojected and projected events.

[Figure 5: Projection with jump interaction. A continuous event on the logical time axis (extent 0 to 1300) is projected onto the real time axis; after the jump, the skipped part is not presented, and the remainder is projected as a separate event.]
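The ConFunk script reduces to a single expression; a sketch (assuming the context object of section 4.3 is a plain mapping from axis id to current rendition point, a representation of our own choosing):

```python
def confunk(context, axis, offset):
    """Context-sensitive marker function: the remainder event is placed
    at the current rendition point on `axis` plus `offset`, mirroring
    the script 'return (current position on the axis %1) + %2'."""
    return context[axis] + offset
```

The reference identifying the context information (%1) and the offset (%2) correspond to the two input arguments passed to the marker function by the HyTime-engine.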