Task Allocation in Distributed Multimedia Systems based on the Host-Satellite Model Gabriel Dermler, Ashraf Iqbal Abstract. Multimedia applications require intermediate processing between media sources and sinks. In addition to end-user machines intermediate computers can be used for performing media processing. This possibility leads to the problem of allocating processing components on various computers. In this paper, we study this problem in the context of star-shaped application graphs which have to be allocated between given enduser machines (satellites) and a central computer (host). The problem is formulated in terms of best achievable bottleneck resource usage. Several approaches are considered including an approximate scheme and two fast-heuristics. Performance measurements show the efficiency of the considered approaches. A discussion of our approach shows important differences to solutions provided for related problems of graph partitioning and mapping. 1 Introduction Advances in the computer and communication technology have stimulated the integration of digital audio and video with computing, leading to the development of distributed multimedia systems [StNa95]. This class of systems combines the advantages of distributed computing with the ability of processing several media in an integrated fashion. This capability enhances conventional application environments and opens the door for new innovative applications ranging from on-demand teleservices and teleconferencing systems to distributed game and virtual reality applications ([DVV94], [BLMS96]), [DGRO95], [SMK96], [HaSm96]). The value of a distributed multimedia application to its end-users is measured in terms of the delivered Quality-of-Service ([VKBG95]), e.g. the video window size and/or rate. In order to guarantee a requested QoS, the availability of resources such as CPU, memory and communication bandwidth has to be ensured by resource reservation prior to using the application. Resource reservation protocols have been proposed for the coordination of resource allocation for the entire distributed application ([NaSm95], [Wo96], [DFR96], [RDF97]). These protocols attempt to find the best possible QoS under the constraint of limited resource availability. Above protocols assume that the distributed processing tasks of the application, referred to here as application components, are allocated, i.e. there is no degree of freedom in selecting - 2 - processing sites. This assumption is often unavoidable for source and sink components, since their location is determined by the availability of source data and the location of end-users generating respectively consuming source resp. sink data. For instance, in a teleconferencing application, source data will be generated by camera components capturing videos at the sites of the conference participants while (sink) display components will display a mixture of these videos at the same sites. Intermediate processing components (i.e. such processing media streams between sources and sinks) have been proposed for various purposes, e.g. for conversion between different formats (e.g. different encoding schemes , large and small video, high- and low-rate video, etc.; e.g. [YMGH96], [BeSt97], [ShSe95]), media content processing (video/audio smoothing, feature detection and analysis; e.g. [SSJH96], [MAHT97]) and for mixing several media streams into a new stream (e.g. virtual reality, picture-in-picture video, audio mixing; e.g. [HaSm96], [KKKM95], [MXBX96]). Intermediate components are not subject to the location restrictions mentioned above. In fact, several projects aim at developing efficient computer platforms for supporting intermediate processing (e.g. [Sren96], [SSJH96], [AMZ95]), in order to relieve end-user computers from heavy processing load. This paper considers the problem of finding an optimised allocation of intermediate processing components in order to maximize achievable QoS. Its basic model assumes a star shape for applications (Figure 1) and thus covers typical multimedia application scenarios such as multicasting, mixing and conferencing ([DFR96]). In this model source and sink components are attached to satellites (end-user computers) while the component at the star center is attached to a host (intermediate processing computer). Our goal in this paper is to describe algorithms deciding whether intermediate components are to be allocated on satellite or host computers, while taking into account the specific requirements arising for multimedia applications. The paper is structured as follows. In Section 2 we first present our application model in detail and explain the relationship between QoS requests, application load requirements and resource constraints. Based on this, in Section 3 we formulate the task allocation problem for host/satellite scenarios. In Section 4 we describe both approximate and heuristic solution schemes and present measurement results. In Section 5 we discuss our results and relate our approch to earliear work in the area of task allocation for multiprocessing and distibuted systems. The paper concludes with a short summary and an outlook on future work. - 3 - 2 Quality-of-Service and Resource Model Application Abstractions We briefly introduce our abstractions for modeling distributed multimedia applications. Similar concepts have been proposed by other groups including the one defining IMA MMS [IMA93]. Here, we describe the terminology used for our CINEMA platform [RDF97]. In CINEMA, distributed multimedia applications are represented by flow graphs, consisting of components interconnected via links. Components are processing elements encapsulating functions for capturing, storing, presenting and manipulating continuous data streams. They are associated with ports, which allow them to communicate media data. Flow graphs are configured by linking the ports of components, where a link is an abstraction of a unidirectional communication channel. Links can be of both unicast or multicast type. As mentioned, in this paper we focus on star-shaped flowgraphs which only require certain component types for flowgraph construction. We distinguish source components, associated with one output port only, sink components, which only have one input port, and intermediate components, which receive data from input ports, perform some operation and send the result via output ports (see Figure 1). Intermediate components can be of two types. Filters (or con- Figure 1 : A Conferencing Scenario with Mixer and Converters Port Link Component So1 So2 Mix Conv1 Conv3 Sink1 Sink2 Conv2 Conv4 Sat 1 Sat 2 Sat 1 Sat 2 Host Computers - 4 - verters) have one input port and one output port, while mixers may have several input and output ports (see [RDF97] for more complex settings). In CINEMA, media streams are typed. For example, the media type of a stream may be "uncompressed_video", or "JPEG_encoded_video". Each media type defines a set of media parameters, which specifies the characteristics of a particular stream instance. For example, "uncompressed_video" may be associated with parameters, such as frame rate and frame size. For simplicity, in this paper we assume that only one media parameter (e.g. video frame size) is given at each port in the flowgraph. Components may be restricted in their support for media (parameter) values by their functional design. For instance, a source component may only support one media value, though the corresponding media type allows for a set of values. In order to reflect such design limitations1, each component port is associated with a format constraint indicating the possible media values at the port. In this paper, initially we assume that format constraints can be selected individually for sources and sinks, but are the same for all other components.2 Figure 2 shows the format constraints for a flowgraph. It also indicates (encircled) the best possible consistent media value selection at all component ports. For setting the media values under format constraints, two flowgraph properties have to be observed. Firstly, the media value at two interconnected ports has to be the same. The reason is simple: the media data provided at an output port has to be the same like the one consumed through the connected input port. Secondly, media values can vary within a flowgraph. One reason is that processing involving several media streams (done in a mixer) requires different media values for incoming streams. For instance, the mixer in Figure 2 requires that the media value at port 1 is half the value of the value at port 2, since it is supposed to generate a picture-in-picture mixture of the incoming video streams. Another reason is that media conversions are required whenever the media values expected at sink components are different (see end-user QoS requests below) such that media conversions have to take place prior to them (see Figure 2). A similar argument applies to media values provided by source components. Quality-of-Service Model In multimedia systems, QoS implies considerations at various levels of abstraction. We refer to the architecture given in [RDF97] which distinguises an application and a resource level. 1. Note that format constraints do not depend on reource availability. 2. This is done for simplifying descriptions and will be discussed at the end of the paper. - 5 - At the application level, QoS requests (A-QoS) are issued by clients describing the QoS they expect to be delivered. A-QoS has to relate to sink components, since these are responsible for media presentation to end-users. Like in [DFR96], we require an A-QoS to be specified at the input ports of sink components, using the media parameter given there (e.g. video frame size). Also, we allow that for each sink a different request can be issued, thus allowing for ?heterogeneous? receivers ([Zhan93]). In this paper, an A-QoS request is assumed to fix the media value at a sink component's port. Of course, a request has to select a valid value, i.e. one supported by the corresponding sink Figure 2 : Format Constraints and Media Value Selection Fig. 3 : Quality-of-Service Architecture So1 So2 Mix Conv1 Conv3 Sink1 Sink2 Conv2 Conv4 400 400 400 200 100 400 200 100 400 200 100 400 200 100 200 100 100 compocompo- A - QoS resource level application level SFS link T - QoS - 6 - format constraint. As can be seen from Figure 2, fixing the media values at the sinks, automatically fixes the media values to be supported at all component ports in a flowgraph. Note that this fixing is done in such a way that media value reduction is done as closely as possible to the sources. Also note, that fixing the media values for converters may imply different conversion factors (e.g. coverter 1 may convert from 400 to 400, 200 or 100), depending on the requested A-QoS. This is a property of so-called variable converters introduced and motivated in [DFR96]. Resource Requirements Components and links require resources for media data processing resp. communication. A component implies load on several resources, e.g. CPU, memory, DSP processor. Resources are shared among parts of an application and among several applications. In order to guarantee resource availability at run-time, a component has to reserve resource loads in advance. A component's load requirement is given by a tuple (LRT), e.g. (CPU-load, Mem-load, etc.). The LRT figures depend on the media values to be supported by the component at its ports. For links, we distinguish between local and remote types. A local link employs media data passing by reference to a main memory location and hence implies no resource load. A remote link employs a transport system to communicate media data between remote locations. Its resource requirements concern both the two interconnected end-systems and the interconnecting network. We model a remote link as consiting of two objects: a send link and a receving link object. Each of them is treated like an ordinary component with respect to end-system resources, i.e. both of them imply an LRT for their corresponding end-system. An LRT for a link object contains figures for CPU and main memory as well as for bandwidth. The latter can be regarded as the required load on an I/O processor attached to the end-system. The LRT for components depend on the selected location of a component. We do not assume any homogenity between the LRTs required by a component for a host or a satellite computer. This reflects the fact that components may be supported by totally heterogeneous hardware, and that even their functional design may be different. Similarly, we do not assume homogeineity concerning resource types and capacity available on satellites and host. Each satellite is assumed to have its set of resources with its individual capacities. The load description we assume is based on a component related to a location (i.e. a (component, location) pair) indicating the LRT for the specific location. For a link object we require a - 7 - similar description. Note that each link can become a remote link, since it may connect two components allocated on two remote compters. Remote Communication Requirements A remote link does not only imply data transfer between two locations. Multimedia streams are often compressed for communication, in order to reduce the high badwidth requirements (especially of video). Compression is reversed by decompression on the receiving side prior to further processing. Therefore, the send and receive link objects making up a link contain components not just for data transport but also for compression/decompression (Figure 4). The LRT description for a link object has to take both load requirements into account (see below). 3 The Component Allocation Problem Allocation Model The problem definition is based on a star-shaped flowgraph and the mentioned host-satellite model. The flowgraph consists of component chains merging at a central component (mixer). The host-satellite model requires that sources and sinks (i.e. chain ends) are fixed on (not necessarily different) satellite computers, while the central component is fixed on the host computer. Intemediate components form component chains and each chain connects one satellite with the host. The allocation process starts when A-QoS requests have been specified by end-users and media values at all component ports have been derived (as explained above). Given these values, the LRTs for (component, location) pairs are available as well as the LRTs for all send and receive link objects. Based on this data, for each component chain cut two cut LRTs can be computed (Figure 4), one for the load implied on the corresponding satellite (S-LRT), one for the Fig. 4 : Internal Structure of a Remote Link Com pression Data Transport Data transport Decom pression Sending Side Receiving Side Link Object Link Object Remote Link - 8 - load on the host (H-LRT). For the satellite all component LRTs are included up to the cut; in addition the load of the send link object (for the considered cut) is computed and added up. For the host, the procedure includes the LRTs beyond the cut and the LRT for the receive load object of the cut.1 Allocation is constrained by resource availability on satellites, hosts and the interconnecting network. Each satellite and host resource (CPU, memory, DSP processor, I/O bandwidth, etc.) has a capacity limiting the sum of the LTRs implied by a specific allocation of components. The network is assumed to be able to indicate the available bandwidth capacity for the communication channel of any (satellite, host) pair. The allocation problem is then to find the component allocation with the least bottleneck resource usage (BRU). The botlleneck resource, which may be any of the satellite or host resources, is the one which is used by an allocation in relative terms maximally (e.g. 120% of its capacity are required by the allocation and no other resource is used that much). Minimising the BRU is linked to achieving maximal A-QoS in the following way. First, for given A-QoS requests, it finds a feasible allocation if there is one. If there is none (indicated by a BRU of more than 100%), the allocation indicates which of the satellites or the host is mostly overloaded. This gives opportunity to reduce A-QoS requests at the sinks accordingly. If the bottleneck resource is on the host, it makes sense to reduce A-QoS at some or all flowgraph sinks. If the bottleneck resource is on one of the satellites it makes sense to reduce A- QoS for that sink only. In short, the indication of a bottleneck resource indicates a feasible alo- 1. Note that the LRT of chain end-components are not included, since these are fixed in advance Fig. 5 : LRT Calculation for a Chain Cut Conv1 Conv2 Conv3 Satellite Host Chain Cut 1 Source Chain S-LRT for Cut 1 = LRT(Conv1) + LRT(SndLink) H-LRT for Cut 1 = LRT(RcvLink) + LRT(Conv2) + LRT(Conv3) End - 9 - cation if there is one and helps to develop A-QoS reduction policies appropriately (which, however, are out of the scope of this paper). Abstract Problem Formulation We are given a flowgraph consisting of a set CS of c chains. Each chain is assumed to contain (m+1) modules, thus allowing for m possible chain cuts. Each chain is characterized by a satellite identifier chainSat, indicating on which satellite it is anchored and a set of (S-LRT, H- LRT) pairs, one for each cut of the chain. For a chain cut, the S-LRT tuple indicates the LRT implied for chainSat, while H-LRT indicates the LRT implied for the host. Both are given as k-dimensional tuples, since (at most) k resources are assumed for each of the satellites and the host. A component allocation is given by a set of cuts (C) for the c chains. An allocation implies a requested load for each satellite and the host (S-Load resp. H-Load). For a satellite i, S-Loadi is given as the (k-vector) sum of all S-LRTs of cuts in C which belong to chains anchored on satellite i. For the host, H-Load is given as the sum of H-LTRs of all cuts in C. Each satellite and the host is associated with a (k-dimensional) constraint vector (SC resp. HC) indicating the respective resource capacacity constraints. When an allocation is given, for each satellite i the maximum resource usage factor SMRUi can be calculated, which is defined as the highest ratio between the requested load for a satellite resource and the corresponding re- Fig. 6 : Layered Graph Problem Representation Chain 1: SatOfChain 1 Chain c: SatOfChain c S-LRT( load 1, ..., load k ) H-LRT( load 1, ..., load k ) m (here 3) possible cuts per chain - 10 - source constraint. Simlarly, for the host a HMRU can be obtained. The bottleneck resource usage BRU for an allocation is given as the maximum of all SMRUi and HMRU. Our goal is to find an allocation with a minimal BRU. 4 Solution Approaches The problem of partitioning chain-structured programs over a single host and multiple satellite system is discussed in [BOKH88]. A number of other researchers have also attacked this problem using faster exact algorithms or efficient approximation schemes [IQBL91], [IQBL95]. All these researchers work under the constraint that there is a single chain anchored on a satellite and the H-LRT tuple associated with the host is of a single dimension. With multiple metrics the problem is essentially to find an allocation (or a path in a graph) subject to multiple constraints. This is an NP-complete problem [GARY79], and can only be solved by pseudopolynomial algorithms, approximation schemes [IQBL93], [IQBL96], or by using fast heuristics. 4.1 Approximate Solutions In this section we discuss the concept of multiple weighted graphs and multiple sum paths in them. We would present an efficient algorithm for finding the multiple sum path which is useful in solving various task allocation problems in distributed multimedia systems. We first describe a pseudo-polynomial algorithm for finding a Two Sum (TS) path in a layered graph. We shall use the TS path technique to find the Multiple Sum (MS) path again in a layered graph. A layered allocation graph is shown in Fig. 7, it is the same graph as shown in Fig. 6, we have added a start and an end node, and a number of directed edges between two adjacent chains. The start node is connected to all m possible cuts in Chain 1, all m possible cuts in the last Chain c are connected to the end node. Similarly all m possible cuts in Chain i are connected to all m possible cuts in Chain i+1. It is important to note that no cut of Chain i is connected to a cut of any chain other than i+1 (where i is greater than or equal to 1 and less than or equal to c-1). A path between the start and the end vertex passes through exactly one possible cut of each of c chains and thus corresponds to an allocation, there is in fact a one to one correspondence between an allocation and a path in the layered graph. - 11 - 4.1.1 A Multiple Sum Path in a Layered Graph A Two Sum Path In case of a TS path we consider a doubly (FW, SW) weighted graph in which there are only two weights associated with each edge. Thus instead of a single weight on each edge as in traditional graphs we have an ordered pair of weights on each edge. As usual, a path between the start and the end node would be composed of edges, the First Sum (FS) associated with the path would be the sum of the first elements of the ordered pairs of all edges in the path while the Second Sum (SS) would be the sum of the second element of the ordered pairs of all the edges in the path. We further assume that each weight associated with an ordered pair is an integer. The problem is to find a path between the start and the end node such that the FS and the SS associated with the path are constrained by an upper limit which is an integer, let us denote this by L. Note that the number of incoming paths at any cut belonging to layer i would be equal to mi-1, that means the total number of paths between the start and the end node would be proportional to O(mn). We try to restrict these paths at every layer by using the following procedure. Suppose there are two paths arriving at a cut belonging to any layer from the start node. We reject path P1 as Fig. 7 : A Cut Path in the Layered Graph S E layer 1 layer 2 layer c (FW, SW) (FW, SW) (FW, SW) (FW, SW) (FW, SW) (FW, SW) (FW, SW) (FW, SW) (FW, SW) (FW, SW) (FW, SW) - 12 - compared to a path P2 provided FS(P1) as well as SS(P1) are larger than or equal to FS(P2) and SS(P2) respectively. Note that we can not reject a path P3 as compared to path P1 if FS(P3)>FS(P1) and SS(P3)