Efficiency of Server Task Queueing for Dynamic Load Balancing

Wolfgang Becker, Rainer Pollak

Faculty report No. 1994 / 9
CR subject classification C.2.4, C.4, D.4.8

Published by the Institute of Parallel and Distributed High-Performance Systems (IPVR)
Department for Applications of Parallel and Distributed Systems
Faculty for Computer Science, University of Stuttgart
Breitwiesenstr. 20-22, D-70565 Stuttgart, Germany
Wolfgang.Becker@informatik.uni-stuttgart.de
Rainer.Pollak@informatik.uni-stuttgart.de

Abstract

In this paper we investigate the optimal points in time for task assignment in dynamic load balancing schemes. Normally the final assignment of tasks to server queues is made at the latest possible time. The main reason for late assignment is that a dynamic load balancer can then use the most recent information about system and application state for its decision. In general, however, assignment can be done at task arrival time, at the moment when a processor or server becomes idle, or when significant load changes occur in the system. We elaborate the preconditions and circumstances under which it is advantageous to assign tasks earlier than necessary, i.e. to queue them at the servers. We verify the results in an experimental load balancing environment.

Contents
1 Introduction
2 General Considerations
3 The Analytic Model
4 The Experimental Environment
5 Measurement Results
6 Conclusions
7 References

1 Introduction

As distributed computer systems become increasingly popular, resource sharing among a number of computers or processors within a MIMD computer connected by communication networks becomes practicable and desirable. Sharing of computing resources, e.g. processor time, data and hardware devices, is usually assisted by load balancing mechanisms. Load balancing is the process of distributing and redistributing the workload submitted to a network of computers in order to avoid situations where some hosts or processors are overloaded while others are underloaded. Work and data are distributed in a way that exploits all system resources and maximizes overall throughput. Load balancing strategies may be either static or dynamic. While static strategies yield schedules based on averaged system characteristics and profiles of the applications running in the system, dynamic approaches also use information about the current system state and application behavior at run time. Dynamic policies may be further subdivided into centralized and distributed structures and, according to the utilized information, into reactive and predictive load balancing strategies. For a detailed classification of load balancing approaches we refer to [4] or [8], where a hierarchical taxonomy is presented. Distributed schemes usually hold tasks only in local server queues and migrate them at arrival time [9], in situations of load imbalance [1], [6], [11], or at task completion time [13]. Our investigations of the final assignment time therefore apply especially to centralized, predictive load balancing schemes. Several recent publications are concerned with centralized load balancing approaches in similar environments [3], [5], [7], [10], [12], [15], some of them employing local server queues per processor (e.g. [5]). However, their main objective is the comparison of different load distribution strategies in terms of the considered load factors, not structural aspects like server task queueing and optimal time points for task assignment.
Furthermore, there is a lack of proposals that are validated by actual application measurements.

2 General Considerations

For a load balancer there are several points in time at which tasks can be assigned to processors or servers. Static schemes commit the placement at compilation or batch job start-up time; dynamic non-preemptive schemes usually place tasks at arrival time. Preemptive dynamic load balancing methods are additionally able to correct such assignment decisions by migrating tasks that are currently executing. Although never explicitly investigated, dynamic schemes are in principle able to defer the assignment of tasks until the best server can be ascertained and / or until one of the suitable servers becomes ready to receive more tasks. One possibility in client server structures is to delay assignment until any server, or the best suited one, announces that it has become idle. Another possible way, and this is what we examine in this paper, is to assign tasks even before they can be executed by the receiving server, since the server may still have some previously received tasks to process. For such a strategy to be feasible, two requirements must be fulfilled. First, there must be a central task queue for each server class. A server class is a group of servers having identical functionality. The central queues are controlled by the load balancing component, which may arbitrarily select tasks from them for assignment. At each server site there has to be a local task queue to hold the assigned tasks. Servers process the tasks in their queue in first come first served order or according to their priorities. The second precondition requires an accurate definition of load balancing events and the actions triggered by them. In section 4 we will mention the events and actions that our experimental environment offers, depending on the load balancing strategy actually employed.
- In most publications there is an event of task arrival, followed by the action of immediate assignment. In decentralized schemes this means pushing the task to some less loaded server.
- In preemptive schemes, task migration is also triggered by the event of significant load imbalance. The lightly loaded processor then receives a task from the most heavily loaded neighbor. If a central task queue exists, load changes may also trigger task assignment.
- In receiver initiated schemes an event occurs each time a server completes its task. This event is followed by receiving a task from an overloaded neighbor. Centralized approaches assign the oldest or the best fitting task of the central queue to the underloaded processor or server.
Here we briefly look at design alternatives that enable more flexible task assignment time points. One possible extension are time-outs: at task arrival time the load balancer rates the servers and decides whether to assign the task immediately or to establish a time-out. The time-out specifies when the assignment must be committed at the latest, if no other events yield an assignment decision for this task. Nevertheless we will not use time-outs in this paper, because this mechanism is rather complex and not yet completely understood. Instead, we employ a flexible event - action coupling: no matter which task, processor or server caused the event, the action may assign any set of tasks to arbitrary servers. Furthermore, we extend load balancing so that it may keep track of the tasks currently queued at the servers, especially the size of the local task queues.
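As an illustration of these two preconditions, the following Python sketch shows a central queue per server class, bounded local queues and a simple event - action coupling in which both task arrival and task completion may trigger assignment. The class and method names, the shortest-queue rule and the queue limit are our own illustrative assumptions, not the interface of any particular system:

```python
from collections import deque

class ServerClassQueues:
    """One central task queue per server class, plus a local queue per server.

    The central queue is owned by the load balancing component; the local
    queues hold tasks that have already been committed to a server
    ("early assignment").
    """

    def __init__(self, server_names, max_local_queue=4):
        self.central = deque()                              # not yet assigned
        self.local = {s: deque() for s in server_names}     # assigned, not yet executed
        self.max_local_queue = max_local_queue              # allowed server queue usage

    def on_task_arrival(self, task):
        """Event: a new task arrives at the central queue."""
        self.central.append(task)
        self.assign()

    def on_task_completion(self, server):
        """Event: a server finished a task; its local queue shrank by one."""
        if self.local[server]:
            self.local[server].popleft()
        self.assign()

    def assign(self):
        """Action: move tasks from the central queue into the shortest local
        queues as long as the configured limit permits it; otherwise defer
        the final assignment to a later event."""
        while self.central:
            server = min(self.local, key=lambda s: len(self.local[s]))
            if len(self.local[server]) >= self.max_local_queue:
                break
            self.local[server].append(self.central.popleft())

q = ServerClassQueues(["s0", "s1"], max_local_queue=2)
for t in range(5):
    q.on_task_arrival(t)    # tasks 0..3 are queued at s0/s1, task 4 stays central
```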
Tracking the local queues allows the load balancer to estimate the load at a server and the time until the server returns to the idle state. If load balancing considers data access affinity (see section 5), it may also estimate which data records will be present at the server when it has worked off its queue. Once load balancing is capable of assigning tasks more or less at any moment, it is necessary to investigate the advantages and drawbacks expected from early assignment. The first improvement is the saving of server idle time caused by load balancing decision and message passing delays. This is especially true in central load balancing schemes and shared nothing computer architectures, which we are concerned with. Each time a server has completed a task execution and has no further tasks queued locally, it must call the load balancing component and wait until a task arrives. Note that employing many servers per processor at a time helps keep the CPU busy, but often results in synchronization and task switching overhead. It also complicates load balancing methods significantly. Our considerations in principle apply to both execution models. The second advantage expected from early assignment is the reduction of load balancing overhead. Load balancing should try to get rid of tasks residing in central queues if their currently favored placement will most probably not change anymore. The principle of assigning tasks as late as possible is based on the observation that the load situation may change noticeably: another placement may become more attractive between the time a task arrives and the time the server starts executing it. To exploit this, however, all task ratings in the central queue must be updated each time a relevant load balancing event occurs. In situations of high parallelism, i.e. when a large set of executable tasks arrives that cannot be served at once, this will certainly cause much overhead. Therefore, load balancing must be able to estimate the possible load change rates in terms of foreign processor load, task completions and data movements. The stability and predictability of the system and application behavior then make it possible to assign tasks to servers in advance up to a certain degree, i.e. to put them into the local task queues. A third opportunity arising from proper usage of local server task queues is less obvious, but nevertheless important. Most load balancing concepts cannot exploit knowledge about groups of tasks, whether for reasons of simplicity or to avoid overhead. To detect the most promising degree of parallelism, however, one needs to know in advance how many tasks with a similar profile or data access pattern will follow. Load balancing then decides whether to set up further servers and whether it pays off to distribute data and copies among the system. The option of earlier task assignment using local queues provides a rather simple way of controlling the degree of parallelism. The algorithm for early assignment without explicit announcement of task groups to load balancing works as follows: the first few tasks are assigned to the currently available servers, which have appropriate local access to the participating data. After a short time, however, as their local task queues grow, these servers become less attractive for further task assignment. Other servers will be employed in spite of start-up overhead, data movement costs or inferior available processing power. Without server queues, load balancing would always decide to keep the next task until the best server becomes free again, because this is obviously faster than giving it to any other server.
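A minimal sketch of such a queue-length-penalized server rating may clarify the idea. The names and cost figures below are invented for illustration and do not correspond to the strategies evaluated later in the paper:

```python
class Server:
    """Illustrative server description; `data` names the partitions it holds locally."""
    def __init__(self, name, data):
        self.name = name
        self.data = set(data)

def rate(server, partition, queue_length, task_time=1.0, data_move_cost=5.0):
    """Estimated completion time of a task touching `partition` on `server`.

    Tasks already waiting in the local queue add to the estimate, so servers
    holding the data are preferred at first; as their queues grow, a server
    that must first fetch the data (or be started up) becomes the better
    choice and the degree of parallelism widens automatically.
    """
    transfer = 0.0 if partition in server.data else data_move_cost
    return queue_length * task_time + transfer + task_time

# Example: with empty queues, s0 wins; after about five queued tasks, s1 catches up.
s0, s1 = Server("s0", {"part_a"}), Server("s1", set())
print(rate(s0, "part_a", 0), rate(s1, "part_a", 0))   # 1.0 vs 6.0
print(rate(s0, "part_a", 6), rate(s1, "part_a", 0))   # 7.0 vs 6.0
```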
The obvious drawback of local task queues is the potential for inappropriate assignments due to rapidly changing system load or data placement. In addition, there is overhead arising from the explicit consideration of the tasks residing in the local server task queues. In the following section we introduce a simple analytic model to illustrate the relevant factors and their correlations concerning task assignment time points.

3 The Analytic Model

The underlying system model, as depicted in the left part of Figure 2, fulfils the requirements explained above, providing central and local task queues. We will not use Markov queueing models [5] to examine the effects of server task queueing but restrict ourselves to straightforward calculations. Neither approach can adequately reflect the real behavior and interactions, e.g. the differences between load balancing methods, so it is necessary to rely on measurements obtained from a real world environment that confirm the benefits and limitations of our considerations. Our analysis is based on the following set of input parameters, which, of course, mutually influence each other. They are estimations about system, application and load balancing behavior during some homogeneous phase of the application run. The first set of parameters can be obtained from measurements:
- p (processors) states the number of available processors, which equals the number of servers for our considerations.
- ml (message latency) gives the average elapsed time for a message passed between the load balancing component and some server, including protocol overhead.
- tt (task execution time) is the elapsed time for a single task execution, averaged over all tasks arriving in this phase and all employed processors, if executed on a central processor without any parallelism.
- lba (load balancing time per task assignment) estimates the average time load balancing spends reacting to events, per assignment of a task. This includes server rating for assignment decisions as well as system load state collection. Each time load balancing examines a task in the central queue, this value increases accordingly.
- squ (server queue usage) is the average server queue size effectively used by a certain load balancing strategy. This is essentially the parameter with which we vary the task assignment time point as discussed above.
The second part of the parameters are estimations that cannot simply be extracted by profiling the system:
- tts (relative task execution time skew) is a factor giving the maximum increase of task execution time due to processor speed and load differences, references to non-local data, etc.
- lbrs (load balancing recognized part of skew) is the portion of the system and application inherent skew observed by the strategy. A simple strategy, for example, would distribute tasks equally to the processors, although this does not yield maximum throughput.
- if (information falsification per queued task) describes the load change and data movement rate, i.e. the instability and unpredictability of the system state. It is a factor stating how much load balancing benefit is lost per task because the information used at assignment time grew old while the task was waiting in the server queue. The more tasks exist in the server's local queue, the more load balancing benefit will be lost for these tasks.
Upon this abstract characterization of system, load balancing and application behavior we built a set of equations. They yield an overall performance estimate, expressed as the average number of tasks finished per unit of time, assuming a sufficiently loaded system (i.e. there are mostly enough tasks to work on). The temporary variables have the following meanings:
- lbov gives the average overhead for the load balancing component per task, from arrival until the assignment decision,
- qe is the probability for a server to encounter an empty local queue after having finished a task,
- qed is the time a server remains idle until it receives the next task, provided its local queue is empty,
- etts estimates the actually observed relative task execution time skew,
- finally, ptt gives the average elapsed time between two task executions on a processor, divided by the effective parallelism.

(EQ 1)  lbov = (p · lba) / squ ,    qe = min((tts · lbrs + 1) / squ, 1)
(EQ 2)  qed = qe · (2 · ml + lbov)
(EQ 3)  etts = tts · (1 − lbrs · (1 − min(if · (squ − 1), 1))) ,    ptt = (tts / p) · (tt + (tt · etts) / 2 + qed)
(EQ 4)  throughput = 1 / max(ptt, lbov)

From these equations we can derive promising application and system parameter ranges, in combination with load balancing methods, for which early task assignment in the form of server task queueing is profitable. We chose three different levels of load balancing, representing stupid, simple and more advanced strategies. All three have the following parameter settings in common:
- tt = 25 msec, rather fine grained tasks in a workstation environment. Note that non-preemptive load balancing should work with fine grained tasks, because load skews caused by long running tasks cannot be corrected afterwards and should therefore not occur.
- tts = 5, a task placed on some unsuited processor will run five times as long as on average,
- p = 15, a mid range workstation cluster,
- ml = 700 µs, software latency in Ethernet based TCP communication.
The other input parameters are empirical; they partially depend on the load balancing strategy used. The first strategy, called Round Robin (RR), simply assigns tasks in round robin fashion. There is no chance to detect any execution time skew, lbrs = 0; assignment costs are low, lba = 10 µs; and, since no run time information is used, it cannot grow old, if = 0. The second competitor, called First Free (FF), always assigns tasks to the first server which has some space left in its local queue, using round robin as above if more than one is available. We assume a theoretical skew avoidance of 15%, i.e. lbrs = 0.15, and still low assignment costs, lba = 50 µs. The only run time information used is the servers' local queue size, which may be viewed as growing old due to task size and processor speed variations, so we estimate if = 0.05. Finally we model a rather advanced strategy, named Data Locality and Processor Speed (DLPS). It takes several items into consideration: the processor speed and current load, the expected task size, the access pattern to global data and the current data placement; further the time elapsing for execution of the tasks currently residing in the local queues. It uses complex formulas to rate the servers' suitability for each task under the current situation. So it is able to swallow 60% of the execution time skew, i.e. lbrs = 0.6, at the price of assignment costs of lba = 2 ms. There is a non-negligible factor of information growing old while an assigned task is waiting in the local queue, due to load changes and data movement, so we set if = 0.08. Note that all three dynamic strategies are suboptimal and heuristic.
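To make the model concrete, the following sketch evaluates EQ 1 - EQ 4 for the three strategies at varying server queue usage, using the parameter values given above. The function and variable names are ours, and `if_` stands in for the paper's parameter `if`:

```python
def throughput(p, ml, tt, lba, squ, tts, lbrs, if_):
    """Tasks finished per millisecond according to EQ 1 - EQ 4 (all times in ms)."""
    lbov = p * lba / squ                                   # LB overhead per task
    qe   = min((tts * lbrs + 1.0) / squ, 1.0)              # P(local queue empty)
    qed  = qe * (2.0 * ml + lbov)                          # expected idle time per task
    etts = tts * (1.0 - lbrs * (1.0 - min(if_ * (squ - 1), 1.0)))  # observed skew
    ptt  = (tts / p) * (tt + tt * etts / 2.0 + qed)        # time per task / parallelism
    return 1.0 / max(ptt, lbov)

# Common settings from the text: tt = 25 ms, tts = 5, p = 15, ml = 0.7 ms.
strategies = {"RR": (0.01, 0.00, 0.00),     # (lba in ms, lbrs, if)
              "FF": (0.05, 0.15, 0.05),
              "DLPS": (2.00, 0.60, 0.08)}
for name, (lba, lbrs, if_) in strategies.items():
    curve = [round(throughput(15, 0.7, 25.0, lba, squ, 5.0, lbrs, if_), 3)
             for squ in range(1, 16)]
    print(name, curve)
```

With these settings the script reproduces the qualitative behavior discussed below: RR improves only slightly with growing squ, FF soon loses its small advantage, and DLPS is limited by the load balancer at squ = 1 but reaches a stable maximum at moderate queue usage before the aging of its information takes over.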
Figure 1, derived from the equations, compares the strategies at varying degrees of server queue usage. A queue size of one task means that load balancing may not assign a task to a server before it has turned back to the idle state. While RR throughput increases slightly due to the sinking probability for servers to run out of tasks, the same effect at FF is soon outweighed by the loss caused by early assignment of tasks. Towards large local task queue sizes, all strategies end up distributing tasks equally, regardless of the system behavior. While DLPS suffers from load balancing overhead caused by repeated rating of the tasks residing in the central queue, and from long delays for idle servers at small local queue usage, there exists a range of stable maximum throughput in which the effects of increasing server usage, shrinking overhead and assignment information growing stale overlap. The right chart gives the performance degradation due to load balancing overhead in larger scaled systems, p = 25. Here, load balancing becomes a bottleneck if strategies are complex and server queue usage is small. Some other parameter settings show less sensitivity to early assignment, but the results we present below prove that there is significant influence in practice.

4 The Experimental Environment

The following results are based on experience with a prototype implementation of a dynamic, centralized load balancing environment called HiCon [2]. Applications running under HiCon are essentially client server structured, which has been established as a valid cooperation paradigm for parallel and distributed applications. Servers may operate on global data, which is distributed among the system. Load balancing is realized as a component responsible for server configuration management, task assignment and data movement, as indicated in the right part of Figure 2. A central task queue per server class as well as local task queues at the servers enable the required assignment flexibility. Load balancing consists of a set of applicable strategies. Each strategy specifies a couple of actions, like updating state information and task ratings, sending tasks from the central into the local server queues or migrating data partitions between servers. These actions are triggered by events like task arrival, end of task execution, data movement or resource load state change. The interested reader is referred to [2] for a detailed description of the underlying concepts. Instead, we summarize below the effects obtained when different local queue sizes are exploited.

Figure 1: Server queue usage (squ) vs. throughput for the load balancing techniques RR, FF and DLPS, with 15 and 25 processors.

We observed three different applications solved in parallel under HiCon load balancing support at varying usage of local server task queues. The server call structures are shown in Figure 3. Two algorithms were implemented for the search of the shortest path between certain nodes within a given graph [2]. It turns out that these three applications are able to show all the effects derived in section 3.
The first one, called Graph, decomposes the search into a client, which just starts the search and combines the results, a server class GFind, which collects the immediately reachable nodes from a given set of start nodes and sends the found nodes as a call to the third class. This class, GReach, maintains a list of the currently reached nodes. It inserts a set of reached nodes into this list and issues several server class calls to GFind, along with a set of nodes that are worth further investigation. The graph is a set of edges, divided into partitions by start node ranges, each stored in a separate disk file. Similarly, the list of reached nodes, a main memory array structure, is partitioned by node ranges, but the partitioning ranges need not match the graph partitioning. The GFind class further maintains a termination counter, a small data partition which nevertheless tends to be a hot spot due to its high update rate. The left part of Figure 3 shows the server call structure. The second application, Graphx, solves the search problem with a client and one server class only. Here the client controls the search process explicitly, i.e. it sends search tasks, receives their results, divides the results into several new tasks and charges the server class with these new calls. The server class GFindx maintains both the source graph and the list of reached nodes, applies found nodes immediately to the reach list and yields a set of sorted nodes for further investigation. The third application, called distributed picture segmentation (Dps) [14], is the most complex one, for it solves a real world problem in parallel under load balancing support. Segmentation is the process that subdivides an image into its constituent regions or objects. A region is an area in a picture whose points have a common property (discontinuity and similarity of the color values). The algorithm is roughly a split & merge technique; the application is realized by three server classes. The first server class, called client, is responsible for the synchronization of the different calls. The second server, named dps_server, carries out the image segmentation, and the third, called print_server, is responsible for the correct saving of the segmentation result. The cooperation structure of the three server classes is shown in the right part of Figure 3. At the beginning and at the end of the picture segmentation process the application has a high degree of parallelism, whereas in the middle of the processing the degree is low.

Figure 2: Analytic queueing model (central task queue, task assignment into local server queues) and HiCon load balancing structure (load balancing component, server classes, data servers, virtual shared data).

Figure 3: Server cooperation structure of the Graph (GClient, GFind, GReach), Graphx (GClientx, GFindx) and Dps (client, dps_server, print_server) applications.

5 Measurement Results

The applications were observed under the configuration shown in Figure 4. We used four different load balancing strategies and varied the available server queue size for each application. Figure 5 summarizes the resulting response times. The strategies RR, FF and DLPS (here called DLPS1) were introduced in section 3. The strategy DLPS2 is a heuristic modification of DLPS1: it checks at most 20 entries of the central task queues per load balancing event, so it works faster but less accurately.
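A sketch of this bounding idea is given below. It is not the actual HiCon implementation; `rate_task` is a placeholder for the DLPS rating function, and the entry limit of 20 is the value mentioned in the text:

```python
def rerate_on_event(central_queue, rate_task, limit=20):
    """DLPS2-style bounding of per-event work: on each load balancing event only
    the first `limit` entries of the central queue are re-rated; the remaining
    entries keep their previous, possibly stale, rating. `rate_task(task)` is
    assumed to return the currently best server and its score for the task."""
    return {task: rate_task(task) for task in list(central_queue)[:limit]}
```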
The analytic evaluation of section 3 is repeated at the bottom of Figure 5 to show the correlation with the actual measurement results. The y-axis is inverted from throughput into elapsed time. All parameter settings are the same as in section 3, except for the load balancing decision costs, which had to be increased slightly to comply with the measurements.

Figure 4: Measurement system configuration for the Graph, Graphx and Dps applications (workstations of 16 to 60 MIPS hosting the load balancing component and the server classes).

Figure 5: Local queue size (squ) vs. elapsed time for the different applications and load balancing strategies (RR, FF, DLPS1, DLPS2), together with the analytic model (elapsed time per task, p = 6).

The first effect we expected was a huge load balancing overhead for complex strategies combined with short server queues. In both the Graph and the Graphx application, load balancing never became a bottleneck, although it was placed on the smallest workstation. The Dps application, however, contains a phase in which lots of tasks arrive that cannot be scheduled immediately, each of them announcing access to about a thousand global data elements. So the DLPS1 strategy spent too much time re-rating all the tasks in the central queue at the various events. The right part of Figure 6 shows that the servers are mostly waiting for data, because this also involves load balancing actions in this HiCon prototype. At local queue sizes beyond 12, DLPS1 outperforms the simple strategies and behaves similarly to DLPS2, which updates the information less accurately. The loss of load balancing improvements due to aging information is not as clearly visible in the measurements as the considerations in section 2 would suggest. Even as server queue usage grows larger, there still remains some difference in throughput between the complex and the simple methods. There are two reasons for this: first, because the measurements were not taken during real life multi-user operation, system load and data placement changed comparably slowly. Second, the load balancing strategies are implemented such that they may use the server queues up to the given limit, but we could not force them to really exploit the maximum allowed (if it seemed unreasonable to them). A better usage of the servers, in terms of keeping them busy all the time and exploiting the possible parallelism with server queueing, could be observed for all three applications. Especially the calls to class GReach within the Graph application yielded only very small tasks. This class becomes the bottleneck under DLPS1 when driven without local queueing. In the case of short running tasks, the delays for getting the next task are comparably expensive; as the left part of Figure 6 shows, the servers were almost unemployed. Class GFind is the bottleneck for all queue sizes beyond 1, which is the main reason why the execution time reaches its minimum there. It should be mentioned that DLPS1 decided to employ only one server of class GReach because of its high data update rate. Thus it clearly defeats RR for all larger queue sizes, whereas RR more often employs both GReach servers at increasing local queue sizes.

Figure 6: Execution profiles of application and load balancing behavior (shares of idle, data wait and busy time for the Graph servers, GFindx server usage at a queue usage of 4, load balancing processor usage, and dps_server profiles).
The Graphx application has even more interesting properties: the maximum parallelism for both strategies is about 14 tasks, so larger local queues could not be exploited. Tasks in Graphx are rather long running (about ten seconds on a 34 MIPS processor), so load balancing was never overstressed. Note that most other researchers look at even larger tasks in the range of minutes or hours, but we are able to further decompose such tasks and thus obtain more load balancing benefit. However, Graphx's major problem without local queueing was that the parallelism could not be exploited, and so the maximum throughput was achieved at a queue usage of five (see the middle of Figure 5). This is essentially the third consideration explained in section 2. Dps under RR and FF benefits only marginally from local task queues, because the time to compute the load balancing decision is negligible; the difference between working with or without a local queue is then just the elapsed time for a message between the load balancing component and the servers. The more advanced strategies are able to speed up Dps, with DLPS1 requiring a certain local queue size. The strategy DLPS2 is, at least for this application profile, an improvement over DLPS1, for it checks fewer tasks of the central queues per load balancing event. It approaches the minimum execution time when the local task queues are larger than six.

6 Conclusions

We have tried to set up simple, straightforward calculations to estimate the server queueing effects for a wide range of computing systems, applications and dynamic load balancing approaches. The measurement results presented above cannot be viewed as a complete evaluation of the advantages of early task assignment, but they demonstrate that there are several different ways in which it influences load balancing efficiency. In principle, allowing large server queues causes no harm as long as the load balancing policies are good enough to decide on their own which queue size can be fruitfully exploited. Many heuristics proposed in the literature are still too simple or too restricted in their functionality. Local server queues provide some potential to improve load balancing without adding much decision or information acquisition overhead. Furthermore, adaptive load balancing may adjust the optimum local task queue usage based on the system and application factors we elaborated in this paper.

7 References

[1] K. Baumgartner, B. Wah, A Global Load Balancing Strategy for a Distributed Computer System, Workshop on the Future Trends of Distributed Computing Systems in the 1990's, September 1988.
[2] W. Becker, Globale dynamische Lastbalancierung in datenintensiven Anwendungen, Fakultätsbericht 1993/1, Universität Stuttgart, Institut für Parallele und Verteilte Höchstleistungsrechner, 1993.
[3] F. Bonomi, A. Kumar, Adaptive Optimal Load Balancing in a Heterogeneous Multiserver System with a Central Job Scheduler, Proceedings Distributed Computing Systems, June 1988.
[4] T. Casavant, J. Kuhl, A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems, IEEE Transactions on Software Engineering, Vol. 14, No. 2, February 1988.
[5] Y. Chow, W. Kohler, Models for Dynamic Load Balancing in a Heterogeneous Multiple Processor System, IEEE Transactions on Computers, Vol. 28, No. 5, May 1979.
[6] D. Eager, E. Lazowska, J. Zahorjan, A Comparison of Receiver-Initiated and Sender-Initiated Adaptive Load Sharing, Performance Evaluation, Vol. 6, March 1986.
[7] K. Efe, B. Groselj, Minimizing Control Overheads in Adaptive Load Sharing, Proceedings 9th International Conference on Distributed Computing Systems, June 1989.
[8] D. Ferrari, S. Zhou, A Load Index for Dynamic Load Balancing, Proceedings Fall Joint Computer Conference, Dallas, Texas, 1986.
[9] C. Hsu, J. Liu, Dynamic Load Balancing Algorithms in Homogeneous Distributed Systems, Proceedings Distributed Computing Systems, 1986.
[10] P. Krueger, M. Livny, A Comparison of Preemptive and Non-Preemptive Load Distributing, Proceedings Distributed Computing, June 1988.
[11] F. Lin, R. Keller, The Gradient Model Load Balancing Method, IEEE Transactions on Software Engineering, Vol. 13, No. 1, January 1987.
[12] H. Lin, C. Raghavendra, A Dynamic Load-Balancing Policy With a Central Job Dispatcher (LBC), IEEE Transactions on Software Engineering, Vol. 18, No. 2, February 1992.
[13] R. Mirchandaney, D. Towsley, J. Stankovic, Analysis of the Effects of Delays on Load Sharing, IEEE Transactions on Computers, Vol. 38, No. 11, November 1989.
[14] R. Pollak, Lastbalancierte parallele Flächenerkennung, Diplomarbeit No. 974, University of Stuttgart, March 1993.
[15] P. Yu, A. Leff, Y. Lee, On Robust Transaction Routing and Load Sharing, ACM Transactions on Database Systems, Vol. 16, No. 3, 1991.