Universität Stuttgart

Fakultät Informatik

System Mechanisms for

Partial Rollback of

Mobile Agent Execution

M. Straßer, K. Rothermel

Bericht 1999/10

Juni 1999

System Mechanisms for

Partial Rollback of Mobile Agent Execution

Authors:

Dipl.-Inform. M. Straßer

Prof. Dr. rer. nat. K. Rothermel

Institut für Parallele und Verteilte

Höchstleistungsrechner (IPVR)

Fakultät Informatik

Universität Stuttgart

Breitwiesenstr. 20 - 22

D-70565 Stuttgart


1 Introduction 2

System Mechanisms for Partial Rollback

of Mobile Agent Execution

Markus Strasser and Kurt Rothermel

Institute of Parallel and Distributed High-Performance Systems
University of Stuttgart, Breitwiesenstr. 20-22, D-70565 Stuttgart, 
Germany
markus.strasser@informatik.uni-stuttgart.de

Abstract: Mobile agent technology has been proposed for various 
fault-sensitive application areas, including electronic commerce, 
systems management and active messaging. Recently proposed protocols 
providing the exactly-once execution of mobile agents allow the usage of 
mobile agents in these application areas. Based on these protocols, a 
mechanism for the application-initiated partial rollback of the agent 
execution is presented in this paper. The rollback mechanism uses 
compensating operations to roll back the effects of the agent execution 
on the resources and uses a mixture of physical logging and compensating 
operations to rollback the state of the agent. The introduction of 
different types of compensating operations and the integration of an 
itinerary concept with the rollback mechanism allows performance 
improvements during the agent rollback as well as during the normal 
agent execution.

Keywords: mobile agents partial rollback fault-tolerance compensation 
itineraries

Technical areas most relevant to this paper:

- Distributed Fault-Tolerant Systems

- Multi-Agent Systems

1 Introduction

Throughout the past years, the concept of mobile agents has drawn a lot 
of attention in both academia and industry. However, only few ?real? 
applications based on mobile agents exist today. The rather early stage 
of currently available agent platforms might be one reason for that. 
Functionscritical for applications, such as security mechanisms, are 
often incomplete or missing entirely. Moreover, only little work has 
been done so far to integrate agent technology with legacy systems, such 
as TP-Monitors and transactional resource managers. In [11], the 
integration of mobile agent technology with transactional technology to 
provide fault-tolerant exactly-once execution of a mobile agent has been 
presented. Based on these results, this paper provides a mechanism for 
the application-invoked partial rollback of the mobile agent execution.

Agents are autonomous objects which may move from node to node to access 
services provided there. Agent execution proceeds in steps, where a new 
step is initiated whenever an agent migrates to the next node. When an 
agent decides to migrate to another node, the agent's code, data and 
execution state is captured and transferred to the next node, where it 
is initiated after arrival.

The use of mobile agents has been proposed for many application areas, 
including electronic commerce, systems management, or active messaging. 
But only the use of protocols like the exactly-once execution protocols 
presented in [11] allows the usage of mobile agent technology in these 
application areas. However, the strict forward recovery of these 
protocols contradicts the autonomy of mobile agents. For situations 
where an abort and restart of the step transaction is not sufficient to 
deal with an error situation (e.g. if the agent lacks the permission to 
access a resource) or where the program logic of the agent detects that 
the current strategy does not lead to the agent's goal, the agent needs 
the ability to initiate a partial rollback of its execution.

In this paper, we will first investigate which types of operations a 
mobile agent performs can be rolled back. Then, we will propose a 
rollback mechanism which uses compensating operations to rollback the 
effects of an agent on the

2 Mobile Agent Model and Execution Model 3

resources used and which uses a mixture of physical logging and 
compensating operations to rollback the state of the agent. This 
mechanism ensures, that the partial rollback will be executed eventually 
even in the presence of non-lasting node and network crashes. To 
increase the performance, an extension of the mechanism is presented 
which allows to prevent unnecessary agent transfers during the rollback. 
Integrating itineraries with the rollback mechanism provides a 
structured way to automatically define agent savepoints and provides the 
possibility to reduce the size of the log necessary for the rollback 
mechanism. The rollback mechanism is currently implemented in Mole 
[1][9], a mobile agent system developed at Stuttgart University.

The remainder of the paper is structured as follows. In the next 
section, we will describe our mobile agent model and execution model. 
Section 3 investigates the different types of compensation. Section 4 is 
dedicated to the mechanism for partial rollback. Section 4.1 introduces 
two different types of private agent data allowing to roll back the 
private agent data belonging to one type using an image of the data 
while the data belonging to the other type has to be compensated using 
compensating operations. Section 4.2 presents the logging mechanism 
necessary for the partial rollback. Section 4.3 introduces a first 
version of the rollback mechanism which is extended in Section 4.4 to 
increase the performance of the algorithm. Related work is discussed in 
Section 5 before the paper concludes with a brief summary.

2 Mobile Agent Model and Execution Model

This section describes our model of mobile agents and theirexecution. 
Mobile agents are autonomous objects, which perform a job on behalf of 
their owner. While performing a job, an agent often has to visit several 
network nodes to access local resources by invoking operations on these 
resources. The set of actions an agent has to perform on a single node 
is called a step and is implemented as a single method of the agent 
object. Which step the agent has to perform on which node and the order 
in which the steps have to be performed is described by an itinerary, 
which may be adapted during the execution of theagent [15]. If an agent 
migrates to anothernode, the agent object with code and all private data 
belonging to the object is captured and transferred to the next node. 
There, it is re-instantiated and the step (i.e. the method implementing 
this step) to be performed on this node is executed.

To provide reliable agent execution, the agent is executed using one of 
the protocols for providing the exactly-once property of mobile agents 
presented in [11]. The basic idea of these protocols is to store the 
agent in stable storage between steps and execute each step of an agent 
inside a transaction, the step transaction. After the start of the step 
transaction, the agent object with all its data and code is read from 
the agent input queue of the node which resides on stable storage, the 
agent is re-instantiated and then the method implementing the step is 
invoked. At the end of the step, the agent object with all its data and 
code is captured and transferred to the next node where it is stored in 
the nodes' agent input queue on stable storage. Then the (distributed) 
step transaction is committed. It is important to note that all accesses 
to local resources are performed within the step transaction. Therefore, 
if the execution of a step aborts, all changes to resources during the 
step transaction are undone automatically and the agent still resides in 
the input queue of the node that executed the aborted step. In the case 
where a step transaction commits, our model is very similar to the saga 
transaction model [4] with the addition, that a temporary (stable) 
savepoint of the program execution (the agent state) is written after 
every step, allowing to abort and restart a step. For situations where 
an abort and restart of the step transaction is not sufficient to deal 
with an error situation (e.g. if the agent lacks the permission to 
access a resource) or where the program logic of the agent detects that 
the current strategy does not lead to the agent's goal, the agent has 
the ability to initiate a partial rollback of its execution. The points 
to which an agent can be rolled back are called agent savepoints, and 
have to be constituted by the agent program logic. Due to the fact that 
most transaction management systems do not support resource savepoints, 
agent savepoints can only be constituted at the end of a step.

Let us illustrate this: Figure 1 shows the execution of steps i through 
i+3 of an agent. The agent constituted a savepoint after step i-1 and is 
currently executing step i+3. If the agent commits this step, the agent 
is written into stable memory (Ai+4 in the figure). However, if the 
agent decides to roll back its execution to the last savepoint, only the 
effects of step transaction Ti+3 can be undone by the transaction 
management. The effects of steps i through i+2 on the resources accessed 
during those steps have to be undone by using compensating transactions 
[8]. We will show in Section 3.2, that the private data state of the 
agent cannot be rolled back by using Ai to resume the execution of the 
agent. Therefore, in contrast to common approaches (e.g. [4]), in our 
approach the private agent state is rolled back as well during the 
compensation transactions.


3 Classification of Compensation Types 4

3 Classification of Compensation Types

The compensation of an operation aims at undoing the semantic effects of 
this operation. However, there are different operation types and not all 
types can be compensated to the same degree. After the introduction of 
some notations, this section introduces different types of compensation 
operations.

3.1 Notations and Definitions

The formal notations and definitions given in this sub-section are 
adopted from [8]. In order to be able to describe the rollback of the 
private data of an agent as well as the rollback of the resources 
accessed by it, we use the notion of the augmented state. The augmented 
state space is the state space of the resources accessed by the agent 
merged with the private data space of the agent object. This enables us 
to describe the execution of a step as a sequence of operations on this 
augmented state. Rolling back several steps of an agent execution 
requires the compensation of the operations executed during these steps. 
This compensation can also be described as a sequence of (compensating) 
operations on the augmented state.

We use the symbol S to denote a state and the symbol f to denote an 
operation. In contrast to [8], operations in our context may read and 
write any number of entities in the augmented state space. A history is 
a sequence of operations which defines a total order among the 
operations as well as a function mapping augmented states to augmented 
states. The notion X=< f1, f2,..., fn > describes a history in which fi 
precedes fi+1, 1? i < n, the notion X=f1?f2?...?fn denotes the function 
mapping augmented states to augmented states defined by the same history 
X. Upper case letters from the end of the alphabet will be used to 
denote the sequence as well as the function a history defines. Two 
histories X and Y are equal (denoted as X?Y), iff for all augmented 
states S, X(S)=Y(S). Two histories X and Y of operations commute, if 
(X?Y)?(Y?X) holds.

We use the symbol T for step transactions. The compensating operations 
for a transaction T are carried out inside a compensation transaction, 
denoted by CT. A transaction T2 is dependent of a transaction T1 if it 
reads data updated by T1. The set of transactions dependent of T is 
denoted as dep(T).

3.2 Types of Compensation

The idea behind compensation is to semantically undo the effects of 
already committed transactions. Unfortunately, compensation of 
operations is not always possible. Whether compensation is possible 
depends not only on the operation itself, but also on the application. 
In this section, different types of compensation are introduced.

Ai+4
Ni+3

Ri+3

read write

execute

access
Ri+3'

Ai+3
Ni+2

Ri+2

read write

execute

access
Ri+2'

Ai+2
Ni+1

Ri+1

read write

execute

access
Ri+1'

Ai Ai+1
Ni

Ri

read write

execute

access
Ri'

Ai: agent state before step i

Ni: node executing step i

Ri: resource state of node i

Ri': Ri after step i

Figure 1: Execution of an agent

savepoint Ti Ti+1 Ti+2 Ti+3

Ti: step transaction i


4 Rollback of the Agent Execution 5

The most comfortable type of compensation operations are those which 
create sound histories. A history is sound iff X(S)=Y(S) with X being 
the history of T, CT and dep(T), Y being the history of dep(T) and S 
being the initial state [8]. In this case, the outcome of dep(T) is not 
influenced by the compensation. If the compensation operations of CT 
commute with each of the operations contained in dep(T), then the 
history X of T, CT and dep(T) is sound. As an example, we consider a 
bank account with the operations deposit(x) and withdraw(x). If the 
account may be overdrawn, these two operations commute and therefore, as 
long as T, CT and dep(T) only use those operations, the histories 
produced are sound. However, this type of compensation rarely occurs in 
real applications. Let us assume that one of the transactions in dep(T) 
uses the current account balance to decide which actions to perform (?if 
I have enough money, then....?). Now we have created a very simple 
transaction that does not commute with deposit(x) and withdraw(x). 
Please note that the definition of soundness implies that T?CT?I (with I 
as the identity operation).

For most applications it is acceptable, that an execution of only the 
dependent transactions dep(T) without T and CT produces different 
results than those produced by T, CT and dep(T) or even that 
(T?CT)(S)?S. If, for example, a transaction T1 tries to buy some goods 
from a shop and the desired good is out of stock because another 
transaction T2 just bought these goods, T1 - as in real life - buys the 
good from another shop. This transaction T1 - buying from the other shop 
- is not affected if T2 is compensated a short time later, even if the 
first shop would now be able to deliver the desired good. The reasons 
for accepting that the consecutive execution of the transaction and the 
compensating transaction does not result in the initial state are 
twofold. First, there are compensation operations which only produce a 
state equivalent to the initial state. If an agent uses digital cash [2] 
contained in its private data to buy some goods and compensates this 
operation, it (hopefully) gets back the same amount of cash. However, 
the representation of this digital cash is only an equivalent 
representation - the digital coins have different serial numbers. 
Second, if, in the same example, the seller of the goods charges a small 
fee for the compensation transaction or only agrees to give a credit 
note to the customer, the agent contains other information than before 
the purchase. In a more complicated scenario, the amount of 
reimbursement may vary with the time passed between purchase and 
compensation. The following policy is but one example: until x hours 
after the purchase, the seller returns cash but charges a small fee, 
after that, the customer only gets a credit note. In these cases, the 
agent must be able to deal with the changed situation.

Until now, it has been assumed implicitly that a compensation 
transaction has to be accomplished successfully eventually. But there 
are cases where a compensation operation might be impossible to execute. 
For example, if transaction T1 deposits 20USD on an account, then CT1 
has to withdraw 20USD. If the account cannot be overdrawn, there have to 
be at least 20USD on the account for CT1 to be successful. If there are 
less than 20USD (e.g. because another transaction T2 withdrew all money 
in the meantime), the compensation transaction fails. Solutions to this 
problem are discussed in [4] and [10].

Finally, there are operations which cannot be compensated. If, for 
example, a transaction deletes a considerable amount of data in a 
database, it would be necessary to log all this data to be able to 
compensate the deletion. Therefore, if a step contains an operation 
which cannot be compensated, the step cannot be rolled back after its 
commit.

4 Rollback of the Agent Execution

Rollback in our model requires the compensation of actions performed on 
resources as well as the compensation of the agent's private data space. 
To support the rollback on the agent's private data space, we first 
identify two different types of agent data, which allows to rollback a 
part of the agent's data space without compensating operations. Then, we 
introduce the mechanism we use for logging and present the rollback 
mechanism. Introducing different types of compensation operations will 
allow the presentation of an optimized rollback mechanism.

4.1 Rollback of the Private Agent Data Space

The data objects contained in the private data space of the agent can be 
classified in two categories. The first category are data objects which 
can be compensated by means of an image of the objects. If a savepoint 
is established, an image of these data objects (before-image [6]) is 
written into the agent rollback log (see next section). If an agent has 
to be rolled back to a savepoint, these objects can be restored using 
the image stored in the log. For example, if an agent

4 Rollback of the Agent Execution 6

collects information and stores this information into a vector, then 
this information can be rolled back to a savepoint without the need of a 
compensating operation. In this case, the vector can be restored using 
the original content contained in the vector at the time the savepoint 
has been taken. We call this type of objects strongly reversible 
objects. Strongly reversible objects (which have to be declared by the 
developer as such) will be restored by the system without the need for a 
compensating operation.

As described in Section 3.2, there are applications which accept 
augmented states produced by the compensation which differ from the 
augmented state produced without the execution of the step and the 
compensation of this step. The second category of data objects contains 
those objects in the private data space of the agent which may contain 
different data after the compensation, i.e. which cannot be compensated 
using before-images. The reason for not being able to use a before-image 
for rollback is that during the agent rollback, information originally 
not contained in the agent's private data space is produced (usually by 
the rollback of the state space of the resources). This new information 
has to be integrated into the private agent data. An example showing 
that electronic cash belongs into this category of data is the scenario 
presented in Section 3.2 where an agent orders some (digital) good using 
electronic cash. We call the objects contained in this second category 
weakly reversible objects. These objects cannot be compensated using a 
before-image of the objects, i.e. the agent developer has to provide 
code for the compensation.

4.2 Logging

The data necessary for the rollback of previous, committed steps of an 
agent is contained in the agent rollback log. It contains all 
information for the rollback of the private data space of the agent as 
well as for the rollback of the resource state space. The log is 
attached to the agent and hence migrates with the agent from node to 
node. Because the log only contains data for the compensation of already 
committed steps (backward recovery), this log is made persistent at 
transaction commit (i.e. at the end of step transactions or end of 
compensating transactions).

The advantages of attaching the log to the agent are twofold. Firstly, 
at the end of the agent execution, no global actions are necessary to 
delete the log. Secondly, the log is always available as long as the 
agent is available, enabling the agent to roll back its execution as 
long as the resources necessary for the rollback are available. The 
problem of this approach is, that the amount of data which has to be 
transferred to migrate the agent increases. A solution to this problem 
is introduced in Section 4.4.2.

The type of logging used in our approach is a mixture of physical 
logging and logical logging [6]. The images of the strongly reversible 
objects are logged using physical logging. This can be done either by 
writing a complete image of the objects into the log (state logging) or 
by writing differences of the object states between adjacent savepoints 
(transition logging). These informations to restore the strongly 
reversible objects are written to the log as part of a savepoint entry 
(SP). A savepoint entry is written to the log when an agent savepoint is 
constituted. Besides the image of the strongly reversible objects a 
savepoint entry contains a (unique) identifier for the savepoint. For 
reasons of simplicity, we further assume that state logging is used 
unless mentioned otherwise.

For the compensation of the weakly reversible objects as well as the 
state space of the resources, logical logging is used by writing the 
compensating operations and their parameters into the log. These entries 
will be called operation entries. An operation entry contains the code 
of one compensating operation and the parameters for this operation. As 
has already been mentioned in Section 3.1, the compensation of a step 
can be described as a sequence of operations on the augmented state. The 
compensating operations can be arbitrarily complex. Therefore, the 
number of compensating operations contained in the log which are 
associated to a step may vary from one (complex) operation, which 
compensates all the effects of the step on the weakly reversible objects 
and the state space of the resources, to several times the number of 
operations performed by the step if some of those operations need 
several compensating operations.

In addition to the savepoint and operation entries, the log contains 
entries to log the begin and the end of a step (begin-of-step (BOS), 
end-of-step (EOS)) These entries contain the identifier of the node on 
which the step has been executed. Figure 2 shows an extract of a 
rollback log. It contains the entries of the k-th agent savepoint which 
is located before step n and for the compensation of the n-th step. To 
rollback to this k-th agent savepoint, the log has to be executed 
beginning from the end of the log up to the savepoint entry. To 
compensate a step, all operation entries associated with this step are 
executed within a transaction in the reverse order they appear in the 
log. For example, to compensate

4 Rollback of the Agent Execution 7

step n, the compensation operations are executed in the order OEn,p, 
OEn,p-1, ..., OEn,2, OEn,1. The next sections will present the details 
of the rollback process.

4.3 The Rollback Mechanism

The idea of the rollback mechanism is to perform all compensating 
operations associated with a step on the node where the step has been 
executed. The steps are compensated in the reverse order of their 
execution. Similar to the execution of steps, the compensation 
operations associated with a step are executed within a compensation 
transaction. This ensures that other transactions see either a resource 
state affected by the step which has to be compensated or the resource 
state after the compensation has taken place (isolation of the 
compensation). The state of an agent between two compensation 
transactions is stored in stable storage. Until the savepoint is reached 
to which the agent has to be rolled back, only the changes to the weakly 
reversible objects of the agent and the changes to the state space of 
the resources accessed by the agent are compensated. The state of the 
strongly reversible objects is restored when the savepoint is reached 
using the information contained in the savepoint entry in the log. 
Hence, accessing the strongly reversible objects during the execution of 
the compensating operations is not allowed.

Figure 3 shows the execution of steps i through i+3 of an agent and the 
rollback initiated in step i+3 to the savepoint established before step 
i. After the abort, the transaction management undoes the changes 
performed during step i+3 to the resource state space and the agent 
space. Then the compensation transactions are executed on the nodes in 
the reverse order of the step execution. It is important to note that 
the strongly reversible objects are not restored until the savepoint is 
reached. If a compensating operation e.g. on node Ni+1 accessed the 
strongly reversible objects it would read the (?old?) state established 
by step i+3.

Figure 4 shows the rollback algorithm. In Figure 4a, the part of the 
algorithm executed on the node where the rollback has been initiated 
(current node) is shown. This algorithm gets the identifier of the 
savepoint (spID) to which the agent has to be rolled back as parameter. 
After the abortion of a step transaction (hereafter called the aborting 
step transaction), a new transaction is initiated and the agent and the 
agent rollback log are read (and deleted) from stable storage. Please 
note that the state of the agent and the rollback log read from stable 
storage is the state before the execution of the aborting step 
transaction. Now two cases have to be distinguished. The first case is 
that the desired savepoint was set directly before the aborting step 
transaction. In this case, the rollback is already finished and the next 
step transaction has to be initiated. In the second case, the first 
compensation transaction has to be initiated. This is done by writing 
the agent, the agent rollback log and the savepoint identifier spID to 
the input queue of the node where the first compensation transaction has 
to be executed. This node can be determined by examining the last 
end-of-step entry contained in the agent rollback log (which is the last 
entry if no savepoint entry has been written after the last end-ofstep 
entry). Then the transaction is committed. If this transaction commits 
successfully, then the second part of the algorithm given in Figure 4b 
is executed (or the next step transaction is initiated). If the 
transaction fails (e.g. due to a node failure), the agent and the log 
still resides in the input queue of the current node. In this case, the 
step which initiated the abort is restarted on the current node or, if 
the fault tolerant protocol for the execution of steps [11] is used,

... SPk BOSn OEn,1 OEn,2 ... OEn,p EOSn BOSn+1 ...

Eq Eq+1 Eq+2 Eq+3 Eq+p+1 Eq+p+2 Eq+p+3

Ex - log entry

SPx - savepoint entry

BOSx - begin-of-step entry

OEx - operation entry

Figure 2: Example Log

EOSx - end-of-step entry


4 Rollback of the Agent Execution 8

it may be even restarted on another node. This is still a correct 
execution since it has the same result as if the step transaction was 
aborted before the initiation of the rollback.

Figure 4b shows the part of the rollback algorithm executed on the nodes 
where the compensation transactions have to be executed. After the start 
of the compensation transaction, the agent, the agent rollback log and 
the savepoint identifier (spID) of the savepoint to which the agent has 
to be rolled back are read (and deleted) from stable storage. First, the 
end-of-step entry of the step which has to be compensated and, if 
existent, a savepoint entry are deleted from the log (LOG.pop() reads 
and removes the last entry from the agent rollback log). This savepoint 
entry can be deleted because it cannot be the savepoint to which the 
agent has to be rolled back (this is tested before the agent is written 
to stable storage). Then, all operation entries are read from the log 
and the compensating operations contained in those entries are executed 
until the begin-of-step entry is reached (which is also deleted from the 
log). It has to be mentioned, that a savepoint can only be written after 
the execution of a step (see Section 2) and therefore, no savepoint 
entries can be found between an BOS entry and an EOS entry. Now, two 
cases have to be distinguished. If the last entry now contained in the 
log is the savepoint to which the agent has to be rolled back, the 
states of the strongly reversible objects have to be restored using the 
information contained in the savepoint entry (without deleting the 
savepoint entry from the log) and the next step transaction has to be 
initiated. If the last entry is another savepoint or the end-of-step 
entry

''

Ni+1

Ri+1

write read

compensate

compensate

Ri+1
'''

Ni+3

Ri+3

read

execute/rollback

access
Ri+3'

Ai+3
Ni+2

Ri+2

read write

execute

access
Ri+2'

Ai+2
Ni+1

Ri+1

read write

execute

access
Ri+1'

Ai Ai+1
Ni

Ri

read write

execute

access
Ri'

Ai WROi

SROi

Ri+3

Ti Ti+1 Ti+2 Ti+3

abort
''

Ai+3
Ni+2

Ri+2

write read

compensate

compensate

Ri+2
'''

WROi+3

SROi+3

Ai+2

WROi+2

SROi+3

'

'

Ai+3
Ai+2
'

''

Ni

Ri

write read

compensate

compensate

Ri
'''

Ai+1

WROi+1

SROi+3

'

'

Ai+1
'

Ai

WROi

SROi

'

'

Ai'

CTi+2
CTi+1
CTi

Ai: agent state before step i

Ni: node executing step i

Ti: step transaction i

Ri : resource state of node i
before compensation
Ri : resource state of node i
after compensation
WROi: state of weakly reversible
objects before step i
WROi: state of weakly reversible objects
after compensation of step i
SROi:  state of strongly reversible objects

'''

''

'

'
CTi: compensation transaction
for step i
Ri: resource state of node i
before step i
Ri: Ri after step i

Figure 3: Partial Rollback of an Agent with the Rollback Mechanism

savepoint

Ai: agent state after compensation
of step i
'


4 Rollback of the Agent Execution 9

of the next step which has to be compensated, the agent, the agent 
rollback log and the savepoint identifier spID are written to the input 
queue of the node where the next compensation transaction has to be 
executed (determined from the end-of-step entry contained in the log). 
Finally, the compensation transaction is committed. If the commit is 
successful, the effects of the compensating operations invoked on the 
resources are permanent and the new agent state (reflecting the 
compensating operations performed on the weakly reversible objects) is 
contained in the input queue of the node where the next compensation 
transaction has to be executed or, if the desired agent savepoint is 
reached, the next step transaction is initiated. If the compensation 
transaction aborts (node failures, deadlocks,...), the effects of 
compensating operations invoked on the resources are undone by the 
transaction management and the agent (including log and savepoint 
identifier) still resides in the input queue of the node enabling the 
algorithm to restart this compensation transaction.

Discussion: Thealgorithm presented in this subsection rolls back an 
agent to an agent savepoint by moving the agent back along the way it 
moved during the execution of the steps which have to be rolled back. On 
each node, the compensating operations to compensate the resource state 
of the node as well as the state of the weakly reversible objects are 
executed within a compensation transaction. Assuming that node crashes 
and network crashes are only temporary and further assuming, that the 
network provides reliable data transfer, the algorithm ensures that all 
steps which have to be rolled back are eventually rolled back and 
finally, the state of the strongly reversible objects is restored as 
well. If, instead of state logging, transition logging is used, the 
state of the strongly reversible objects has to be updated every time an 
agent savepoint entry is read during the rollback process.

The algorithm as presented above has two problems. The first problem is, 
that if a node on which a compensation transaction has to be executed is 
permanently unreachable, the rollback cannot proceed and the agent is 
blocked. A solution to this problem is to provide the information, on 
which nodes the rollback of a step can be performed alternatively (if 
there are any) e.g. in the end-of-step entry. Then a fault-tolerant 
execution of the rollback similar to the faulttolerant step execution 
described in [11] can be realised.

rollback(spID){

abort-transaction();

begin-transaction();

read(agent,LOG) from
stable storage;

if savepoint spID reached{

start next step transaction;

}else{

write(spID,agent,LOG) to
input queue of next node

}

commit-transaction();

}

Figure 4a: Start of the Rollback Algorithm

begin-transaction();

read (spID, agent, LOG) from
node input queue;

if (last log entry is savepoint)

LOG.pop(); // remove savepoint

LOG.pop(); // read end-of-step

entry=LOG.pop(); // next entry

while (entry<>begin-of-step){

execute entry;

entry = LOG.pop();

}

if savepoint spID reached{

restore strongly reversible
objects;

start next step transaction;

}else{

write(spID,agent,LOG) to
input queue of next node;

}

commit-transaction();

Figure 4b: Rollback Algorithm Executed on each Node


4 Rollback of the Agent Execution 10

The second problem of the algorithm is, that the agent is transferred to 
a node during the rollback process although maybe no resource 
compensation has to be performed on this node. If, for example, the 
agent only gathered some information during a step and stored this 
information in strongly reversible objects, no compensating operations 
at all are necessary to roll back this step. A solution to this 
performance problem is introduced in the following subsection.

4.4 Optimizing the Partial Agent Rollback

Two possibilities for optimizations regarding the partial agent rollback 
are to avoid unnecessary agent transfers during the rollback on the one 
hand and to reduce the transfer size of an agent by reducing the size of 
the agent rollback log on the other hand. This section introduces 
mechanisms for these optimizations.

4.4.1 Optimizing the Number of Agent transfers

To be able to decide if the compensation transaction of a step has to be 
executed on the node where the step was executed, the agent rollback log 
has to contain the appropriate information. To allow a flexible and 
efficient rollback, we define three different types of operation 
entries.

Types of Operation Entries

The first type of operation entries contains a compensating operation 
which rolls back only resource state space and which needs no access to 
the private agent state space. All information necessary for this 
compensating operation has to be contained in the operation entry as 
parameters; the compensating operation must not access the private agent 
state space. For example, if an agent invoked a fund transfer between 
two accounts of a bank, all information necessary to compensate this 
fund transfer is the two bank accounts and the amount of money 
transferred between these two accounts. This type of operation entry is 
called a resource compensation entry. Resource compensation entries have 
to be executed on the node on which the resource resides (i.e. on the 
node where the step was executed). Because the compensating operation 
contained in a resource compensation entry is not allowed to access the 
agent, it is possible to send only the operation entry (without the 
agent) to the node where the compensating operation has to be executed 
(within the compensation transaction)

The second type of operation entries contains a compensating operation 
which rolls back only weakly reversible objects (i.e. private agent 
state space) and which needs no access to the resource state space. All 
information necessary for this compensating operation has to be 
contained in the operation entry as parameters and in the weakly 
reversible objects of the private agent state space. As described in 
Section 4.3, the compensating operation may not access the strongly 
reversible objects. We call this type of operation entry agent 
compensation entry. An agent compensation entry always has to be 
performed on the node where the agent resides. Because no resource 
access is allowed, this may be an arbitrary node.

The third type of operation entries contains a compensating operation 
which needs access to the private agent state space (only weakly 
reversible objects, see above) and to the resource state space. An 
example scenario is a step, where the agent changes digital cash from 
one currency into another (e.g. from USD into Euro) at the bank. To 
compensate this (i.e. to change the money back from Euro to USD), the 
compensating operation needs access to the weakly reversible object 
containing the cash in Euro (it cannot be contained in the rollback log, 
see Section 4.1), to the object where the received USD have to be stored 
(it is not possible to restore the digital cash in USD by using a copy 
of the original cash, see Section 4.1 also) and to the resource which 
changes the money. We call this type of operation entry mixed 
compensation entry. To execute a mixed compensation entry, the agent has 
to reside on the node where the step to which the mixed compensation 
entry belongs was executed.

Optimization

As can be seen from the description of the operation entry types above, 
the agent has to be transferred to a node during the rollback only if a 
compensation transaction has to execute a mixed compensation entry 
there. If the operation entries associated to a step are only agent 
compensation entries or resource compensation entries, it is not 
necessary to transfer the agent to the node on which the step was 
executed for the compensation. In this case, the agent compen-

4 Rollback of the Agent Execution 11

sation entries are executed on the node where the agent currently 
resides and the resource compensation entries are sent to the node where 
the step was executed for execution.

To decide whether an agent has to be transferred to another node to 
perform the next compensation transaction, the agent rollback log must 
be examined. One possibility is to read all entries for the next step 
(i.e. the step which has to be compensated next). The other possibility 
is to include a flag in the end-of-step entry indicating whether a mixed 
compensation entry is contained in the step. In this case, only this 
end-of-step entry has to be examined. To integrate this optimization, 
the changes shown in Figure 5 have to be made to the algorithm.
Instead of always writing the agent to the ?next? node, the agent has to 
be transferred only if the flag in the end-of-step entry indicates that 
one of the operation entries which has to be executed in the next 
compensation transaction is a mixed compensation entry.

The loop of the original algorithm executing the operation entries has 
to be replaced completely. If the compensation transaction has to 
execute a mixed compensation entry, all operation entries are executed 
locally in the order defined by the rollback log. In the other case, the 
agent compensation entries can be executed concurrently to the resource 
compensation entries because the definition of those operation entry 
types ensure that they operate on disjoint data. Therefore, the log is 
read and two lists are made - one containing the agent compensation 
entries and the other containing the resource compensation entries 
(always in the order in which the entries have to be executed). Now the 
list containing the resource compensation entries and the identifier of 
the compensation transaction is sent to the resource node. Then, the 
agent compensation entries contained in the other list are executed (in 
the order in which they are contained in the list). On the resource 
node, the resource compensation entries are executed in the order in 
which they are contained in the list. All the operations performed on 
the resource node are performed inside the compensation transaction (by 
using the transaction identifier obtained along with the list). After 
the last resource compensation entry is executed, an acknowledgement is 
sent back to the node where the agent resides. Only then, the 
compensation transaction can be committed.

Discussion: The optimization described in this subsection avoids agent 
transfers for compensation transactions in which no mixed compensation 
entries have to be performed. This reduces the network load during the 
agent rollback, because onlythe resource compensation entrieshave to be 
transferred to the node on which the step being compensated was 
executed. Additionally, the resource compensation entries can be 
executed concurrently to the agent compensation entries which may save 
execution time.

Further optimizations are possible, if the access to resources within 
the mixed compensation entries and the resource compensation entries may 
be performed using RPC. In this case, a performance model similar to 
that introduced in [16] can be used to determine if the agent or the 
resource compensation objects should be transferred to the node where 
the resources reside or if RPC should be used to access the resources.

4.4.2 Reducing the Log Size by Integrating Itineraries and Rollback

Attaching the rollback log to the agent introduces some overhead to the 
migration because the log has to be transferred additionally to the 
agent state. This subsection introduces solutions for reducing the 
rollback log size.

There are two possibilities to reduce the size of the log. The first 
possibility is to reduce the number of savepoint entries contained in 
the log. As described above, savepoint entries are written when an agent 
savepoint is established. This can be influenced by the application 
developer by giving up the possibility to roll back an arbitrary number 
of steps i.e. by not establishing an agent savepoint before each step. 
The second possibility is to discard rollback information which is not 
needed any more. The information which rollback information can be 
discarded has also to be provided by the application developer. In this 
subsection, we will present an itinerary concept which allows the 
application developer to define in a structured way to which points an 
agent can be rolled back and when rollback information can be discarded.

The basic idea is, that an application can be structured into sub-tasks. 
If a rollback is necessary, always the complete sub-task currently 
executed has to be rolled back. To provide enhanced flexibility, each 
(sub-)task may be partitioned into further sub-tasks, allowing 
hierarchies of sub-tasks. In this case, the application developer may 
specify whether only the nested sub-task currently executed has to be 
rolled back or (one of) the surrounding sub-tasks that contains the 
current sub-task. To describe such a hierarchy of sub-tasks, an 
itinerary concept similar to that described in [14]

4 Rollback of the Agent Execution 12

can be used. An itinerary, which describes a (sub-)task, is a set which 
contains itinerary entries and specifies in which order those itinerary 
entries have to be executed. An entry of an itinerary is either a nested 
sub-itinerary or a step entry. A step entry in an itinerary is a tuple 
(meth()/loc) which describes that the agent has to execute the step 
specified

rollback(spID){

abort-transaction();

begin-transaction();

read(agent,LOG) from
stable storage;

if savepoint spID reached{

start next step transaction;

}else{

if (next EOS entry indicates
mixed compensation entry){

write(spID,agent,LOG) to input
queue of next node;

}else{

write(spID,agent,LOG) to input
queue of current node;

}

}

commit-transaction();

}

Figure 5a: Start of the optimized rollback algorithm

begin-transaction();

read (spID, agent, LOG) from
node input queue;

if (last log entry is savepoint)

LOG.pop(); // remove savepoint

EOSEntry = LOG.pop() // read end-of-step

entry=LOG.pop(); // next entry

if (EOSEntry indicates mixed compensat.
entry){ // execution on agent node

while (entry<>begin-of-step){

execute entry;

entry = LOG.pop();

}

}else{ // group operation entries

resourceNode = EOSEntry.execNode();

ACEList = {}; // empty list of agent
// compensation entries

RCEList = {}; // empty list of resource
// compensation entries

while (entry<>begin-of-step){

if (entry is agent compensation
entry){

ACEList.add(entry);

}else{ // res. compensation entry

RCEList.add(entry);

}

entry = LOG.pop();

}

send (TransactionID, RCEList) to
resourceNode; // only if not empty

execute the entries in ACEList in
the order they appear in the list;

wait for ACK from resourceNode;

}

if savepoint spID reached{

restore strongly reversible
objects;

start next step transaction;

}else{

if (next EOS entry indicates
mixed compensation entry){

write(spID,agent,LOG) to input
queue of next node;

}else{

write(spID,agent,LOG) to input
queue of current node;

}

}

commit-transaction();

Figure 5b: Optimized rollback algorithm executed on each node


4 Rollback of the Agent Execution 13

by the method meth() on the node specified by loc. The term ?execute an 
entry e? describes the execution of the step if the entry e is a step 
entry or the (recursive) execution of all entries contained in e if e is 
a sub-itinerary. The order defined between the entries of a 
(sub-)itinerary may be partial, allowing the system to choose which 
entry to execute as the next entry. It is even possible to allow the 
application to specify entries which have to be executed alternatively, 
or to define complex rules which specify under which conditions an entry 
has to be executed. An approach using preconditions for the entries to 
decide whether and when an entry can be executed is presented in [14].

An example of an itinerary (without the definition of the order between 
entries) is given in Figure 6. This itinerary I contains three 
sub-itineraries SI1, SI2 and SI3. The sub-itineraries SI1 and SI2 
contain only step entries, SI3 contains, besides the step entry s6, two 
additional sub-itineraries SI4 and SI5.

To illustrate the integration of rollback and itineraries, we assume the 
following scenario: an agent begins its execution with sub-itinerary 
SI3. It executes the step entry s6 and continues with sub-itinerary SI4 
by executing step entries s5 and s4 (in this order). If it decides to 
roll back during the execution of s4, it can either roll back only 
sub-itinerary SI4 (by aborting step transaction s4 and compensating s 5) 
or it can also roll back the enclosing sub-itinerary SI3 (by 
additionally compensating s6).

In this scenario, only two savepoint entries are necessary - one 
containing the state of the strongly reversible objects before the 
execution of SI3 starts and one before the execution of SI4 starts. 
Those savepoints can be written automatically by the system. If we 
change our scenario slightly and omit the execution of entry s6, i.e. 
the agent begins with the execution of SI3 and immediately continues 
with the execution of SI4, only one agent savepoint is really necessary. 
In this case, the rollback mechanism has to read the savepoint entry, 
but must not delete it from the log as long as only SI4 is rolled back 
(because the savepoint entry is necessary for a possible rollback of 
SI3). To make this visible in the log, a special savepoint entry may be 
inserted (without data forthe strongly reversible objects) for the 
?savepoint? written for SI4. Due to reasons explained later in this 
section, savepoints are never needed for the main itinerary I.

On closer examination it shows, that it is not necessary to have one 
savepoint for each sub-itinerary in the log but only one savepoint for 
the sub-itinerary currently executed and one for each sub-itinerary 
which (directly or indirectly) contains the currently executed itinerary 
(i.e. for each sub-itinerary currently being executed). If, in our first 
scenario from above (agent begins with SI3 by executing s6 and 
continuing with SI4), the sub-itinerary SI4 has been executed and the 
agent now executes the sub-itinerary SI5, it can either rollback SI5 or 
it can rollback SI3, but the rollback of SI5 and SI4 without the 
compensation of s6 is not possible (per definition). The savepoint entry 
written for SI4 is no longer needed. Therefore, the savepoint written 
for a sub-itinerary (but not the operation entries) can be removed from 
the rollback log as soon as the sub-itinerary has been executed 
completely. However, this may be a non-trivial task if transition 
logging is used for the strongly reversible objects.

sub-itinerary

step

I

SI3

s4

s6

SI2
SI1

s5
s7

s1

s8

s2

s3

SI5

s9
s10

itinerary

Figure 6: Sample Itinerary

SI4


5 Related Work 14

Further reductions of the size of the rollback log may be reached by 
discarding rollback information which is no longer needed. Discarding 
rollback information is a far-reaching event in the ?life? of an agent 
which can only be done, if it is sure that the results achieved do not 
have to be rolled back. Achieving such results very likely corresponds 
with the completion of one of the agents main sub-tasks, i.e. with the 
completion of a sub-itinerary directly contained in the main itinerary. 
Therefore, sub-itineraries directly contained in the main itinerary have 
the additional semantics, that, upon their completion, all information 
contained in the rollback log is deleted. To provide a clear semantics, 
no step entries are allowed in the main itinerary. So, if the main 
itinerary of an agent contains n sub-itineraries, the execution of the 
agent is split into n parts, where each of this parts cannot be rolled 
back as soon as that part is completed. It is important to note, that an 
abort of the agent by performing a complete rollback of the agent is 
possible only during the execution of the ?first? sub-itinerary of the 
main itinerary.

Discussion: Theitinerary concept introduced in this subsection providesa 
structured way to automatically constitute agent savepoints and provides 
the possibility to remove an agent savepoint associated with a 
sub-itinerary from the log as soon as the sub-itinerary is finished. 
This reduces the log size without losing the possibility to roll back 
the agent. Additionally, discarding the complete rollback log if a point 
in the agent execution is reached where it is sure that the agent will 
not rolled back beyond this point reduces the log size considerably.

5 Related Work

In the field of mobile agents, only few research groups have considered 
aspects of transaction management and faulttolerance so far. Most of 
those groups provide mechanisms to increase the fault-tolerance of 
mobile agent execution [3][7][12][17] but offer no mechanisms to 
(partially) roll back the agent's execution.

In [13], an agent-based transaction model is presented. Similar to our 
model, the use of compensation to roll back the effects of already 
committed transactions is proposed. However, this paper purely 
concentrates on modelling transactional aspects, protocols or algorithms 
are not given.

In addition, there has been related work in the field of transaction 
processing. Our model is based on [4], where sagas as a transaction 
model for long-living activities are introduced. In a saga, a 
long-living activity is partitioned into several steps. Each step is 
executed within an ACID transaction. The commit of a step automatically 
begins the next step transaction. For each step, a compensation step has 
to specified. The runtime system of a saga guarantees, that eventually, 
either all steps of the saga are committed, or, if the saga has to be 
aborted, for all committed steps the compensation step has been 
executed. To allow backward/forward recovery, savepoints of the program 
state of the transaction program can be taken. However, savepoints are 
only used when a transaction aborts e.g. due to a deadlock or a system 
crash. A partial rollback initiated by the transaction program is not 
supported. Additionally, the use of a savepoint to restore the complete 
execution state of the saga prevents the usage of sagas in applications 
where the execution state of the saga also has to be compensated. 
Extensions of sagas like nested sagas and non vital sub-sagas as 
presented in [5] can be realized in our model by using flexible 
itineraries as described in [14] and Section 4.4.2.

The ConTract model [10] certainly comes closest to our approach. It also 
aims at the exactly-once execution of a task and similar to our approach 
allows the partial rollback of an execution of a task using 
compensation. The main difference to our approach lies in the underlying 
system mechanisms. A ConTract, which is defined by a script is given to 
a ConTract manager, controlling the entire execution of the ConTract. In 
other words, ConTract scripts are not mobile so far.

6 Conclusions and Future Work

We have investigated how the partial rollback of mobile agents which are 
executed using one of the exactly-once protocols presented in [11] can 
be realized. Due to the fact that these protocols use transactions to 
realize the exactlyonce property of mobile agents, compensation is 
necessary to partially roll back an agent's execution. The 
classification of types of compensations showed, that - besides the 
operations performed on resources - the private state of the agent also 
has to be rolled back using compensating operations. Introducing two 
different types of private agent data allowed to roll back the private 
agent data belonging to one type using an image of the data while the 
data belonging to the other type has to be compensated using 
compensating operations. The rollback mechanism presented ensures,

7 References 15

that the rollback is eventually executed even in the presence of 
(non-lasting) node and network crashes. By introducing different types 
of compensating operations and integrating the rollback with an 
itinerary concept, performance optimizations of the rollback mechanism 
were presented. Currently, the protocol is under implementation in the 
Mole system [9] and will be evaluated in terms of performance.

Future work will concentrate on further performance optimizations of the 
rollback mechanism and a fault-tolerant rollback mechanism. Furthermore, 
an enhanced agent execution model supporting exactly-once executions 
comprising more than one agent will be investigated.

7 References

[1] J. Baumann, F. Hohl, K. Rothermel, M. Schwehm, and M. Strasser, 
?Mole 3.0: A Middleware for Java-Based Mobile Software Agents?, Proc. 
Middleware'98, Springer Verlag London, 1998

[2] D. Chaum, ?Security Without Identification: Transaction Systems to 
Make Big Brother Obsolete?, Communications of the ACM, 28(10), October 
1985, p. 1030-1044

[3] M. Dalmeijer, E. Rietjens, M. Soede, D. Hammer, and A. Aerts, ?A 
Reliable Mobile Agent Architecture?, Proc. 1st IEEE Int. Symp. on 
Object-oriented Real-time distributed Computing (ISORC'98), 1998, p. 
64-72

[4] H. Garcia-Molina, and K. Salem, ?SAGAS?, Proc. ACM SIGMOD Int. Conf. 
on Management of Data, 1988, pp. 249-259

[5] H. Garcia-Molina, D. Gawlick, J. Klein, K. Kleissner, and K. Salem, 
?Modeling Long-Running Activities as Nested Sagas?, Bulletin of the IEEE 
Technical Commitee on Data Engineering, 14(1), 1991, pp. 14-18

[6] T. Härder, and A. Reuter, ?Principles of Transaction-Oriented 
Database Recovery?, ACM Computing Surveys, 15(4), December 1983, pp. 
287-317

[7] D. Johansen, R. van Renesse, and F. Schneider, ?Operating System 
Support for Mobile Agents?, Proc. 5th IEEE Workshop on Hot Topics in 
Operating Systems, 1995, pp. 42-45

[8] H. Korth, E. Levy, and A. Silberschatz, ?A Formal Approach to 
Recovery by Compensating Transactions?, Proc. 16th Very Large Data Bases 
Conf., Brisbane, Australia, 1990, pp. 95-106

[9] Project Mole, 
http://www.informatik.uni-stuttgart.de/ipvr/vs/projekte/mole.html

[10] A. Reuter, K. Schneider, and F. Schwenkreis, ?ConTracts Revisited?, 
S. Jajodia and L. Kerschberg (eds.), Advanced Transaction Models and 
Architectures (ATMA), Kluwer Academic Publ., Boston, 1997, pp. 127-151

[11] K. Rothermel and M. Strasser, ?A Fault-Tolerant Protocol for 
Providing the Exactly-Once Property of Mobile Agents?, Proc. 17th IEEE 
Symp. on Reliable Distributes Systems, West Lafayette, IN, October 1998, 
pp. 100-108

[12] F. Schneider, ?Towards Fault-Tolerant and Secure Agentry?, Proc. 
11th Int. Workshop on Distributed Algorithms, 1997

[13] F. Morais de Assis Silva, and S. Krause, ?A Distributed Transaction 
Model Based on Mobile Agents?, K. Rothermel, and R. Popescu-Zeletin 
(eds.), Mobile Agents, Proc. 1st Int. Workshop (MA'97), LNCS 1219, 
Springer, Berlin, 1997, pp. 198-209

[14] M. Strasser, and K. Rothermel, ?Reliability Concepts for Mobile 
Agents?, International Journal of Cooperative Information Systems 
(IJCIS), 7(4), World Scientific, 1998, pp. 355-382

[15] M. Strasser, K. Rothermel, and C. Maihöfer, ?Providing Reliable 
Agents for Electronic Commerce?, W. Lamersdorf, M. Merz (eds.), Trends 
in Distributed Systems for Electronic Commerce (TREC'98), LNCS 1402, 
Springer-Verlag, 1998, pp. 241-253

[16] M. Strasser, and M. Schwehm, ?A Performance Model for Mobile Agent 
Systems?, H. Arabnia (ed.), Proc. Int. Conf. on Parallel and Distributed 
Processing Techniques and Applications (PDPTA'97), Vol II, CSREA, 1997, 
pp. 1132-1140

[17] H. Vogler, T. Kunkelmann, and M. Moschgath, ?An Approach for Mobile 
Agent Security and Fault Tolarance Using Distributed Transactions?, 
Proc. 1997 Int. Conf. on Parallel and Distributed Systems (ICPADS'97), 
1997, pp. 268-274