Universität Stuttgart
Fakultät Informatik, Institut für Informatik
Breitwiesenstraße 20-22, D-70565 Stuttgart

Makanin's Algorithm for Solving Word Equations with Regular Constraints

Volker Diekert

Report Nr. 1998/02, March 3, 1998

Abstract

We give a self-contained proof of a fundamental result of Makanin (1977), which solves the satisfiability problem of equations with constants over free monoids. Our presentation of Makanin's algorithm borrows from Schulz (1992a), where Makanin's result is extended to the case where solutions are restricted by imposing regular constraints on the variables. This report appears (with minor modifications) as a chapter of the new book of M. Lothaire, Algebraic Combinatorics on Words.

Contents

1 Introduction
2 Simple facts on words and on equations
  2.1 Notations and combinatorial properties
  2.2 Domino Towers
  2.3 Stable normal forms
  2.4 From a system to a single equation and vice versa
  2.5 A single variable
3 Linear Diophantine equations: Bounding the exponent of periodicity
4 Boundary equations
  4.1 Linear orders over S
  4.2 From word equations to boundary equations
  4.3 The convex chain condition
  4.4 Transformation rules
5 Proof of Theorem 4.12
  5.1 Decidability
  5.2 Complexity in terms of the semigroup S and the maximal number of boundary equations
  5.3 An upper bound for the complexity of solving word equations
6 Notes
Problems
References

1 Introduction

The aim of this text is to give a self-contained proof of a fundamental result of Makanin which implies that the existential theory of word equations over free monoids is decidable. In other terms, Makanin's result is the decidability of the existential theory of concatenation.

Let A be an alphabet of constants and let Ω be a set of variables. A word equation L = R is a pair (L, R) ∈ (A ∪ Ω)* × (A ∪ Ω)*, and a system of word equations is a set of equations {L₁ = R₁, …, L_k = R_k}. A solution is a homomorphism σ : (A ∪ Ω)* → A* leaving the letters of A invariant such that σ(L_i) = σ(R_i) for all 1 ≤ i ≤ k. A solution is henceforth identified with a mapping σ : Ω → A*. It is called non-singular if σ(x) ≠ 1 for all x ∈ Ω; otherwise it is called singular.
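These notions translate into a few lines of code. The following sketch (in Python; the encoding of words as strings and of σ as a dictionary is mine, not the report's) checks whether a mapping solves an equation; the data is that of Example 1.1 below.

def apply(sigma, word):
    # extend sigma : Omega -> A* to the homomorphism (A u Omega)* -> A*
    # leaving the constants fixed
    return ''.join(sigma.get(c, c) for c in word)

def solves(sigma, L, R):
    return apply(sigma, L) == apply(sigma, R)

# the non-singular solution of Example 1.1 below
sigma = {'x': 'abb', 'y': 'ab', 'z': 'ba', 'u': 'bab'}
print(solves(sigma, 'xauzau', 'yzbxaaby'))   # True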
The satisfiability problem for word equations is to decide whether a given word equation has a solution. The problem is usually stated for a single equation, but this is no loss of generality. Given a propositional formula over word equations, all negations can be eliminated; then, passing to a disjunctive normal form, the problem of satisfiability can be reduced to a single conjunction. It is therefore enough to consider a system of word equations, which in turn can be transformed into a single word equation. In this way the decidability of propositional formulas over word equations reduces to the satisfiability problem for single word equations. Makanin (1977) yields a decision procedure for the satisfiability problem of word equations; the decidability of the existential theory over free monoids follows.

Since the problem is semi-decidable by its nature, a positive answer suffices to compute (if desired) a solution σ : Ω → A* which is minimal, say with respect to Σ_{x∈Ω} |σ(x)|. Here and in the following, computation means that there is an effective procedure in the mathematical sense. We shall derive a double exponential space bound just to solve the satisfiability problem, and the length of a minimal solution will be at most four times exponential in the input size of the word equation. These upper bounds are far beyond any practical meaning, but it is not clear that this reflects the inherent complexity of the problem. In practice the algorithm seems to behave much better. For example, up to now no solvable word equation is known where the minimal solution exceeds exponential length.
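The semi-decidability remark can be made concrete. The following naive enumerator (a sketch, not Makanin's algorithm; it reuses apply from the previous sketch) searches assignments by increasing total length Σ|σ(x)| and hence returns a minimal solution whenever one exists, while it diverges on unsolvable equations.

from itertools import product

def compositions(total, parts):
    # all tuples of 'parts' non-negative integers summing to 'total'
    if parts == 1:
        yield (total,)
        return
    for head in range(total + 1):
        for rest in compositions(total - head, parts - 1):
            yield (head,) + rest

def minimal_solution(A, variables, L, R):
    # semi-decision: search sigma by increasing sum of |sigma(x)|;
    # this loop never terminates if the equation has no solution
    total = 0
    while True:
        for lengths in compositions(total, len(variables)):
            pools = [[''.join(t) for t in product(A, repeat=n)]
                     for n in lengths]
            for choice in product(*pools):
                sigma = dict(zip(variables, choice))
                if apply(sigma, L) == apply(sigma, R):
                    return sigma
        total += 1

For instance, minimal_solution('ab', 'xyzu', 'xauzau', 'yzbxaaby') finds a solution of the equation of Example 1.1 below.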
Example 1.1 Let A = {a, b} and Ω = {x, y, z, u}. Consider the equation

xauzau = yzbxaaby.

This equation is solvable; a possible non-singular solution is given by σ(x) = abb, σ(y) = ab, σ(z) = ba, σ(u) = bab. We have

abbababbaabab = σ(xauzau) = σ(yzbxaaby).

There is a rather straightforward algorithm to decide the solvability of a system of word equations where each variable occurs at most twice. This algorithm is due to Matiyasevich (1968). Since the general solution refers (implicitly) to the underlying idea, we explain it here as an introductory example. Let E = {L₁ = R₁, …, L_k = R_k} be a system of word equations where every variable x ∈ Ω occurs at most twice in E. Let ||E|| = Σ_{i=1}^{k} |L_i R_i| denote the denotational length of E. The question is whether there is a solution.

Using induction on |Ω| we describe a non-deterministic decision algorithm which works without exceeding a linear space bound in ||E||. The basis Ω = ∅ is clear, hence let Ω ≠ ∅. The first step is to guess whether there is a solution σ : Ω → A* with σ(x) = 1 for some x ∈ Ω. This is done by choosing some x ∈ Ω and replacing the occurrences of x in E by the empty word. We obtain a new system E′ over Ω ∖ {x}, and recursively we decide in non-deterministic linear space whether E′ has a solution. Thus, after this step we are looking for non-singular solutions of E only. We may assume that the first equation is either of the form x⋯ = a⋯ with x ∈ Ω, a ∈ A, or x⋯ = y⋯ with x, y ∈ Ω, x ≠ y. By symmetry (or a non-deterministic guess interchanging the roles of L₁ and R₁) we may write either x = az or x = yz, where z is a new variable. Replacing all occurrences of x by az or yz respectively, we obtain a new system where x does not occur any more and z occurs at most twice. On the left of the first equation we may cancel either a or y, and then y also occurs at most twice. Hence we end up with a new system E′ where the number of variables is the same as in E, every variable occurs at most twice, and ||E′|| ≤ ||E||. Note that E′ may have a singular solution with σ(z) = 1. However, if E′ is solvable, then E is also solvable. Now, let σ : Ω → A* be a non-singular solution of E where Σ_{x∈Ω} |σ(x)| is minimal. Then we find a solution σ′ for E′ with |σ′(z)| < |σ(x)| since σ(y) ≠ 1. Thus the length of a shortest solution has decreased, showing that the non-deterministic procedure will find a solution if there is any. Since we have a linear space bound, the procedure can be transformed into a deterministic decision algorithm using at most exponential time.

The presentation of the general case will mainly follow Schulz (1992a), thereby showing the result of Makanin in a more general setting. Assume that for every x ∈ Ω a regular language L_x ⊆ A* is given as part of the problem instance together with the equation L = R. Then we can decide whether or not there exists a solution σ : Ω → A* additionally satisfying the regular constraints σ(x) ∈ L_x for all x ∈ Ω. (For example, we can prescribe the alphabet of a solution σ(x) for all x ∈ Ω.)

In the following we do not focus very much on necessary decidable conditions which are useful to prune the search tree. A good pruning strategy is of course extremely important for an implementation, since the search tree tends to be huge. However, pruning the tree helps neither to understand the algorithm, nor does it seem to have any effect on the worst-case analysis.

2 Simple facts on words and on equations

2.1 Notations and combinatorial properties

Throughout this chapter A = {a, b, …} denotes an alphabet of constants and Ω is a set of variables (or unknowns) such that A ∩ Ω = ∅. We shall use the same symbol σ to denote a mapping σ : Ω → A* and its canonical extension to a homomorphism σ : (A ∪ Ω)* → A*. The symbol 1 denotes the empty word, the unit element in other monoids, and also the natural number 1 ∈ ℕ. The length of a word w is denoted by |w|. The prefix relation (proper prefix relation) on words is denoted by u ≤ v (u < v resp.). Recall that a word p is called primitive if it cannot be written in the form p = r^α with α ≠ 1. Lower case Greek letters α, β etc. are mostly used to denote natural numbers. The set of natural numbers is ℕ; it includes zero. The set of integers is ℤ. By log α we mean max{1, ⌈log₂ α⌉}.

Two words y, z ∈ A* are conjugates if xy = zx for some x ∈ A*. The next proposition shows that in free monoids conjugates are obtained by transposition.

Proposition 2.1 Let x, y, z ∈ A* be words, y, z ≠ 1. Then the following assertions are equivalent:
1. xy = zx;
2. ∃ r, s ∈ A*, s ≠ 1, α ≥ 0 : x = (rs)^α r, y = sr, and z = rs.

Proposition 2.2 Let p ∈ A* be primitive and p² = xpy for some x, y ∈ A*. Then we have either x = 1 or y = 1 (but not both).

Proofs of Props. 2.1 and 2.2 can be found in Lothaire (1983) or elsewhere. An overlapping of two words w₁ and w₂ is depicted by the following figure:

[Figure: w₁ and w₂ drawn as two overlapping segments.]

It says that the common border is an identical factor, i.e., w₁ = xy, w₂ = zx. Usually we mean x ≠ 1, and sometimes the figure also indicates that both y ≠ 1 and z ≠ 1; there will be no risk of confusion. For example, Prop. 2.2 can be rephrased by saying that the following picture is not possible for a primitive word p ∈ A*:

[Figure: a copy of p lying strictly inside p², overlapping both halves properly.]
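Primitive words and primitive roots are used throughout this chapter. A standard computational sketch (the trick of locating w inside ww is folklore and not taken from this report):

def primitive_root(w):
    # the first non-trivial occurrence of w in ww has an offset
    # dividing |w|, and that offset is the length of the root
    i = (w + w).find(w, 1)
    return w[:i]

def is_primitive(w):
    return w != '' and primitive_root(w) == w

print(primitive_root('abab'), is_primitive('ababa'))   # ab True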
2.2 Domino Towers

Every non-empty word w ∈ A⁺ can be written in the form w = (rs)^{h−1} r with s ≠ 1, h ≥ 2. Then it can be arranged in a figure looking like a domino tower of height h:

[Figure: h copies of w = (rs)^{h−1} r stacked as rows r s r s ⋯ r s r, each row shifted by |rs| against the one below; a vertical line marks the common boundary.]

The position of the vertical line says that the upper left boundary is not on the right of the lower right boundary. The formal definition of such an arrangement also allows a less regular shape. It reads as follows:

Definition 2.3 Let h ≥ 2. We say that a non-empty word w ∈ A⁺ can be arranged in a domino tower of height h, if there are words x₁, …, x_{h−1} ∈ A* and non-empty words y₁, …, y_{h−1}, z₂, …, z_h ∈ A⁺ such that
1. w = x_i y_i = z_{i+1} x_i for all 1 ≤ i < h,
2. |z₂ ⋯ z_h| ≤ |w|.

Note that for h = 2 the domino tower may degenerate as in the following figure.

[Figure: two copies of w placed one above the other without a proper shift.]

Definition 2.4 Let w ∈ A* be a word. The exponent of periodicity exp(w) is defined by

exp(w) = max{α ∈ ℕ | ∃ r, s, p ∈ A*, p ≠ 1 : w = r p^α s}.

Lemma 2.5 Let w ∈ A⁺ be a non-empty word which can be arranged in a domino tower of height h. Then we have exp(w) ≥ h − 1.

Proof. Choose a domino tower and words x_i, y_i, z_i as in the definition above. Let z = z_i ∈ {z₂, …, z_h} be of minimal length, x = x_{i−1}, y = y_{i−1}. Then (h − 1)|z| ≤ |w| and we have xy = zx = w. Hence y and z are conjugates and we may apply Prop. 2.1. We obtain z = rs and x = (rs)^α r for some α ≥ 0 and |r| < |z|. Hence w = z^{α+1} r and therefore

(h − 1)|z| ≤ |w| < (α + 2)|z|.

Since |z| > 0 we see that h − 1 ≤ α + 1 ≤ exp(w). □

2.3 Stable normal forms

One of the key ideas in Makanin's proof is that, given a word equation, the exponent of periodicity of a shortest solution has an effective upper bound. This relies on the notion of p-stable normal form.

Definition 2.6 Let p ∈ A⁺ be a primitive word. The p-stable normal form of the word w ∈ A* is a shortest sequence (u₀, α₁, u₁, …, α_k, u_k) such that k ≥ 0 (k is minimal), u₀, u_i ∈ A*, α_i ≥ 0 for 1 ≤ i ≤ k, and the following conditions are satisfied:
1. We have w = u₀ p^{α₁} u₁ ⋯ p^{α_k} u_k.
2. We have k = 0 if and only if p² is not a factor of w.
3. If k ≥ 1, then we have:
u₀ ∈ A*p ∖ A*p²A*,
u_i ∈ (A*p ∩ pA*) ∖ A*p²A* for 1 ≤ i < k,
u_k ∈ pA* ∖ A*p²A*.

Proposition 2.7 Let p ∈ A⁺ be primitive. The p-stable normal form of w ∈ A* is uniquely defined. This means, if (u₀, α₁, u₁, …, α_k, u_k) and (v₀, β₁, …, β_ℓ, v_ℓ) are p-stable normal forms of the same word w ∈ A*, then they are identical, i.e., we have k = ℓ, u₀ = v₀, u_i = v_i, and α_i = β_i for 1 ≤ i ≤ k.

Proof. Assume that (u₀, α₁, u₁, …, α_k, u_k) and (v₀, β₁, v₁, …, β_ℓ, v_ℓ) are both p-stable normal forms of w. Since these are shortest sequences, the indices k and ℓ are both minimal, hence k = ℓ. For k = 0 we have w = u₀ = v₀, hence let k = ℓ ≥ 1. We show first that u₀ = v₀. To see this, suppose by symmetry that |u₀| ≤ |v₀|. Since u₀p ∈ A*p² and v₀ ∈ A*p ∖ A*p²A*, we obtain u₀ ≤ v₀ < u₀p. By Prop. 2.2 this yields u₀ = v₀. Let w′ denote the word u₁ p^{α₂} u₂ ⋯ p^{α_k} u_k. A simple reflection using u₁ ≠ p, Prop. 2.2, and u₁ ∈ (A*p ∩ pA*) ∖ A*p²A* shows that p^{α₁} w′ ∈ p^{α₁+1}A* ∖ p^{α₁+2}A*. This implies α₁ = β₁ and w′ = v₁ p^{β₂} v₂ ⋯ p^{β_k} v_k. Since we have w′ ∈ pA*, we see that the first component of its p-stable normal form is in pA*. Hence (u₁, α₂, u₂, …, α_k, u_k) is the p-stable normal form of w′. By induction we conclude (u₁, α₂, u₂, …, α_k, u_k) = (v₁, β₂, v₂, …, β_k, v_k). Hence the proposition. □
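The uniqueness proof also suggests a left-to-right computation of the p-stable normal form: u₀ ends with the first copy of p of the leftmost occurrence of p², then a maximal run of further copies of p determines α_i, the last copy of that run starts the next u_i, and so on. A sketch (assuming p is primitive; the encoding is mine):

def p_stable_normal_form(w, p):
    # returns (u0, a1, u1, ..., ak, uk) with w = u0 p^a1 u1 ... p^ak uk
    # as in Definition 2.6
    k = len(p)
    t = w.find(p + p)
    if t < 0:
        return (w,)                      # k = 0 iff pp is not a factor
    out = [w[:t + k]]                    # u_0 ends with one copy of p
    while True:
        f = t + k                        # consume copies of p from here
        j = 0
        while w.startswith(p, f + j * k):
            j += 1
        b = f + (j - 1) * k              # the last copy belongs to u_i
        out.append(j - 1)                # alpha_i = (run length) - 1
        t = w.find(p + p, b)             # next junction, if any
        if t < 0:
            out.append(w[b:])            # u_k
            return tuple(out)
        out.append(w[b:t + k])           # u_i ends with one copy of p

print(p_stable_normal_form('ababaabababaababa', 'ababa'))
# ('ababa', 0, 'abababa', 0, 'ababa'), i.e. the form (p, 0, abp, 0, p)
# that reappears in the proof of Thm. 3.2 below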
2.4 From a system to a single equation and vice versa

The existential theory of equations over free monoids is decidable, i.e., the satisfiability of any propositional formula over word equations (with regular constraints) can be decided. Let us show the reduction to Makanin's result. In a first step we may assume that all negations in a given formula are of type L ≠ R. Due to the following proposition these negations can be eliminated.

Proposition 2.8 An inequality L ≠ R is equivalent with respect to satisfiability to the following formula, using x, y, and z as new variables:

∨_{a∈A} (L = Rax ∨ R = Lax) ∨ ∨_{a,b∈A, a≠b} (L = xay ∧ R = xbz).

In a second step the formula (without negations) is written in disjunctive normal form. Then, for satisfiability, it is enough to see how a system of word equations can be transformed into a single word equation. The method is given in Prop. 2.9. It relies on the trivial fact that if ua ≤ va, ub ≤ vb, u, v ∈ A*, a, b ∈ A, and a ≠ b, then we have u = v.

Proposition 2.9 Let a, b ∈ A be distinct letters (if A = {a}, then let b denote a new letter resp.) and let E = {L₁ = R₁, …, L_k = R_k} be a system of word equations. Then the set of solutions of E is identical (in canonical bijection resp.) with the set of solutions of the following equation:

L₁a ⋯ L_k a R₁ ⋯ R_k L₁b ⋯ L_k b = R₁a ⋯ R_k a L₁ ⋯ L_k R₁b ⋯ R_k b.

Sometimes it is useful to do the opposite of the proposition above and to split a single word equation into a system where all equations are of type xy = z with x, y, z ∈ A ∪ Ω. This can be derived from the next proposition. Again its (simple) proof is left to the reader.

Proposition 2.10 Let x₁ ⋯ x_g = x_{g+1} ⋯ x_d be a word equation with 1 ≤ g < d, x_i ∈ A ∪ Ω for 1 ≤ i ≤ d. Then the set of solutions of L = R is in canonical bijection with the set of solutions of the following system:

x₁ = y₁,  x_{g+1} = y_{g+1},
y₁x₂ = y₂,  y_{g+1}x_{g+2} = y_{g+2},
⋮
y_{g−1}x_g = y_g,  y_{d−1}x_d = y_d,
y_g = y_d.

In the system above y₁, …, y_d denote new variables.

2.5 A single variable

A parametric description of the set of all solutions can be computed in polynomial time if there is only one variable occurring in the equation. This serves as an example of why p-stable normal forms might be useful, but it is not used elsewhere. The reader may skip this subsection.

Let E be a set of word equations where exactly one variable x occurs, Ω = {x}. By Prop. 2.9, E is given by a single equation L = R with L, R ∈ (A ∪ {x})*. The first check is whether σ(x) = 1 yields the singular solution. It is then enough to consider only non-singular solutions.

Let us denote by 𝓛 a list of pairs (p, r) where p ∈ A⁺ is primitive and r ∈ A* is a proper prefix r < p. We say that 𝓛 is complete for the equation L = R if every non-singular solution σ has the form σ(x) = p^α r for some α ≥ 0 and (p, r) ∈ 𝓛. Since our intention here is to give an application of p-stable normal forms, assume for a moment that a finite complete list 𝓛 has already been computed in a first phase of the algorithm. Then we proceed as follows. For each pair (p, r) ∈ 𝓛 we make a first test whether σ(x) = r is a solution and a second test whether σ(x) = pr is a solution. After that we search (for this pair (p, r)) for solutions where σ(x) = p^α r with α ≥ 2. Replace all occurrences of x in the equation L = R by the expression p p^{α−2} p r, where α now denotes an integer variable. Thus, the problem is now to find solutions for α such that α ≥ 2. Using the symbolic expression we can factorize L and R in their p-stable normal forms:

L = u₀ p^{m₁α+n₁} u₁ ⋯ p^{m_kα+n_k} u_k,
R = v₀ p^{m′₁α+n′₁} v₁ ⋯ p^{m′_ℓα+n′_ℓ} v_ℓ.

Note that k, ℓ ≥ 0 and m_i, m′_j ∈ ℕ, n_i, n′_j ∈ ℤ for 1 ≤ i ≤ k and 1 ≤ j ≤ ℓ. By Prop. 2.7 we have to verify k = ℓ, u_i = v_i for 0 ≤ i ≤ k, and we have to solve a linear Diophantine system:

(m_i − m′_i) α = n′_i − n_i for 1 ≤ i ≤ k.

There are three cases: either no α ≥ 2, or exactly one α ≥ 2, or all α ≥ 2 satisfy these equations. Note that for each pair (p, r) the necessary computations can be done in linear time.
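This case analysis is easy to implement. A sketch, with the i-th equation encoded as the pair (m_i − m′_i, n′_i − n_i); it returns 'none', ('one', α), or 'all':

def alphas_ge2(eqs):
    # solve (m_i - m'_i) * alpha = n'_i - n_i simultaneously over
    # the unknown alpha >= 2
    fixed = None
    for a, b in eqs:
        if a == 0:
            if b != 0:
                return 'none'
        elif b % a != 0 or b // a < 2:
            return 'none'
        elif fixed is not None and fixed != b // a:
            return 'none'
        else:
            fixed = b // a
    return 'all' if fixed is None else ('one', fixed)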
The performance of the algorithm therefore depends on an efficient computation of a short and complete list 𝓛. We may assume that L = ux⋯ and R = xv⋯, where u ∈ A⁺, v ∈ A*, and both words u and v are of maximal length. Let p ∈ A⁺ be the primitive root of u, i.e., p is primitive and u = p^e for some e ≥ 1. If σ is a solution of L = R, then σ also solves an equation of type ux = xw for some word w ∈ A⁺. By Prop. 2.1 it is immediate that we have σ(x) = p^α r for some α ≥ 0 and r < p. Thus, the simple method is to define the list 𝓛 by all pairs (p, r) where r < p. We obtain a list 𝓛 with |p| elements. It is clear that all computations can be done in polynomial time; in fact quadratic time is enough.

There is an improvement to an O(||E|| log ||E||) algorithm due to Eyono Obono, Goralcik, and Maksimenko (1994). This improvement is a clever method to compute a complete list 𝓛 of at most logarithmic length. The method uses a finer combinatorial analysis and relies, in particular, on the following facts, which can be found in Lothaire (1983):
- Let r, s ∈ A*. If the word sr is primitive, then rs is also primitive.
- Let p, q ∈ A⁺ be primitive words and u = p^e, w = q^f for some e, f ≥ 1. If u and w are conjugates, then p and q are conjugates and there is a unique factorization p = rs, q = sr with r < p. Moreover, if ux = xw for some word x ∈ A*, then we have x = p^α r for some α ≥ 0 and the unique prefix r < p above.
- Let p, q, r ∈ A⁺ be primitive words such that p² < q² < r². Then we have |p| + |q| ≤ |r|. In particular, a word w ∈ A* of length n has at most O(log n) distinct prefixes of the form pp where p is primitive.

The aim is to compute a complete list 𝓛 for the equation L = R of length O(log |LR|). For this purpose we divide the set of non-singular solutions into two classes.

The first class contains all solutions where |σ(x)| ≥ |u| − |v|. (Of course, in the case |u| ≤ |v| all solutions satisfy this condition.) Let w be the prefix of the word vu such that |w| = |u|. If σ is a solution with |σ(x)| ≥ |u| − |v|, then we have uσ(x) = σ(x)w. Let p be the primitive root of u and let q be the primitive root of w. Then σ(x) = p^α r for some α ≥ 0 and the unique prefix r < p such that p = rs and q = sr. If p and q are not conjugates, then there is no such solution. Otherwise, if p and q are conjugates, we include the unique pair (p, r) in 𝓛. This pair covers all solutions where |σ(x)| ≥ |u| − |v|.

Now, let σ be a non-singular solution such that 0 ≠ |σ(x)| < |u| − |v|. This implies that R has the form R = xvx⋯ and that σ(x)vσ(x) < uσ(x). Hence σ(x)vσ(x) < uu and ww < vuu, where w denotes the non-empty word vσ(x). Let q be the primitive root of w; then we have qq < vuu. There is a unique factorization q = sr with s < q such that v ∈ q*s. The word rs is also primitive and we have σ(x) = (rs)^α r for some α ≥ 0. Therefore it is enough to compute the list of all primitive words q such that qq < vuu. If v = 1, then we add all pairs (q, 1) to 𝓛. Otherwise, if v ≠ 1, then we compute for each q the unique factorization q = sr with s ≠ 1 such that v ∈ q*s. We add all pairs (rs, r) to 𝓛.
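The last step asks for all primitive q with qq a proper prefix of vuu, and by the third fact above this list has only O(log n) entries. A brute-force sketch (reusing is_primitive from the sketch in Sect. 2.1; the improved method computes this list faster, which is not attempted here):

def primitive_square_prefixes(w):
    # all primitive q with qq a proper prefix of w
    return [w[:k] for k in range(1, (len(w) - 1) // 2 + 1)
            if w.startswith(w[:k] * 2) and is_primitive(w[:k])]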
3 Linear Diophantine equations: Bounding the exponent of periodicity

The input for the algorithm is an equation L = R with L, R ∈ (A ∪ Ω)* together with regular languages L_x ⊆ A* for all variables x ∈ Ω. We are looking for a solution σ : Ω → A* such that σ(L) = σ(R) and σ(x) ∈ L_x for all x ∈ Ω. For notational convenience we henceforth do not distinguish variables from constants in the equation: every constant a ∈ A is replaced by a new variable x_a with the constraint L_{x_a} = {a}. (For readability we still use constants in examples, however.) From now on the equation is given as

x₁ ⋯ x_g = x_{g+1} ⋯ x_d with x_i ∈ Ω.

In order to exclude trivial cases we shall assume 1 ≤ g < d whenever convenient. The number d is called the denotational length of the equation. It is enough to consider non-singular solutions. Hence we shall assume that 1 ∉ L_x for all x ∈ Ω.

Next we fix a finite semigroup S and a semigroup homomorphism φ : A⁺ → S such that L_x = φ⁻¹φ(L_x) for all x ∈ Ω. For later use we demand that φ is surjective. The semigroup S can be realized as the image φ(A⁺) of the canonical homomorphism to the direct product of the syntactic monoids with respect to L_x for x ∈ Ω. Sometimes it is more convenient to work with monoids instead of semigroups. We denote by S¹ the monoid obtained by adjoining a unit element 1 to S. We have S¹ ∖ {1} = S, and the homomorphism φ is extended to a monoid homomorphism φ : A* → S¹. We have φ⁻¹(1) = {1} and φ(A⁺) = S.

Given S we can compute constants t(S) ≥ 0 and q(S) > 0 such that s^{t(S)+q(S)} = s^{t(S)} for all s ∈ S¹. In the following we actually use another constant c(S), which is defined as the least multiple of q(S) such that c(S) ≥ max{2, t(S)}. Note that this implies s^{r+αc(S)} = s^{r+βc(S)} for all s ∈ S¹ and r ≥ 0, α, β ≥ 1.
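The constants t(S), q(S), and c(S) can be computed directly from the multiplication of S. A sketch, with the carrier given as a list and the multiplication as a function (this encoding is mine):

from math import lcm

def index_period(s, mul):
    # (t, q) with s^(t+q) = s^t in the cyclic subsemigroup of s
    seen, x, k = {}, s, 1
    while x not in seen:
        seen[x] = k
        x, k = mul(x, s), k + 1
    t = seen[x]
    return t, k - t

def c_of_S(elements, mul):
    # least multiple c of q(S) with c >= max(2, t(S))
    t, q = 1, 1
    for s in elements:
        ts, qs = index_period(s, mul)
        t, q = max(t, ts), lcm(q, qs)
    c = q
    while c < max(2, t):
        c += q
    return c

print(c_of_S([0, 1, 2], lambda x, y: x * y % 3))   # 2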
Remark 3.1 Assume that each regular language L_x is specified by an NFA with r_x states, x ∈ Ω. Let r = Σ_{x∈Ω} r_x. Then we may choose the semigroup S such that |S| ≤ 2^{r²} and c(S) ≤ r!. A proof of these bounds can be found in Markowsky (1977), where a more precise analysis is given.

For the moment explicit upper bounds for |S| and c(S) are not used. They are used only later (Sect. 5.3) when complexity issues are investigated. The important point here is that our knowledge from linear algebra of how to find all minimal solutions of a system of linear Diophantine equations yields an effective upper bound for the exponent of periodicity of a solution of minimal length of a given word equation (with regular constraints). Any effective upper bound would be sufficient, but one can do better. The upper bound is exponential in the input size, and this is essentially optimal. In the proof below a rather detailed analysis is given, so the proof becomes quite technical, which might hide the beautiful and simple idea behind it. Readers who are mainly interested in the pure decidability result are invited to ignore the exact values.

Theorem 3.2 Let d ≥ 1 be a natural number, φ : A* → S¹ a homomorphism, and c(S) ≥ 2 as above. There is a computable number e(c(S), d) ∈ c(S) · 2^{O(d)} satisfying the following assertion. Given as instance a word equation x₁ ⋯ x_g = x_{g+1} ⋯ x_d of denotational length d together with a solution σ₀ : Ω → A*, we can effectively find a solution σ : Ω → A* and a word w ∈ A* such that the following conditions hold:
1. φσ₀(x) = φσ(x) for all x ∈ Ω,
2. w = σ(x₁ ⋯ x_g) = σ(x_{g+1} ⋯ x_d),
3. exp(w) ≤ e(c(S), d).

Proof. For g = 0 or g = d we have exp(w) = 0, hence let 1 ≤ g < d. Testing all words of length up to |σ₀(x₁ ⋯ x_g)|, we find a solution σ and a word w such that w = σ(x₁ ⋯ x_g) = σ(x_{g+1} ⋯ x_d) is of minimal length among all solutions σ where φσ₀(x) = φσ(x) for all x ∈ Ω. Recall that x₁ ⋯ x_g = x_{g+1} ⋯ x_d is equivalent to the following system:

x₁ = y₁,  x_{g+1} = y_{g+1},
y₁x₂ = y₂,  y_{g+1}x_{g+2} = y_{g+2},
⋮
y_{g−1}x_g = y_g,  y_{d−1}x_d = y_d,
y_g = y_d.

Note also that exp(w) = exp(σ(y_g)). After an obvious elimination of variables, the system above is equivalent to a system of d − 2 equations of type

xy = z,  x, y, z ∈ Ω.

Choose a primitive word p ∈ A⁺ such that w = u p^{exp(w)} v for some u, v ∈ A*. Consider an equation xy = z from the system above and write the words σ(x), σ(y), σ(z) in their p-stable normal forms:

σ(x): (u₀, r₁ + α₁c(S), u₁, …, r_k + α_k c(S), u_k),
σ(y): (v₀, s₁ + β₁c(S), v₁, …, s_ℓ + β_ℓ c(S), v_ℓ),
σ(z): (w₀, t₁ + γ₁c(S), w₁, …, t_m + γ_m c(S), w_m).

The natural numbers r_i, s_i, t_i, α_i, β_i, and γ_i are uniquely determined by w, c(S), and the requirement 0 ≤ r_i, s_i, t_i < c(S). Since w is a solution, there are many equations among the words and among the integers. For example, for k, ℓ ≥ 2 we have u₀ = w₀, v_ℓ = w_m, r₁ = t₁, α₁ = γ₁, etc. In order to be precise, we shall use:

α₁ = γ₁, …, α_{k−1} = γ_{k−1}, β₂ = γ_{m−ℓ+2}, …, β_ℓ = γ_m.

We have no bound on k, ℓ, or m, but we have |k + ℓ − m| ≤ 2. What exactly happens depends on the p-stable normal form of the product u_k v₀. Since u_k, v₀ ∉ A*p²A*, it is enough to distinguish nine cases. Here are the nine possible p-stable normal forms of u_k v₀, where t ∈ {0, 1}, u_k, v₀ ∈ A*, and u′_k, v′₀, w′ ∈ A⁺:

(u_k v₀), (p, t, p), (p, t, pv′₀), (u′_k p, t, p), (u′_k p, t, pv′₀),
(p, 0, w′, 0, p), (p, 0, w′, 0, pv′₀), (u′_k p, 0, w′, 0, p), (u′_k p, 0, w′, 0, pv′₀).

The case (p, 0, w′, 0, p) can be produced if p has an overlap as in p = ababa. Then we might have u_k = pabab, v₀ = abap, which yields u_k v₀ = ppbap = pabpp with abp = pba. Hence the p-stable normal form of u_k v₀ is (p, 0, abp, 0, p). We may conclude w_{k+1} = abp and

t_k + γ_k c(S) = r_k + α_k c(S) + 1,
t_{k+1} + γ_{k+1} c(S) = s₁ + β₁ c(S) + 1.

In particular k + ℓ = m. If r_k < c(S) − 1, then α_k = γ_k, otherwise α_k + 1 = γ_k. Similarly, if s₁ < c(S) − 1, then β₁ = γ_{k+1}, otherwise β₁ + 1 = γ_{k+1}.

A p-stable normal form of type (u′p, 0, w′, 0, pv′) with u′, v′, w′ ∈ A⁺ leads to k + ℓ = m + 2 and 0 = γ_k = γ_{k+1}. Let us consider another example. If u_k v₀ = p³, then k + ℓ = m + 1 and we have

r_k + s₁ + 3 + (α_k + β₁) c(S) = t_k + γ_k c(S).

Since by assumption c(S) ≥ 2, the case u_k v₀ = p³ leads to the equation

γ_k − (α_k + β₁) = c with c ∈ {0, 1, 2}.

We have seen that there are various possibilities for u_k v₀. However, always the same phenomenon arises. First of all we obtain a bunch of trivial equations which can be eliminated by renaming. All equations of type γ = 0 are eliminated by substitution. Then, for each xy = z, either there are at most two equations of type γ = α + 1, or there is one equation of type γ − (α + β) = c with c ∈ {0, 1, 2}. If there are two equations of type γ = α + 1, then one of them is eliminated by substitution. So after renaming and substituting we end up with at most one non-trivial equation having at most three variables. Proceeding this way through all d − 2 word equations, we have various interactions due to renaming and substitution.
However, finally each equation xy = z leads to at most one non-trivial equation with at most three variables. The type of this equation is

ε₁γ + i₁ − ε₂α − i₂ − ε₃β − i₃ = c,

where 0 ≤ i₁, i₂, i₃ ≤ d − 2, 0 ≤ c ≤ 2, and ε₁, ε₂, ε₃ ∈ {0, 1}. This can be written as

ε₁γ − ε₂α − ε₃β = c′ with |c′| ≤ 2d − 2.

This type introduces a coefficient −2 for α = β and ε₁ = ε₂ = ε₃ = 1.

So far we have viewed the symbols α, β, … as variables ranging over natural numbers. Going back to the solution σ, the symbols α₁, …, α_k, β₁, …, β_ℓ, γ₁, …, γ_m represent concrete values which are given by the word w. Some of them might still be zero. These are eliminated now; the reason is that they cannot be replaced by other values without risk of changing the image under φ. If δ ≥ 1 is a remaining value, i.e., a number greater than zero, then we replace it by δ = 1 + Z_δ, where now Z_δ denotes a variable over ℕ. For example, an equation γ − α − β = c′ with α, β, γ ≥ 1 is transformed into a linear Diophantine equation with integer variables Z_α, Z_β, Z_γ ≥ 0 as follows:

Z_γ − Z_α − Z_β = c′ + 1 with |c′ + 1| ≤ 2d − 1.

Putting all equations of type xy = z together, we obtain a (perhaps) huge system of linear equations. After substitution and elimination of variables, we end up with a system of at most d − 2 equations and n integer variables with n ≤ 3(d − 2). The absolute values of the coefficients are bounded by 2 and those of the constants by 2d − 1. For each equation the sum of the squares of the coefficients is bounded by 5.

The linear Diophantine system is defined by w, and the word w provides a non-negative integer solution. What becomes crucial now is the converse: every solution in non-negative integers yields by backward substitution a word w′′ and a solution σ′′ : Ω → A* satisfying conditions 1 and 2 of the theorem. Therefore: since w was chosen of minimal length, the solution of the integer system given by w is a minimal solution with respect to the natural partial ordering of ℕⁿ. In this ordering we have (α₁, …, α_n) ≤ (β₁, …, β_n) if and only if α_i ≤ β_i for all 1 ≤ i ≤ n. For ᾱ = (α₁, …, α_n) ∈ ℕⁿ let ||ᾱ|| = max{α_i | 1 ≤ i ≤ n}. All we need is a recursive bound for the following value:

e(d) = max{||ᾱ|| | ᾱ is a minimal solution of a system of linear Diophantine equations with at most d − 2 equations and 3(d − 2) variables, where the absolute value of the coefficients is bounded by 2, the sum of the squares of the coefficients in each equation is bounded by 5, and the absolute value of the constants is bounded by 2d − 1}.

Obviously, there are only finitely many systems of linear Diophantine equations where the number of equations, the number of variables, and the absolute values of coefficients and constants are bounded. For each system the set of minimal solutions is finite; this is a special case of Lemma A in Dickson (1913). Moreover, the set of minimal solutions is effectively computable. Hence, the set of values ||ᾱ|| above is finite and effectively computable. Therefore e(d) is computable. Since e(d) + d − 1 ≥ α₁, …, β₁, … for the original values under consideration above, we obtain a recursive upper bound for the exponent of periodicity. A much more precise statement is possible. It follows from the work of Kościelski and Pacholski (1996) that e(d) ∈ 2^{O(d)}, see also Rem. 3.3 below. Hence we can state:

exp(w) ≤ 2 + (c(S) − 1) + (e(d) + d − 1) · c(S) ∈ c(S) · 2^{O(d)}.

This proves the theorem. □
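The finiteness argument via Dickson's lemma can be illustrated by brute force: for a small system one can enumerate solutions inside a box and keep the minimal ones. The bound box below is an ad hoc choice for illustration, not a theorem:

from itertools import product

def minimal_solutions(A, b, box):
    # all solutions of A x = b with 0 <= x_i <= box that are minimal
    # for the pointwise partial order on N^n
    sols = [v for v in product(range(box + 1), repeat=len(A[0]))
            if all(sum(c * x for c, x in zip(row, v)) == rhs
                   for row, rhs in zip(A, b))]
    return [v for v in sols
            if not any(u != v and all(ui <= vi for ui, vi in zip(u, v))
                       for u in sols)]

# Z_gamma - Z_alpha - Z_beta = 1, variables ordered (gamma, alpha, beta):
print(minimal_solutions([[1, -1, -1]], [1], box=3))   # [(1, 0, 0)]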
Remark 3.3 The analysis of Kościelski and Pacholski (1996) is more accurate than the one presented here, and it leads to linear Diophantine systems having a slightly different structure. The authors make use of the results given in the paper of von zur Gathen and Sieveking (1978). They show that the exponent of periodicity of a minimal solution of a word equation of denotational length d is in O(2^{1.07d}). The authors do not consider regular constraints, but, as shown above, this does not change the situation very much: it yields the factor c(S). Therefore the actual result including regular constraints is:

e(c(S), d) ∈ c(S) · O(2^{1.07d}).

It is rather difficult to obtain this very good bound. However, we can circumvent this difficulty. A bound which is already nice and certainly good enough to establish Thm. 3.2 is e(d) ∈ O(2^{cd}) for some constant c, say c = 4. Such a more moderate bound can be obtained without any difficulty using the present approach and some standard knowledge in linear algebra, see Prob. 3.1 below.

Example 3.4 Consider c, n ≥ 2 and let S = ℤ/cℤ be the cyclic group with c elements. We give a regular constraint for the variable x₁ by defining

L_{x₁} = {w ∈ A⁺ | |w| ≡ 0 (mod c)}.

The system is given by x₁ = a^c, x₂ = x₁², …, x_n = x_{n−1}². Its unique solution σ is:

σ(x_i) = a^{c·2^{i−1}}, 1 ≤ i ≤ n.

A transformation into a single equation according to Prop. 2.9 shows that e(c(S), d) ∈ c(S) · 2^{Ω(d)}. Thus, the assertion given in Thm. 3.2 is essentially optimal.

The following example shows that a minimal solution can be very long although the exponent of periodicity is bounded by a constant.

Example 3.5 Let n ≥ 1. Consider the following well-known system of word equations:

x₀ = a, y₀ = b, x_i = x_{i−1}y_{i−1}, y_i = y_{i−1}x_{i−1} for 1 ≤ i ≤ n.

The unique solution is the Thue-Morse word:

σ(x_n) = abbabaabbaababbabaababbaabbabaab ⋯ for n ≥ 5.

We have |σ(x_n)| = 2ⁿ, but exp(σ(x_n)) = 2.

Example 3.6 Consider the equation with regular constraints:

axyz = zxay, L_x = a²a*, L_y = {a, b}* ∖ (a* ∪ b*), L_z = {a, b}⁺.

A suitable homomorphism φ : {a, b}⁺ → S is given by the canonical homomorphism onto the quotient semigroup of {a, b}⁺ presented by the defining relations

a² = a³, b = b², ab = ba = aab.

Thus, S is a semigroup with a zero, 0 = ab, and S has four elements: S = {a, a², b, 0}. The constant c(S) = 2 fits the requirement s^{r+c(S)} = s^{r+αc(S)} for all s ∈ S¹ and r ≥ 0, α ≥ 1. It is not difficult to find a solution σ for the equation above, e.g. σ(x) = a², σ(y) = ba², and σ(z) = a³ba².

Now let α, β, γ, δ be integer variables and let u, v, and w be parametric words described by the following a-stable normal forms:

u: (a, 2α, a), v: (ba, 2β, a), w: (a, 1 + 2γ, aba, 2δ, a).

In order to derive the system of linear Diophantine equations we make a direct approach: we want to solve auvw = wuav. First we write auvw as a sequence of a-stable normal forms:

((a), (a, 2α, a), (ba, 2β, a), (a, 1 + 2γ, aba, 2δ, a)).

The resulting a-stable normal form is:

(a, 2α + 1, aba, 2β + 2γ + 3, aba, 2δ, a).

Now consider the right-hand side wuav. This yields:

(a, 2γ + 1, aba, 2α + 2δ + 3, aba, 2β, a).

We obtain the linear Diophantine system:

2α + 1 = 2γ + 1, 2β + 2γ + 3 = 2α + 2δ + 3, 2δ = 2β.

Going back to the equation we see that for all α ≥ 0 and β ≥ 0 the mapping

σ(x) = a^{2+2α}, σ(y) = ba^{2+2β}, σ(z) = a^{3+2α}ba^{2+2β}

yields a solution of the equation axyz = zxay satisfying the regular constraints.
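The parametric family just derived is easy to test mechanically; a quick check of Example 3.6 for small values of the parameters (assuming the solved system γ = α, δ = β):

for alpha in range(3):
    for beta in range(3):
        x = 'a' * (2 + 2 * alpha)                    # x in a^2 a*
        y = 'b' + 'a' * (2 + 2 * beta)               # y not in a* u b*
        z = 'a' * (3 + 2 * alpha) + 'b' + 'a' * (2 + 2 * beta)
        assert 'a' + x + y + z == z + x + 'a' + y    # axyz = zxay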
4 Boundary equations

4.1 Linear orders over S

Let us start with an informal explanation of the notions discussed in this subsection. Assume that x₁ ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, x_i ∈ Ω for 1 ≤ i ≤ d, is a solvable word equation with regular constraints and that there is a non-singular solution σ. The solution is given by some word w ∈ A⁺. The equation corresponds to two factorizations w = u₁ ⋯ u_g = u_{g+1} ⋯ u_d, yielding two sequences of non-empty words:

(u₁, …, u_g), (u_{g+1}, …, u_d).

Using the word w, these sequences can be merged into a single one such that each u_i is a product of some of the w_k:

w = (w₁, …, w_m), w_k ≠ 1 for 1 ≤ k ≤ m, m < d.

Let us see what happens if we pass via φ to the finite semigroup S. Two sequences (p₁, …, p_g) ∈ S^g and (p_{g+1}, …, p_d) ∈ S^{d−g} are merged into a single sequence (s₁, …, s_m) ∈ S^m, m < d, such that each p_i ∈ S is a product of some of the s_k. We shall say that (s₁, …, s_m) is a common refinement of (p₁, …, p_g) and (p_{g+1}, …, p_d). For a given input d there are only finitely many possibilities for sequences of the form (s₁, …, s_m), s_j ∈ S, 1 ≤ j ≤ m < d. Thus, in a non-deterministic step we can guess and fix such a sequence which is the φ-image of (w₁, …, w_m).

A basic technique for solving word equations is to split a variable. Working over the sequence (s₁, …, s_m) ∈ S^m, a splitting of a variable x = x′x′′ is accompanied by a splitting of some s_i and a guess of s′, s′′ ∈ S such that s_i = s′s′′. In this way the length of the sequences increases.

Example 4.1 Consider the equation xauzau = yzbxaaby. The solution, which was given in Ex. 1.1, leads to the sequences (abb, a, bab, ba, a, bab) and (ab, ba, b, abb, a, a, b, ab), for which (ab, b, a, b, ab, b, a, a, b, ab) is a common refinement. This is best visualized by the following figure.

[Figure: three copies of the word abbababbaabab, marked with the cuts of the two factorizations above and with the cuts of their common refinement.]

Passing to the semigroup S from Ex. 3.6, we could start to search for a solution with the sequence (0, b, a, b, 0, b, a, a, b, 0) ∈ S¹⁰.
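The common refinement of Example 4.1 is obtained by merging the cut positions of the two factorizations. A sketch:

def common_refinement(f1, f2):
    # f1, f2: two factorizations (lists of non-empty factors) of the
    # same word; merge their cut positions
    w = ''.join(f1)
    assert w == ''.join(f2)
    cuts = set()
    for f in (f1, f2):
        pos = 0
        for u in f:
            pos += len(u)
            cuts.add(pos)
    pos = sorted(cuts)
    return [w[i:j] for i, j in zip([0] + pos, pos)]

print(common_refinement(['abb', 'a', 'bab', 'ba', 'a', 'bab'],
                        ['ab', 'ba', 'b', 'abb', 'a', 'a', 'b', 'ab']))
# ['ab', 'b', 'a', 'b', 'ab', 'b', 'a', 'a', 'b', 'ab']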
We now start the formal discussion of this section. The semigroup S and the homomorphism φ : A⁺ → S are given as in the preceding section. An S-sequence is a sequence (s₁, …, s_m) ∈ S^m, m ≥ 0. A representation of (s₁, …, s_m) is a triple (I, ≤, φ_I) such that (I, ≤) is a totally ordered set of m + 1 elements and

φ_I : {(i, j) ∈ I × I | i ≤ j} → S¹

is a mapping satisfying, for some order-respecting bijection π : I → {0, …, m}, the condition

φ_I(i, j) = s_{π(i)+1} ⋯ s_{π(j)} ∈ S¹ for all i, j ∈ I, i ≤ j.

Note that we have φ_I(i, j) = 1 if and only if i = j, and φ_I(i, k) = φ_I(i, j) φ_I(j, k) for all i, j, k ∈ I, i ≤ j ≤ k. The standard representation of (s₁, …, s_m) is simply (I, ≤, φ_I) where I = {0, …, m} and φ_I(i, j) = s_{i+1} ⋯ s_j for i, j ∈ I, i ≤ j. Hence for the standard representation the bijection π is the identity. In the following, any representation (I, ≤, φ_I) of some S-sequence is called a linear order over S.

Remark 4.2 An S-sequence can be viewed as an abstraction of a linear order over S. In most cases we are interested in the abstract objects only, but when working with them we have to pass to concrete representations. When counting linear orders over S (c.f. Lem. 4.9 below) we count, by convention, only standard representations and mappings between them.

Let w = a₁ ⋯ a_m ∈ A*, a_i ∈ A for 1 ≤ i ≤ m. The set {0, …, m} is the set of positions of w, and for 0 ≤ i ≤ j ≤ m let w(i, j) denote the factor a_{i+1} ⋯ a_j. The associated S-sequence of w is defined by w_S = (φ(a₁), …, φ(a_m)). The notation w_S also refers to its standard representation w_S = ({0, …, m}, ≤, φ_w). The mapping φ_w is defined by φ_w(i, j) = φ(w(i, j)) for all 0 ≤ i ≤ j ≤ m.

Definition 4.3 Let s, s′ be S-sequences given by some representations (I, ≤, φ_I) and (I′, ≤, φ_{I′}). We say that s′ is a refinement of s (or that s matches s′) if there exists an order-respecting injective mapping μ : I → I′ such that φ_I(i, j) = φ_{I′}(μ(i), μ(j)) for all i, j ∈ I, i ≤ j. We write either s ≤ s′ or, more precisely, s ≤_μ s′ and (I, ≤, φ_I) ≤_μ (I′, ≤, φ_{I′}) in this case.

Remark 4.4 Let s, s′ be S-sequences such that s ≤ s′. Then we may choose concrete representations and a refinement (I, ≤, φ_I) ≤_μ (I′, ≤, φ_{I′}) such that μ : I → I′ is an inclusion, i.e., I ⊆ I′ and φ_I is the restriction of φ_{I′} to I.

Definition 4.5 Let s be an S-sequence and (I, ≤, φ_I) some representation. A word w ∈ A* is called a model of s (of (I, ≤, φ_I) resp.) if the associated S-sequence w_S is a refinement of s, i.e., (I, ≤, φ_I) ≤_μ w_S for some μ. If w is a model of s, then we write w ⊨ s or w ⊨ (I, ≤, φ_I). By abuse of language, we make the following convention: as soon as we have chosen a word w as a model, we are free to view the set I as a subset of positions of w, i.e., μ becomes an inclusion and therefore φ_I(i, j) = φ(w(i, j)) for all i, j ∈ I, i ≤ j.

Lemma 4.6 Every S-sequence (s₁, …, s_m) has a model w ∈ A*.

Proof. Since φ is surjective, there are non-empty words w_i ∈ A⁺ such that s_i = φ(w_i) for all 1 ≤ i ≤ m. Let w = w₁ ⋯ w_m; then we have w ⊨ (s₁, …, s_m). □
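Definition 4.5 can also be checked mechanically: search for an increasing sequence of positions realizing the S-sequence. A brute-force sketch, where phi is assumed to be a callable evaluating φ on a non-empty factor (exponential in general, for experiments only):

def is_model(w, seq, phi):
    # does some strictly increasing sequence of positions of w
    # realize the S-sequence seq?
    n, m = len(w), len(seq)
    def go(t, i):
        if t == m:
            return True
        return any(phi(w[i:j]) == seq[t] and go(t + 1, j)
                   for j in range(i + 1, n + 1))
    return any(go(0, i0) for i0 in range(n + 1))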
The lemma above will yield the positive termination step in Makanin's algorithm once there are no more variables. In the positive case we can eventually reconstruct some S-sequence such that some model w describes a solution of the word equation.

Let i, j ∈ I, i ≤ j, be positions in a linear order over S. Then [i, j] denotes the interval from i to j; this is the linear sub-order over S induced by the subset {k ∈ I | i ≤ k ≤ j}. More generally, let T ⊆ I be a subset; then we view (T, ≤, φ_T) as a linear suborder of (I, ≤, φ_I). In the following, min(T) and max(T) refer to the minimal respectively maximal element of a subset T of a linear order I.

Definition 4.7 Let (I, ≤, φ_I) be a representation of some S-sequence, T ⊆ I a non-empty subset, and ℓ*, r* ∈ I positions such that ℓ* < r*. An admissible extension of (I, ≤, φ_I) by T at [ℓ*, r*] is given by a linear order (I*, ≤, φ_{I*}) and two refinements (I, ≤, φ_I) ≤_μ (I*, ≤, φ_{I*}) and (T, ≤, φ_T) ≤_{μ*} (I*, ≤, φ_{I*}) such that the following two conditions are satisfied:
1. I* = μ(I) ∪ μ*(T),
2. min(μ*(T)) = ℓ* and max(μ*(T)) = r*.

The intuition behind the last definition should be rather clear. An admissible extension refines (I, ≤, φ_I) by defining new positions between ℓ* and r* until T matches the enlarged interval [ℓ*, r*] in such a way that all new points have a corresponding point in T, and such that min(T) is mapped to ℓ* and max(T) is mapped to r*. The other way round: let (I*, ≤, φ_{I*}) denote an admissible extension of (I, ≤, φ_I) by T at [ℓ*, r*]; then we may view I ⊆ I*, whence T ⊆ I*. There is a subset T* ⊆ I* representing the same S-sequence as T, and we have I* = I ∪ T*, min(T*) = ℓ*, and max(T*) = r*.

Example 4.8 Let (s₁, …, s₆) be some S-sequence, (I, ≤, φ_I) its standard representation, ℓ* = 4 and r* = 6. Let (I*, ≤, φ_{I*}) represent an admissible extension of (I, ≤, φ_I) by {0, 3, 4, 5} at [4, 6]. Then we may assume I* = {0, …, 6} ∪ {3*, 4*} with 0 < 1 < 2 < 3 < 4 < 5 < 6 and 4 < 3* < 4* < 6. We may or may not have 5 ∈ {3*, 4*}. Say we have 5 = 3*. Then the corresponding S-sequence has the form (s₁, s₂, s₃, s₄, s₅, s₄, s₅) such that s₅ = s₁s₂s₃ and s₆ = s₄s₅.

Lemma 4.9 Given (I, ≤, φ_I), T ⊆ I, and ℓ*, r* ∈ I, the list of all admissible extensions of (I, ≤, φ_I) by T at [ℓ*, r*] is finite and effectively computable.

Proof. Trivial, since the cardinality of an admissible extension is bounded by |I| + |T|. □

4.2 From word equations to boundary equations

Let x₁ ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, x_i ∈ Ω for 1 ≤ i ≤ d, be a word equation with regular constraints L_x ⊆ A* for all x ∈ Ω. Recall that we are only interested in non-singular solutions and that we fixed a homomorphism φ : A⁺ → S onto a finite semigroup S such that φ⁻¹φ(L_x) = L_x for x ∈ Ω. Hence, without restriction it holds that 1 ∉ L_x ≠ ∅ for all x ∈ Ω. Since the images φ(L_x) ⊆ S are finite sets, we can split into finitely many cases where in each case φ(L_x) is a singleton. Thus, it is enough to consider a situation where the input is x₁ ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, and the question is the existence of a non-singular solution σ : Ω → A⁺ satisfying ρ = φ ∘ σ for some fixed mapping ρ : Ω → S. This question will be reformulated in terms of boundary equations. A system of boundary equations is defined as follows.

Definition 4.10 Let n ≥ 0 and let φ : A⁺ → S be a homomorphism onto a finite semigroup S.
1. A system of boundary equations is specified by a tuple B = ((Γ, ¯), (I, ≤, φ_I), left, B), where Γ is a set of 2n variables, ¯ : Γ → Γ is an involution without fixed points (i.e., applying it twice is the identity and x̄ ≠ x for all x ∈ Γ), the triple (I, ≤, φ_I) is a linear order over S, left : Γ → I is a mapping, and B is a set of boundary equations. Every boundary equation b ∈ B has the form b = (x, i, x̄, j) with x ∈ Γ and i, j ∈ I such that left(x) ≤ i and left(x̄) ≤ j.
2. A solution of B is a model w ⊨ (I, ≤, φ_I), w ∈ A*, such that

w(left(x), i) = w(left(x̄), j) for all (x, i, x̄, j) ∈ B.

(Recall that if a word w ∈ A* is a model for (I, ≤, φ_I), then we view I as a subset of positions of w. Hence it makes sense to write w(p, q) for p, q ∈ I, p ≤ q.)
3. If B is solvable, then the exponent of periodicity exp(B) of B is defined by

exp(B) = min{exp(w) | w is a solution of B}.

Remark 4.11 If n = 0, then there are no variables, hence no boundary equations, and any model w ⊨ (I, ≤, φ_I) is a solution of B. In particular, if n = 0, then the system is solvable by Lem. 4.6.
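Part 2 of Definition 4.10 is a direct check once I is realized by integer positions of w; the model condition is assumed separately. A sketch in an ad hoc encoding, with left a dictionary and each brick a tuple (x, i, x̄, j):

def solves_beq(w, left, B):
    # compare the two factors of w named by each boundary equation
    return all(w[left[x]:i] == w[left[xbar]:j]
               for (x, i, xbar, j) in B)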
and the mapping left : ? ! I. To this purpose we introduce an undirected 17 graph. Let (V; E) be the undirected graph with vertex set V = f1; : : : ; dg and edge set E = f(p; q) 2 V ?V j xp = xqg. The idea is that for v = ö(x1; : : : ; xg) = ö(xg+1; : : : ; xd) and w = vv we can realize I as a subset of positions of w such that both w j= ( (x1); : : : ; (xd)) and the following equations hold: w(i0; ig) = w(ig; id); w(ip?1; ip) = w(iq?1; iq) for all (p; q) 2 E: For the first equation we shall introduce an extra variable x0 (and its dual x0) below; in the other list of equations there is some redundancy. For (p; q); (q; r) 2 E, we have by definition (p; r) 2 E, but the equations w(ip?1; ip) = w(iq?1; iq) and w(iq?1; iq) = w(ir?1; ir) already imply w(ip?1; ip) = w(ir?1; ir). Hence we don't need the edge (p; r) for the equation. To avoid this redundancy we let F ? E be a spanning forest of (V; E). This means F = F?1, F ? = E?, and (V; F ) is an acyclic undirected graph. We have jF j = 2(d ? c), where c is the number of connected components of (V; E). The elements of F are called variables and for each x = (p; q) 2 F we define its dual and two positions left(x); right(x): x = (q; p); left(x) = ip?1; right(x) = ip: Note that x 6= x and x = x for all x 2 F . Taking duals corresponds to edge reversing in (V; F ). Define two new extra variables x0 and x0 with x0 = x0 and define ? = fx0; x0g [ F and: left(x0) = i0; right(x0) = ig = left(x0); right(x0) = id: This defines the set ?, the involution without fixed points ? : ? ! ?, and the mapping left : ? ! I. The last step of the construction is to define the set B of boundary equations. It should be clear what to do. We define B = f(x; right(x); x; right(x)) j x 2 ?g: We have to verify two properties. 1. Let ö : ? ! A+ be a solution such that = ' ffi ö, and let w = vv, where v = ö(x1 ? ? ? xg) = ö(xg+1 ? ? ? xd). The word w has positions 0 = i0 < i1 < ? ? ? < id, where id is the last position and the following equations hold: w(i0; ig) = w(ig; id); w(ip?1; ip) = ö(xp) for 1 <= p <= d: In particular, w j= (I; <=; 'I) and w is a solution of B. 2. Let w j= (I; <=; 'I) be a solution of B. Without restriction we may view I as a subset of positions of w. Consider the factors w(i0; ig) and w(ig; id). The boundary equation (x0; right(x0); x0; right(x0)) 2 B implies w(i0; ig) = w(ig ; id) and it follows that w 2 A?vvA? for v = w(i0; ig). We define ö : ? ! A+ by ö(xp) = w(ip?1; ip). Since ip?1 < ip, this is a non-empty word. The elements (x; right(x); x; right(x)) 2 B for x = (p; q), x = (q; p), (p; q) 2 T imply w(ip?1; ip) = w(iq?1; iq) whenever xp = xq. Hence ö is well-defined. We have 'ö(xp) = 'w(ip?1; ip) = (xp) since w j= (I; <=; 'I ). Finally, v = w(i0; ig) = w(ig; id) implies v = ö(x1 ? ? ? xg) = ö(xg+1 ? ? ? xd). 18 Thus, the word equation with regular constraints given by the mapping has a solution if and only if the system of boundary equations is solvable. The construction of the system B above can be performed in polynomial time; more precisely, the construction yields a logspace-reduction. Due to this reduction, Makanin's result follows from the next theorem. The assertion of Thm. 4.12 is in fact equivalent to Makanin's result, see Lem. 4.14 below. Theorem 4.12 It is decidable whether a system of boundary equations has a solution. The rest of this chapter is devoted to the proof of Thm. 4.12. An important property is stated in the next proposition: We can bound the exponent of periodicity while searching for a solution. 
Thus, the word equation with regular constraints given by the mapping ρ has a solution if and only if the system of boundary equations is solvable. The construction of the system B above can be performed in polynomial time; more precisely, the construction yields a logspace reduction. Due to this reduction, Makanin's result follows from the next theorem. The assertion of Thm. 4.12 is in fact equivalent to Makanin's result, see Lem. 4.14 below.

Theorem 4.12 It is decidable whether a system of boundary equations has a solution.

The rest of this chapter is devoted to the proof of Thm. 4.12. An important property is stated in the next proposition: we can bound the exponent of periodicity while searching for a solution. Note, however, that bounding the exponent of periodicity of some word gives absolutely no bound on the length of this word.

Proposition 4.13 Given as instance a system of boundary equations B, we can compute a number e(B) having the property that if B is solvable, then we have exp(B) ≤ e(B).

The proof of Prop. 4.13 could be based on the same techniques as presented in Sect. 3. However, for our purposes we prefer to prove Prop. 4.13 via a reduction to word equations.

Lemma 4.14 There is an effective reduction of the solvability of a system of boundary equations B to some word equation with regular constraints. Moreover, there is a reduction such that if w ∈ A* is a solution of the word equation, then B is solvable and we have exp(B) ≤ exp(w).

Proof. Let B = ((Γ, ¯), (I, ≤, φ_I), left, B) be a system of boundary equations. We may assume that the linear order (I, ≤, φ_I) is the standard representation of its underlying S-sequence s = (s₁, …, s_m). Introduce new variables y₁, …, y_m with regular constraints ρ(y_p) = s_p, 1 ≤ p ≤ m. For each boundary equation b = (x, i, x̄, j) ∈ B we introduce a word equation

y_{left(x)+1} ⋯ y_i = y_{left(x̄)+1} ⋯ y_j.

This system of word equations with regular constraints is solvable if and only if B is solvable. Indeed, if w ∈ A* is a solution of B, then, by definition, we have (I, ≤, φ_I) ≤_μ w_S, and μ(I) is a subset of positions of w. All word equations w(μ(left(x)), μ(i)) = w(μ(left(x̄)), μ(j)) are satisfied for (x, i, x̄, j) ∈ B. Hence defining σ(y_p) = w(μ(p − 1), μ(p)), 1 ≤ p ≤ m, yields a solution of the system of word equations. For the other direction let σ(y_p) = v_p, 1 ≤ p ≤ m, be some solution of the system of word equations. Due to the regular constraints we have φ(v_p) = s_p and v_p ≠ 1 for all 1 ≤ p ≤ m. Therefore the word v = σ(y₁) ⋯ σ(y_m) solves B.

Next, we transform the system of word equations into a single word equation L = R using Prop. 2.9, and finally we reduce to the word equation Ly₁ ⋯ y_m = Ry₁ ⋯ y_m. The point is that if w is a solution of this equation, then some suffix v of w solves B. Hence

exp(B) ≤ exp(v) ≤ exp(w).

This yields Lem. 4.14. Now, let d be the denotational length of Ly₁ ⋯ y_m = Ry₁ ⋯ y_m and define the number e(B) = e(c(S), d), which has been given in Thm. 3.2. We can choose w such that exp(w) ≤ e(c(S), d). This proves Prop. 4.13. □
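The reduction in the proof of Lem. 4.14 is a one-line transliteration per brick. A sketch, in the same encoding as the construction sketch of Sect. 4.2:

def to_word_equations(left, B):
    # one word equation per boundary equation (x, i, xbar, j):
    # y_{left(x)+1} ... y_i  =  y_{left(xbar)+1} ... y_j
    return [(['y%d' % t for t in range(left[x] + 1, i + 1)],
             ['y%d' % t for t in range(left[xbar] + 1, j + 1)])
            for (x, i, xbar, j) in B]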
4.3 The convex chain condition

Let B = ((Γ, ¯), (I, ≤, φ_I), left, B) be a system of boundary equations. A boundary equation b = (x, i, x̄, j) ∈ B is henceforth also called a brick. The variable x is called the label of the brick b = (x, i, x̄, j). Pictorially, a brick is drawn as the segment from left(x) to i, labeled x, placed over the segment from left(x̄) to j, labeled x̄. The dual brick b̄ of b = (x, i, x̄, j) is obtained by reversing the brick; it has the label x̄:

b̄ = (x̄, j, x, i).

Henceforth we make the assumption that B is closed under duals (i.e., b ∈ B implies b̄ ∈ B) and that there is at least one brick b ∈ B having label x for every x ∈ Γ. Clearly, this is no restriction. For x ∈ Γ let B(x) ⊆ B be the subset of bricks with label x. Then

B(x) = {(x, i₁, x̄, j₁), …, (x, i_r, x̄, j_r)}

for some non-empty subset {i₁, …, i_r} ⊆ I such that left(x) ≤ i₁ ≤ ⋯ ≤ i_r. The right boundary of x is defined by right(x) = i_r.

Before we continue, we make some additional assumptions on B. All of them are necessary conditions for solvability and easily verified. Let (x, i, x̄, j), (y, i, ȳ, j), (y, i′, ȳ, j′) ∈ B. Then we assume from now on:
- left(x) ≤ left(x̄) if and only if i ≤ j,
- φ_I(left(x), i) = φ_I(left(x̄), j),
- left(x) ≤ left(y) if and only if left(x̄) ≤ left(ȳ),
- i ≤ i′ if and only if j ≤ j′.

These assumptions imply that if B(x) = {(x, i₁, x̄, j₁), …, (x, i_r, x̄, j_r)} is given such that left(x) ≤ i₁ ≤ ⋯ ≤ i_r, then we also have left(x̄) ≤ j₁ ≤ ⋯ ≤ j_r. In particular, B(x) contains the brick (x, right(x), x̄, right(x̄)).

[Figure: the bricks of B(x), with upper right boundaries i₁, i₂, …, right(x) and lower right boundaries j₁, j₂, …, right(x̄).]

In our pictures a brick (x, i, x̄, j) can be placed upon (y, j′, ȳ, k) if and only if j = j′. We obtain one out of three different shapes:

[Figure: the three possible shapes of a brick placed upon another one.]

Which one of these cases occurs is determined by the function left : Γ → I. The leftmost picture corresponds to left(x̄) < left(y), the picture in the middle corresponds to left(x̄) = left(y), and the picture on the right means left(x̄) > left(y).

Definition 4.15 Let m ≥ 1. A chain C of length m is a sequence of bricks

C = ((x₁, i₁, x̄₁, i₂), (x₂, i₂, x̄₂, i₃), …, (x_m, i_m, x̄_m, i_{m+1})),

where (x_p, i_p, x̄_p, i_{p+1}) ∈ B for all 1 ≤ p ≤ m. A chain C is called convex if for some index q with 1 ≤ q ≤ m we have:

left(x̄_p) ≥ left(x_{p+1}) for 1 ≤ p < q,
left(x̄_p) ≤ left(x_{p+1}) for q ≤ p < m.

A convex chain C is called clean if the bricks of C are pairwise distinct. A brick (x, i, x̄, j) is linked via a convex chain of length m to the brick (x′, i′, x̄′, j′) if there is a convex chain C of length m as above such that m ≥ 1, (x, i, x̄, j) = (x₁, i₁, x̄₁, i₂), and (x′, i′, x̄′, j′) = (x_m, i_m, x̄_m, i_{m+1}).
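Convexity of a chain is easily tested by trying all turning indices q. A sketch, in the brick encoding used before (left is a dictionary; b[0] and b[2] are the labels x_p and x̄_p):

def is_convex(chain, left):
    a = [left[chain[p][2]] for p in range(len(chain) - 1)]      # left(xbar_p)
    b = [left[chain[p + 1][0]] for p in range(len(chain) - 1)]  # left(x_{p+1})
    return any(all(s >= t for s, t in zip(a[:q], b[:q])) and
               all(s <= t for s, t in zip(a[q:], b[q:]))
               for q in range(len(a) + 1))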
Remark 4.16 If C = (b₁, …, b_m) is a convex chain, then its dual C̄ = (b̄_m, …, b̄₁) and all subchains (b_p, …, b_q), 1 ≤ p ≤ q ≤ m, are convex chains. If b_p = (x_p, i_p, x̄_p, i_p) for some 1 < p < m, then (b₁, …, b_{p−1}, b_{p+1}, …, b_m) is a convex chain. If b_p = b_q for some 1 < p < q ≤ m, then (b₁, …, b_{p−1}, b_q, …, b_m) is also a convex chain. In particular, if two bricks are linked via a convex chain, then they are linked via some clean convex chain: the shortest chain linking two bricks to each other is certainly clean.

A typical picture of a convex chain is the following:

[Figure: the bricks (x₁, i₁, x̄₁, i₂), …, (x_m, i_m, x̄_m, i_{m+1}) of a convex chain stacked on top of each other, their left boundaries first moving to the left up to brick q and then moving to the right.]

Definition 4.17 Let F ⊆ I be a subset. A brick (x, i, x̄, j) ∈ B is called a basis or foundation with respect to F if j ∈ F. We say that B satisfies the convex chain condition (with respect to F) if every brick b ∈ B can be linked via some convex chain to some basis. The set F is also called the set of final indices.

Note that if, in the figure above, we have {i₂, …, i_{m+1}} ∩ F ≠ ∅, then the brick (x₁, i₁, x̄₁, i₂) is linked via a convex chain to a basis.

Lemma 4.18 Let n, m, f ∈ ℕ and let B = ((Γ, ¯), (I, ≤, φ_I), left, B) be a system of boundary equations with |Γ| = 2n. Let F ⊆ I have size f. Suppose that every brick b ∈ B can be linked via a convex chain of length at most m to a basis with respect to F. Then we can bound the size of B by

|B| ≤ (2n)^m · 2f.

Proof. Consider a convex chain of length k, k ≤ m, where the last brick is a basis:

C = ((x₁, i₁, x̄₁, i₂), (x₂, i₂, x̄₂, i₃), …, (x_k, i_k, x̄_k, i_{k+1})).

Given any pair x ∈ Γ, j ∈ I, there exists at most one brick (x, i, x̄, j) ∈ B. Therefore the whole chain is uniquely determined by the sequence (x₁, x₂, …, x_k, i_{k+1}) ∈ Γ^k × F. The number of these chains is bounded by |Γ|^k · |F|. Finally, observe that every brick b ∈ B can be linked via a convex chain either of length m − 1 or of length m to some basis. Indeed, if (b, b₂, …, b_{k−1}, b_k) is a convex chain of length k, then (b, b₂, …, b_{k−1}, b_k, b̄_k, b_k) is a convex chain of length k + 2. Therefore we obtain

|B| ≤ |Γ|^{m−1} · |F| + |Γ|^m · |F| ≤ (2n)^m · 2f. □

Remark 4.19 Every system of boundary equations B trivially satisfies the convex chain condition with respect to the set I. Furthermore, if we construct B by starting from a word equation x₁ ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, then we have |I| ≤ d. The transformation rules below will neither increase the number 2n of variables nor the sum 2n + f. They will increase the sizes of I and of B. However, Lem. 4.18 says that a large number of boundary equations (i.e., a large set of bricks) forces long convex chains in order to satisfy the convex chain condition (pictorially: many bricks build skyscrapers). The next step is to show that long convex chains lead to high domino towers (pictorially: skyscrapers hide high towers) and hence to a lower bound on the exponent of periodicity of any solution.

Proposition 4.20 Let n, m ∈ ℕ and let B = ((Γ, ¯), (I, ≤, φ_I), left, B) be a solvable system of boundary equations with |Γ| = 2n. Let w ⊨ (I, ≤, φ_I) be a solution of B. Suppose that B contains a clean convex chain of length at least m. Then we have the following lower bound for the exponent of periodicity of the solution w:

m ≤ 4n² · (exp(w) + 1) − 1.
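For experiments with this bound, the quantity exp(w) of Definition 2.4 can be computed naively (cubic time, fine for small words):

def exponent_of_periodicity(w):
    # brute force over all factors p = w[i:i+k] and count how often
    # p repeats consecutively starting at i
    best, n = 0, len(w)
    for i in range(n):
        for k in range(1, n - i + 1):
            a = 1
            while w.startswith(w[i:i + k], i + a * k):
                a += 1
            best = max(best, a)
    return best

print(exponent_of_periodicity('abbabaabbaababba'))   # 2 (cf. Ex. 3.5)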
Hence there is a clean convex chain C = (b_1, …, b_ℓ) which satisfies the following properties:

ℓ ≥ h,
left(x_p) ≥ left(x_{p+1}) for 1 ≤ p < ℓ,
|x_p|_w ≥ |x_ℓ|_w for 1 ≤ p ≤ ℓ,
the label x_1 occurs exactly h times.

This is the point where we switch from the chain to the sequence of words (w(x_1), …, w(x_ℓ)). We obtain a tower of words where w(x_ℓ) has minimal length and the word w(x_1) occurs at least h times.

[Figure: the tower of words w(x_6), w(x_5), …, w(x_1), drawn one above the other.]

Define v_p ∈ A* to be the prefix of w(x_p) of length |w(x_ℓ)|, and let u_p = w(left(x_p), i_p) for 1 ≤ p ≤ ℓ. Since |u_p| ≤ |w(left(x_ℓ), i_ℓ)| ≤ |v_ℓ| = |v_p|, the word u_p is a prefix of v_p for all 1 ≤ p ≤ ℓ. The sequence (v_1, …, v_ℓ) can be arranged in a tower of words which is already in better shape: all words v_p have equal length.

[Figure: the tower of the equal-length words v_6, v_5, …, v_1, with a vertical line cutting through the tower.]

The vertical line corresponds to the factorization v_p = u_p u′_p for 1 ≤ p ≤ ℓ. Finally, let {q_1, q_2, …, q_h} be a set of h indices where the bricks have label x_1. Since the convex chain leading to this tower is clean, we see that u_{q_j} ≠ u_{q_k} for all 1 ≤ j, k ≤ h with j ≠ k. (This is the only point where it is used that the chain is clean!) We obtain:

0 ≤ |u_{q_1}| < |u_{q_2}| < ⋯ < |u_{q_h}|.

Moreover, we have v_1 = v_{q_1} = v_{q_2} = ⋯ = v_{q_h}. We omit all other words in the tower above and we see that the word v_1 can be arranged in a domino tower of height h, with h ≥ 2. Applying Lem. 2.5 we obtain h − 1 ≤ exp(v_1) ≤ exp(w). Since ⌈(m+1)/(4n²)⌉ ≤ h, the assertion of the proposition follows. □

Corollary 4.21 Let 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B) denote a solvable system of boundary equations which satisfies the convex chain condition with respect to some subset F ⊆ I. Then we have

|B| ≤ |Ω|^{|Ω|²·(exp(𝔅)+1)−1} · 2|F|.

If moreover |Ω|, |F| ∈ O(d) and exp(𝔅) ∈ 2^{O(d + log c(S))}, then we have |B| ∈ 2^{2^{O(d + log c(S))}}.

Proof. Let 2n = |Ω|, f = |F|, and let m be the maximal length of a clean convex chain in B. By Lem. 4.18 and Rem. 4.16 we have |B| ≤ (2n)^m · 2f. Choose a solution w such that exp(w) ≤ exp(𝔅). Prop. 4.20 bounds the length of clean convex chains in terms of the exponent of periodicity of any solution; hence m ≤ 4n²·(exp(w)+1) − 1. Putting things together we obtain

|B| ≤ (2n)^{4n²·(exp(w)+1)−1} · 2f ≤ (2n)^{4n²·(exp(𝔅)+1)−1} · 2f.

The result follows. □

4.4 Transformation rules

We are ready to define the (non-deterministic) transformation rules of Makanin's algorithm. If we apply a rule to a system 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B), then the new system is denoted by 𝔅′ = ((Ω′, ¯), (I′, ≤, φ_{I′}), left′, B′). The transformation rules below have the property that if 𝔅 satisfies the convex chain condition with respect to some subset F ⊆ I, then 𝔅′ satisfies the convex chain condition with respect to some subset F′ ⊆ I′ such that |Ω′| + |F′| ≤ |Ω| + |F|. Thus, if we start with a system 𝔅_0 where |Ω_0| = 2n_0 and |I_0| ≤ d, then throughout the whole procedure the size of the set of final indices is at most 2n_0 + d.

We say that a (non-deterministic) rule is downward correct if, whenever w ∈ A* is a solution of 𝔅, then (for at least one non-deterministic choice) some suffix w′ of w is a solution of 𝔅′, and moreover either |Ω′| < |Ω| or |w′| < |w|. Thus, applied to solvable systems, at least one sequence of choices of downward correct rules leads to termination. We say that a (non-deterministic) rule is upward correct if, whenever w′ ∈ A* is a solution of 𝔅′ (and 𝔅′ is the result of any non-deterministic choice), there is a word w ∈ A* which is a solution of 𝔅.
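Abstractly, the rules are plugged into a search: downward correctness guarantees that, for a solvable system, some branch of the search terminates, and upward correctness guarantees that reaching a terminal node certifies solvability. A minimal driver sketch, assuming hypothetical callbacks is_terminal, applicable_rules, apply_rule, and encode, none of which are fixed by the text:

    def solvable(system, is_terminal, applicable_rules, apply_rule, encode):
        """Explore all non-deterministic rule applications (depth-first).
        Returns True iff some sequence of choices reaches a terminal node."""
        seen = set()

        def explore(s):
            if is_terminal(s):          # no variables left (cf. Lem. 4.6)
                return True
            key = encode(s)             # canonical description of the node
            if key in seen:
                return False            # this node was already explored
            seen.add(key)
            return any(explore(t)
                       for rule in applicable_rules(s)
                       for t in apply_rule(rule, s))   # finitely many choices

        return explore(system)

The finiteness of the search space, and hence the termination of such a driver, is exactly what Sect. 5.1 establishes via the search graph G.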
Rule 1 If there is some x ∈ Ω with left(x) = right(x), then cancel both bricks (x, right(x), x̄, right(x̄)) and (x̄, right(x̄), x, right(x)) from B. Cancel x and x̄ from Ω.

Remark 4.22 Obviously Rule 1 is upward and downward correct, since we have w(i, i) = 1 for all words w and all positions i of w; hence the set of solutions is the same. In order to preserve the convex chain condition we introduce two new final indices. Let x ∈ Ω with left(x) = right(x) and assume that x and x̄ are canceled by Rule 1. Define F′ = F ∪ {left(x), left(x̄)}. Consider a convex chain C = (b_1, …, b_m) where for some 1 < p ≤ m the brick b_p has the form b_p = (x, right(x), x̄, right(x̄)); hence the brick b_p is canceled. However, the brick b_1 is linked to b_{p−1} via a convex chain, and b_{p−1} is now a basis since right(x) = left(x) ∈ F′. Thus, if 𝔅 satisfies the convex chain condition with respect to F, then the system 𝔅′ (after an application of Rule 1) satisfies the convex chain condition with respect to F′. We have |Ω′| + |F′| ≤ |Ω| + |F|.

Rule 2 If there exists some x ∈ Ω with left(x) = left(x̄), then cancel all bricks (x, j, x̄, j) and (x̄, j, x, j) from B. Cancel x and x̄ from Ω.

Remark 4.23 Recall that for (x, i, x̄, j) ∈ B we have left(x) = left(x̄) if and only if i = j. Thus, if left(x) = left(x̄), then all bricks with label x have the form (x, j, x̄, j). Again, Rule 2 is obviously upward and downward correct. For the convex chain condition consider a convex chain C = (b_1, …, b_m) where b_p = (x, j, x̄, j) for some 1 < p ≤ m. If p < m, then C′ = (b_1, …, b_{p−1}, b_{p+1}, …, b_m) is a shorter convex chain linking b_1 with a basis. For p = m we have j ∈ F; hence b_{m−1} is also a basis.

Rule 3 Let ℓ = min(I). If ℓ ∉ left(Ω), then cancel the index ℓ from I. This means we replace the linear order over S by the induced sub-order (I′, ≤, φ_{I′}) where I′ = I \ {ℓ}.

Remark 4.24 Clearly, the convex chain condition is not affected by this rule, and downward correctness is obvious, too. To see the upward correctness, let (I, ≤, φ_I) be given by the S-sequence (s_1, …, s_m) and let w′ ∈ A* be a solution of the new system after an application of Rule 3 such that min(I′) is the first position of w′. By the definition of an S-sequence there is a non-empty word u ∈ A⁺ with φ(u) = s_1. Then the first position of w′ is no longer the first position in the word uw′, and uw′ is a solution of 𝔅. For later use notice that we can choose u such that |u| ≤ |S|.

The next rule is very complex; it is the heart of the algorithm. Before we apply it to some system 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B), we apply Rules 1, 2, or 3 as often as possible. In particular, we shall assume that left(x) < right(x) and left(x) ≠ left(x̄) for all x ∈ Ω, and that there exists some x ∈ Ω with left(x) = min(I).

Rule 4 We divide Rule 4 into six steps. We need some notation. Define ℓ = min(I) and

r = max{right(x) | x ∈ Ω, left(x) = ℓ}.

Note that ℓ ∈ left(Ω); hence r ∈ I exists and we have ℓ < r. Choose (and fix) some x_o ∈ Ω with left(x_o) = ℓ and right(x_o) = r. Define ℓ* = left(x̄_o) and r* = right(x̄_o). Define the critical boundary c ∈ I by c = min{c′, r} where

c′ = min{left(x) | x ∈ Ω, r < right(x)}.

Note that since r < r* = right(x̄_o), the minimum c′ and hence the critical boundary c exist. We have ℓ < c ≤ r < r* and c ≤ ℓ* < r*, but the ordering of r and ℓ* depends on the system. Define the subset T ⊆ I of transport positions by

T = {i ∈ I | i ≤ c} ∪ {i ∈ I | there is some (x, i, x̄, j) ∈ B with left(x) < c}.

Note that min(T) = ℓ and that i ∈ T for all (x_o, i, x̄_o, j) ∈ B. Moreover, since left(x) < c implies right(x) ≤ r, we have max(T) = r.
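The quantities ℓ, r, c, and T are directly computable from the system. A small sketch under the dictionary encoding used above (left and right as dictionaries over all variables including the duals, bricks as tuples); it assumes the preconditions of Rule 4, in particular that some variable extends beyond r, so that the minimum defining c′ exists:

    def rule4_setup(I, left, right, bricks):
        """Compute l = min(I), r, the critical boundary c, and the
        transport positions T for Rule 4 (a sketch, not the rule itself)."""
        l = min(I)
        r = max(right[x] for x in left if left[x] == l)
        # c = min(c', r) with c' = min{ left(x) : r < right(x) }
        c_prime = min(left[x] for x in left if r < right[x])
        c = min(c_prime, r)
        # transport positions: everything up to c, plus the first index
        # of every brick whose label starts left of c
        T = {i for i in I if i <= c} | {i for (x, i, xb, j) in bricks if left[x] < c}
        return l, r, c, T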
Step 1 Choose some admissible extension (I*, ≤, φ_{I*}) of (I, ≤, φ_I) by T at [ℓ*, r*]. By convention we identify I with a subset of I*, whence I ⊆ I*, and there is a subset T* ⊆ I* with min(T*) = ℓ* and max(T*) = r* which is in order-respecting bijection with T. For each i ∈ T the corresponding position in T* is denoted by i*. Having these notations, we put a further restriction on the admissible extension: we consider only those admissible extensions where first, i < i* for all i ∈ T, and second,

left(x)* ≤ left(x̄) if and only if i* ≤ j,
left(x)* ≥ left(x̄) if and only if i* ≥ j,

for all (x, i, x̄, j) ∈ B with left(x) < c. Note that for all (x_o, i, x̄_o, j) ∈ B this implies i* = j. If such an admissible extension is not possible, then Step 1 cannot be completed and Rule 4 is not applicable.

Step 2 Introduce new variables x_* and x̄_* and define left(x_*) = c, left(x̄_*) = c*. For all i ∈ T such that there is some (x, i, x̄, j) ∈ B with left(x) < c ≤ i, introduce new bricks (x_*, i, x̄_*, i*) and (x̄_*, i*, x_*, i).

Step 3 As long as there is a variable x ∈ Ω with left(x) < c, replace left(x) by left′(x) = left(x)* and replace all bricks (x, i, x̄, j), (x̄, j, x, i) ∈ B by (x, i*, x̄, j) and (x̄, j, x, i*).

Remark 4.25 To have some notation, let x denote a variable before Step 3 and let x′ be the corresponding variable after Step 3. Likewise let b = (x, i, x̄, j) denote a brick before Step 3 and let b′ = (x′, i′, x̄′, j′) be the corresponding brick after Step 3. If left(x) = left′(x′), then sometimes we may still write x = x′. In particular, x_* = x′_*, x̄_* = x̄′_*, and x̄_o = x̄′_o, but x_o ≠ x′_o. For b = (x, i, x̄, j) and b′ = (x′, i′, x̄′, j′) there are four cases:

b′ = (x′, i*, x̄′, j*) if left(x) < c and left(x̄) < c,
b′ = (x′, i*, x̄, j) if left(x) < c ≤ left(x̄),
b′ = (x, i, x̄′, j*) if c ≤ left(x) and left(x̄) < c,
b′ = (x, i, x̄, j) if c ≤ left(x) and c ≤ left(x̄).

Note that after Step 3 all bricks (x_o, i, x̄_o, j) ∈ B have the form (x′_o, i*, x̄_o, i*).

Step 4 Define the new set of final indices by

F′ = {i* ∈ I* | i < c and i ∈ F} ∪ {i ∈ F | c ≤ i}.

Step 5 Cancel all bricks with label x′_o or x̄_o, i.e., cancel all bricks of the form (x′_o, i*, x̄_o, i*) or (x̄_o, i*, x′_o, i*). Then cancel the variables x′_o and x̄_o.

Step 6 Replace I* by I′ = {i ∈ I* | c ≤ i} and consider the linear order (I′, ≤, φ_{I′}) induced by I′ ⊆ I*.

After Step 6 the transformation rule is finished. The new system is denoted by 𝔅′ = ((Ω′, ¯), (I′, ≤, φ_{I′}), left′, B′). We will show in Lem. 4.30 to 4.33 below that 𝔅′ satisfies the convex chain condition with respect to F′.
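The case analysis of Remark 4.25 is mechanical and can be written out directly. A sketch reusing the tuple-and-dictionary encoding of the earlier sketches; star is the order-respecting bijection i ↦ i* of Step 1, assumed to be given as a dictionary on T:

    def step3_transport(bricks, left, c, star):
        """Step 3 of Rule 4: move every variable x with left(x) < c to
        left(x)*, and replace brick components accordingly (Rem. 4.25).
        The renaming x -> x' is left implicit; only indices change."""
        new_left = {x: (star[p] if p < c else p) for x, p in left.items()}
        new_bricks = []
        for (x, i, xb, j) in bricks:
            ii = star[i] if left[x] < c else i    # moves iff left(x) < c
            jj = star[j] if left[xb] < c else j   # moves iff left(xbar) < c
            new_bricks.append((x, ii, xb, jj))
        return new_bricks, new_left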
The first lemma is a trivial observation.

Lemma 4.26 We have |Ω′| = |Ω| and |F′| ≤ |F|.

Proof. In Step 2 the new variables x_* and x̄_* are introduced, but in Step 5 the variables x′_o and x̄_o are canceled; hence |Ω′| = |Ω|. The set of final indices is changed in Step 4, and the assertion |F′| ≤ |F| is clear from the definition of F′. □

The following lemma is crucial in order to bound the size of I during the transformation procedure. It has a rather subtle proof.

Lemma 4.27 Let β′ = |{(x′, i′, x̄′, j′) ∈ B′ | left′(x′) < i′}| and β = |{(x, i, x̄, j) ∈ B | left(x) < i}|. Then we have

2|I′| − β′ ≤ 2|I| − β.

Proof. The inequality can be destroyed either by a new position i* ∈ T* \ I or by the cancelation of bricks (x′_o, i*, x̄_o, i*), (x̄_o, i*, x′_o, i*) in Step 5, where ℓ* < i*. (Recall the definition of β and β′ and that left(x_o) = ℓ and left′(x′_o) = ℓ*.) The cancelation of these bricks involves again a position of type i* ∈ T*. Fortunately, if (x′_o, i*, x̄_o, i*) is canceled, where ℓ* < i*, then i* = j for some j ∈ I \ {ℓ}. In particular, i* is not a new position, and the two cases do not occur simultaneously. Therefore it is enough to find, for each i* ∈ T* \ {ℓ*}, either two new bricks which are introduced in Step 2 or one position which is canceled in Step 6; then the total balance is negative or zero. Let us consider the positions of type i* ∈ T* \ {ℓ*} one by one. If c* < i*, then by the definition of T and Step 2 there are two new bricks (x_*, i, x̄_*, i*), (x̄_*, i*, x_*, i) ∈ B′, and we have left(x_*) < i and left(x̄_*) < i*. Next consider i* = c*: at least one position (namely ℓ) is canceled in Step 6. Finally, let ℓ* < i* < c*, i.e., ℓ < i < c: the position i is canceled in Step 6. Hence we have the assertion of the lemma. □

Lemma 4.28 Rule 4 is downward correct.

Proof. Let w ∈ A* be a solution of 𝔅. Since w ⊨ (I, ≤, φ_I), we can view I as a subset of positions of w with ℓ = 0. Let w = vw′ where v = w(ℓ, c). The word v is a non-empty prefix of w(ℓ, r). The word w(ℓ, r) is a prefix of w and at the same time a factor of w′: we have w(ℓ, r) = w(ℓ*, r*) with ℓ < ℓ* due to the brick (x_o, r, x̄_o, r*) ∈ B. The set T is a subset of positions of w(ℓ, r); hence we find a corresponding subset T* of positions of w(ℓ*, r*). The union I ∪ T* leads to an admissible extension (I*, ≤, φ_{I*}) such that first, i < i* for all i ∈ T, and second, w(j, k) = w(j*, k*) for all j, k ∈ T with j ≤ k. A careful but easy inspection of Rule 4 then shows that w′ ⊨ (I′, ≤, φ_{I′}) and that w′ is a solution of 𝔅′. □

Lemma 4.29 Rule 4 is upward correct.

Proof. Let w′ ∈ A* be a solution of 𝔅′. Since w′ ⊨ (I′, ≤, φ_{I′}), we can view I′ as a subset of positions of w′ where c is the first position of w′. Define v = w′(ℓ*, c*) and let w = vw′. Then we have w ⊨ (I*, ≤, φ_{I*}) such that v = w(ℓ, c) = w(ℓ*, c*). With the help of the bricks (x_*, i, x̄_*, i*) we conclude that w(j, k) = w(j*, k*) for all j, k ∈ T with j ≤ k. Therefore we have w(left(x), i) = w(left(x̄), j) for all (x, i, x̄, j) ∈ B. Since I ⊆ I*, we have w ⊨ (I, ≤, φ_I), and w is a solution of 𝔅. □

Finally we show that Rule 4 preserves the convex chain condition. This is clear for Step 1; for the other steps we state lemmata.

Lemma 4.30 Step 2 preserves the convex chain condition with respect to the set F.

Proof. The new bricks in Step 2 have the form (x_*, i, x̄_*, i*) and (x̄_*, i*, x_*, i) for some (x, i, x̄, j) ∈ B with left(x) < c = left(x_*) ≤ i. Since (x, i, x̄, j) ∈ B can be linked via a convex chain to some basis, it is enough to consider the following figure:

[Figure: the new bricks (x_*, i | x̄_*, i*) and (x̄_*, i* | x_*, i) placed upon the old brick (x, i | x̄, j).] □

Lemma 4.31 Let C = (b_1, …, b_m) be a convex chain before Step 3 linking b_1 with b_m. Then after Step 3 there is a convex chain C′ linking b′_1 with b′_m.

Proof. Let us have a local look at the convex chain:

C = (…, (x, i, x̄, j), (y, j, ȳ, k), …).

By symmetry we may assume that left(x) ≥ left(y). Pictorially, this local part is the brick (x, i | x̄, j) placed upon (y, j | ȳ, k). This is the situation before Step 3. After Step 3 let us denote the corresponding bricks by (x′, i′, x̄′, j′) and (y′, j″, ȳ′, k′). The question is whether or not j′ = j″. If j′ = j* or j″ = j, then we have j′ = j″ and the chain is not broken. Hence we have to consider only the case j′ = j and j″ = j*. This case is equivalent to

left(y) < c ≤ left(x̄) ≤ j.

With the help of the brick (x_*, j, x̄_*, j*), which was introduced in Step 2, we can repair the broken chain: we have left(x_*) = c ≤ left(x̄) and left′(y′) < c* = left(x̄_*), and we obtain the sequence of bricks (x′, i′ | x̄, j), (x_*, j | x̄_*, j*), (y′, j* | ȳ′, k′). Doing this transformation wherever necessary, we construct the convex chain C′. □
Note that the chain C′ constructed in the lemma above may contain many bricks of the form (x′_o, i*, x̄_o, i*) and (x̄_o, i*, x′_o, i*). These bricks are canceled only later, in Step 5; in fact their presence is very useful again in the next lemma.

Lemma 4.32 After Step 4 the convex chain condition is satisfied with respect to the set F′.

Proof. Let b′ be a brick after Step 3 and b the corresponding brick before Step 3. The brick b is linked before Step 3 via a convex chain to some basis (x, i, x̄, j) with j ∈ F. Lem. 4.31 states that after Step 3 the brick b′ is linked via a convex chain to the corresponding brick (x′, i′, x̄′, j′). For j < c we have left(x̄) < c and j′ = j* ∈ F′; hence (x′, i′, x̄′, j*) is again a basis. For j′ = j we have c ≤ j and therefore j ∈ F′, which settles the case j′ = j. The remaining case is c ≤ j and j′ = j*. This means left(x̄) < c ≤ j. By Step 2 there is a brick (x̄_*, j*, x_*, j), and we have left′(x̄′) < c* = left(x̄_*). We may put the brick (x′, i′, x̄′, j*) upon the basis (x̄_*, j*, x_*, j); since j ∈ F ∩ F′, it is in fact a basis before and after Step 4. □

Lemma 4.33 Steps 5 and 6 preserve the convex chain condition with respect to the set F′.

Proof. Step 5 is a special case of an application of Rule 2; likewise, Step 6 is a special case of applications of Rule 3. In particular, the convex chain condition is preserved. □

The lemmata above yield the following proposition.

Proposition 4.34 Rule 4 is upward and downward correct. It preserves the convex chain condition.

Example 4.35 Let x_1 ⋯ x_g = x_{g+1} ⋯ x_d be a word equation, 1 ≤ g < d, where the regular constraints are given by a mapping ρ: Ω → S. Let 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B) be the result of the (logspace) reduction presented in Sect. 4.2. Recall that (I, ≤, φ_I) represents the S-sequence

(ρ(x_1), …, ρ(x_g), ρ(x_{g+1}), …, ρ(x_d)).

We may assume that (I, ≤, φ_I) is in its standard representation, I = {0, …, d}. According to the reduction, the set Ω contains two variables x_0 and x̄_0 such that left(x_0) = 0, right(x_0) = g = left(x̄_0), and right(x̄_0) = d. The set B contains at most d boundary equations (or bricks), among them the brick (x_0, g, x̄_0, d). We have |I| = d + 1 and |Ω| = |B| ≤ 2d. If the word equation has a non-singular solution satisfying the regular constraints, then exp(𝔅) ≤ 2·e(c(S), d).

Rules 1 to 3 are not applicable to 𝔅, but we can try Rule 4. Doing this we find:

x_o = x_0, ℓ = 0, c = g = r = ℓ*, and c* = g* = r* = d.

The set T of transport positions is T = {0, …, g}. In Step 1 we have to choose some admissible extension of (I, ≤, φ_I) by T at [g, d]. In general it is not clear that such an extension exists. Under the hypothesis that x_1 ⋯ x_g = x_{g+1} ⋯ x_d has a non-singular solution σ: Ω → A⁺ with φ ∘ σ = ρ we can continue. Let v = σ(x_1 ⋯ x_g) and assume that v has minimal length among all solutions satisfying the regular constraints given by ρ. With the help of this word, Step 1 can be completed: define w = vv; then we have

w ⊨ (ρ(x_1), …, ρ(x_d)).

The set of positions of w is {0, …, m, m+1, …, 2m} where m = |v|. The fact that w is a model of (I, ≤, φ_I) is realized by an order-respecting injective mapping π: {0, …, d} → {0, …, 2m}. Define T* = {m + π(i) | 0 ≤ i ≤ g} and I* = π(I) ∪ T*. Since I* is a subset of positions of w, this induces a linear suborder over S, which is denoted by (I*, ≤, φ_{I*}). We have |I*| ≤ d + g − 1. After renaming we may assume I* = {0, …, d} ∪ T* and T* = {0*, …, g*} such that 0* = c = g and c* = g* = d. This completes Step 1 of Rule 4. Since in reality we usually do not know v, the choice of I* is a non-deterministic guess!

The next steps of Rule 4 are deterministic. In Step 2 we introduce new variables x_* and x̄_* with left(x_*) = g = right(x_*) and left(x̄_*) = d = right(x̄_*). In Step 3 we transport the structure of the interval [0, g] to [0*, g*] = [g, d]. If we still view I* as a subset of positions of w, then this reflects a transport of the positions from the first to the second factor v in the word w = vv. The definition of F′ according to Step 4 is F′ = {i ∈ I* | g ≤ i}. In Step 5 we cancel the bricks (x′_0, d, x̄_0, d), (x̄_0, d, x′_0, d) and the variables x′_0, x̄_0. In Step 6 we replace I* by I′ = F′. Rule 4 is finished; the cardinality of I′ is bounded by d.

Let 𝔅′ denote the new system; then the word v is a solution, v ⊨ (I′, ≤, φ_{I′}). Since in the present situation left(x_*) = right(x_*) = g, Rule 1 is now applicable to 𝔅′; it cancels the superfluous bricks (x_*, g, x̄_*, d), (x̄_*, d, x_*, g) and the variables x_* and x̄_*. The new system after this application of Rule 1 is denoted by 𝔅″ = ((Ω″, ¯), (I″, ≤, φ_{I″}), left″, B″). We have |I″| ≤ d and |Ω″| = |B″| ≤ 2(d − 1). It is now the word v which is a solution of 𝔅″, hence exp(𝔅″) ≤ exp(v). Therefore we can choose e(𝔅″) = e(c(S), d).
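The fragment of the initial system that the example spells out can be fed to the rule4_setup sketch from Sect. 4.4 above to reproduce the values ℓ = 0 and c = g = r computed in the example. Only the data explicitly given in the example is encoded here; the remaining bricks of the reduction are omitted, and "x0~" stands for the dual variable x̄_0:

    # say g = 3 and d = 7, for the word equation x1 x2 x3 = x4 x5 x6 x7
    g, d = 3, 7
    I = set(range(d + 1))               # standard representation {0, ..., d}
    left  = {"x0": 0, "x0~": g}
    right = {"x0": g, "x0~": d}
    bricks = [("x0", g, "x0~", d)]      # the brick displayed above

    print(rule4_setup(I, left, right, bricks))
    # -> (0, 3, 3, {0, 1, 2, 3}), i.e. l = 0 and c = g = r, as in the example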
5 Proof of Theorem 4.12

5.1 Decidability

The proof of Thm. 4.12 is a reduction to a reachability problem in a finite directed graph. The implications for space and time bounds of Makanin's algorithm are given later. The instance is a system of boundary equations

𝔅_0 = ((Ω_0, ¯), (I_0, ≤, φ_{I_0}), left_0, B_0).

We may assume that 𝔅_0 satisfies the assumptions made at the beginning of Sect. 4.3, because otherwise 𝔅_0 is not solvable. For trivial reasons the system 𝔅_0 satisfies the convex chain condition with respect to the set F_0 = I_0. Let 2n_0 = |Ω_0| and f_0 = |F_0| = |I_0|. According to Prop. 4.13 choose a number e(𝔅_0) such that either 𝔅_0 is not solvable or exp(w) ≤ e(𝔅_0) for some solution w of 𝔅_0. Define an integer β_max by

β_max = (2n_0)^{4n_0²·(e(𝔅_0)+1)−1} · 2(2n_0 + f_0).

Note that this value is defined just so as to fit Cor. 4.21 for a set of final indices of size at most 2n_0 + f_0.

Now define a directed graph G (the search graph of Makanin's algorithm) as follows. The nodes of G are the systems of boundary equations 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B) where:

|Ω| ≤ 2n_0,  |I| ≤ ((n_0 + 2)/2)·β_max,  |B| ≤ β_max.

For systems 𝔅, 𝔅′ ∈ G we define an arc from 𝔅 to 𝔅′ whenever first, a transformation rule is applicable to 𝔅 and second, 𝔅′ is the result of the corresponding transformation. A system 𝔅 ∈ G with an empty set of variables is called a terminal node. Clearly, 𝔅_0 ∈ G, and the search graph G has only finitely many nodes.
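The bound β_max is an explicit integer; for toy parameters it can be evaluated exactly, while for realistic inputs it is astronomically large, which is precisely the point of the double exponential bounds below. A small sketch of the node conditions (our own illustration, exact integer arithmetic):

    def beta_max(n0, f0, e0):
        """beta_max = (2 n0)^(4 n0^2 (e0 + 1) - 1) * 2 (2 n0 + f0)."""
        return (2 * n0) ** (4 * n0 ** 2 * (e0 + 1) - 1) * 2 * (2 * n0 + f0)

    def is_node_of_G(n_vars, n_positions, n_bricks, n0, f0, e0):
        """The three size conditions defining the search graph G."""
        bm = beta_max(n0, f0, e0)
        return (n_vars <= 2 * n0
                and 2 * n_positions <= (n0 + 2) * bm
                and n_bricks <= bm)

    print(beta_max(1, 2, 1))   # 1024 already for the smallest parameters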
Hence it is enough to show the following claim: the system 𝔅_0 has a solution if and only if there is a directed path in G from 𝔅_0 to some terminal node. The "if" direction of the claim is trivial, since all transformation rules are upward correct and since all terminal nodes are solvable by Lem. 4.6. For the "only if" direction, let 𝔅_0 be solvable and let w_0 ⊨ (I_0, ≤, φ_{I_0}) be a solution satisfying exp(w_0) ≤ exp(𝔅_0). Let M ≥ 0 and assume that there is an inductively defined sequence of solvable systems (𝔅_0, 𝔅_1, …, 𝔅_M) such that the following properties are satisfied for all 1 ≤ k ≤ M:

- 𝔅_k = ((Ω_k, ¯), (I_k, ≤, φ_{I_k}), left_k, B_k) is the result of some transformation rule applied to 𝔅_{k−1},
- 𝔅_k has a solution w_k ⊨ (I_k, ≤, φ_{I_k}) such that w_k is a suffix of w_{k−1},
- either |Ω_k| < |Ω_{k−1}| or |w_k| < |w_{k−1}|,
- 𝔅_k satisfies the convex chain condition with respect to some subset F_k ⊆ I_k with |F_k| ≤ 2n_0 − |Ω_k| + f_0.

If 𝔅_M is a system of boundary equations without variables, then we stop. Otherwise, since 𝔅_M is solvable, a transformation rule is applicable; consequently, the sequence can be continued by some solvable system 𝔅_{M+1} satisfying all the properties above. The third property, however, implies that M ≤ n_0 + |w_0|. Hence we finally must reach a system without variables, and we may assume that this happens with 𝔅_M.

Let us show that 𝔅_k is a node of G for all 0 ≤ k ≤ M. This will imply the claim, since then there is a directed path to 𝔅_M, and 𝔅_M is a terminal node. We have to verify |Ω_k| ≤ 2n_0, |I_k| ≤ ((n_0+2)/2)·β_max, and |B_k| ≤ β_max. The assertion |Ω_k| ≤ 2n_0 is trivial. The second property of the sequence implies exp(𝔅_k) ≤ exp(w_k) ≤ exp(w_0) ≤ e(𝔅_0). By Cor. 4.21 and the fourth property we have |B_k| ≤ β_max. The next lemma yields an invariant which gives the desired bound on the size of every I_k.

Lemma 5.1 For 0 ≤ k ≤ M define β_k = |{(x, i, x̄, j) ∈ B_k | left_k(x) < i}|. Then for all 1 ≤ k ≤ M we have:

2|I_k| − β_k + (|Ω_k|/2)·β_max ≤ 2|I_{k−1}| − β_{k−1} + (|Ω_{k−1}|/2)·β_max.

Proof. Consider the rule which was applied to pass from 𝔅_{k−1} to 𝔅_k. For Rule 1 or 2 we have |Ω_k| = |Ω_{k−1}| − 2, |I_k| = |I_{k−1}|, and β_{k−1} − β_k ≤ β_max. For Rule 3 we have |Ω_k| = |Ω_{k−1}|, |I_k| = |I_{k−1}| − 1, and β_k = β_{k−1}. Finally, for Rule 4 we have |Ω_k| = |Ω_{k−1}|, and Lem. 4.27 says 2|I_k| − β_k ≤ 2|I_{k−1}| − β_{k−1}. The assertion of the lemma follows. □

A consequence of Lem. 5.1 (and of β_k ≤ β_max) is:

2|I_k| ≤ 2|I_0| + (n_0 + 1)·β_max for all 0 ≤ k ≤ M.

Since |I_0| ≤ β_max/2, we obtain |I_k| ≤ ((n_0+2)/2)·β_max. Hence 𝔅_k ∈ G for all 0 ≤ k ≤ M. This proves Thm. 4.12, hence Makanin's result.

5.2 Complexity in terms of the semigroup S and the maximal number of boundary equations

Our estimates of the upper bounds for Makanin's algorithm are given in terms of the size of the semigroup S and of the number β_max as defined in the preceding section, which bounds the number of boundary equations. A node 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B) of the search graph G is encoded as a binary string over {0, 1} as follows. The code for (Ω, ¯) is simply the number n, written in binary, such that |Ω| = 2n; thus O(log n_0) bits are enough for this part. The linear order (I, ≤, φ_I) is encoded by its underlying S-sequence; for this part O(n_0·β_max·log|S|) bits are used. The mapping left: Ω → I is encoded using O(n_0·log(n_0·β_max)) bits. Finally, the set of bricks B can be encoded using O(β_max·log(n_0·β_max)) bits. Note that n_0 ≤ log β_max.
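Adding the four contributions gives the constant c introduced just below. A rough evaluation sketch (an assumption of ours: all constants hidden in the O-terms are set to 1, so the value is indicative only):

    import math

    def encoding_bits(n0, bmax, s_size):
        """Indicative bit count for one node of G, following the four
        components above (naming, S-sequence, left, bricks)."""
        naming    = math.log2(2 * n0)                # |Omega| in binary
        sequence  = n0 * bmax * math.log2(s_size)    # the S-sequence
        left_map  = n0 * math.log2(n0 * bmax)        # left : Omega -> I
        brick_set = bmax * math.log2(n0 * bmax)      # the set B
        return naming + sequence + left_map + brick_set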
It follows that there is effectively a constant c ∈ ℕ such that every 𝔅 ∈ G can be described by a bit string of length c·log|S|·β_max·log(β_max). Up to some calculations performed over S, this is the essential upper space bound for the non-deterministic procedure. It is at most double exponential in the input size, but we will come back to this point later. The number of bits we need for the code yields an upper bound for the size of G. Using the constant c above, define a natural number γ_max by:

γ_max = 2^{c·log|S|·β_max·log(β_max)} ∈ 2^{O(log|S|·β_max·log(β_max))}.

Lemma 5.2 The number of nodes in G is at most γ_max.

Proof. The number of nodes is at most exponential in the number of bits used in a description of a node. □

The following assertion is now clear:

Proposition 5.3 The system 𝔅_0 is solvable if and only if G contains a directed path (𝔅_0, …, 𝔅_m) to the terminal node 𝔅_m = (∅, ∅, ∅, ∅) such that m ≤ γ_max.

Proof. If there is a path to some terminal node, then this path can be prolonged by applications of Rule 3 until finally the underlying linear order is the empty set. We may assume that this path is without cycles; then Lem. 5.2 implies m ≤ γ_max. □

Corollary 5.4 Let 𝔅_0 be solvable and let w_0 ⊨ (I_0, ≤, φ_{I_0}) be a solution whose length |w_0| is minimal. Then we have

|w_0| ≤ |S| · 2^{γ_max}.

Proof. Consider a path (𝔅_0, …, 𝔅_m), m ≤ γ_max, to the terminal node 𝔅_m = (∅, ∅, ∅, ∅) as in Prop. 5.3. Going the path backwards, we inductively define solutions v_m, v_{m−1}, …, v_0 of the systems 𝔅_m, 𝔅_{m−1}, …, 𝔅_0 as follows. The initial solution is v_m = 1. Assume that v_m, …, v_k, 1 ≤ k ≤ m, are already defined. Depending on the transformation rule which links 𝔅_{k−1} to 𝔅_k, we define the solution v_{k−1} ⊨ (I_{k−1}, ≤, φ_{I_{k−1}}) of the system 𝔅_{k−1}. For Rule 1 or 2 we define v_{k−1} = v_k. For Rule 3 we define v_{k−1} = u·v_k for some suitable u ∈ A⁺; it is clear that we can choose u such that |u| ≤ |S|, hence |v_{k−1}| ≤ |S| + |v_k|. For Rule 4 we find a solution v_{k−1} = u·v_k where u is a factor of v_k, hence |v_{k−1}| ≤ 2|v_k|. We end up with a solution v_0 ⊨ (I_0, ≤, φ_{I_0}) of 𝔅_0 such that |v_0| ≤ |S| · 2^{γ_max}. Since w_0 is of minimal length, we have |w_0| ≤ |v_0|. □

Remark 5.5 Assume that 𝔅_0 is solvable and that w_0 is a solution of minimal length. The word w_0 can be used as a model for constructing a path (𝔅_0, …, 𝔅_M), M ≥ 1, to some terminal node. However, we cannot exclude that this path has many cycles. In particular, M need not be the number m which was used in the proof of Cor. 5.4. Due to the construction of the solution v_0 above, it is possible that v_0 ≠ w_0 and e(𝔅_0) < exp(v_0).

Corollary 5.6 In deterministic time 2^{O(|S|·2^{γ_max})} we can find by exhaustive search either a solution of minimal length or report that 𝔅_0 is not solvable. This upper bound is five times exponential in the input size (the number of bits used in the encoding) of 𝔅_0.

Proof. Test all strings up to length |S| · 2^{γ_max} for being a solution, and stop when the first solution is encountered. If no solution up to this length is found, then 𝔅_0 is not solvable. □
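The search of Corollary 5.6 is the obvious one. A sketch, where is_solution is a hypothetical callback testing w ⊨ (I_0, ≤, φ_{I_0}) together with the boundary equations, and max_len plays the role of |S| · 2^{γ_max}:

    from itertools import product

    def minimal_solution(A, is_solution, max_len):
        """Exhaustive search in order of increasing length; returns a
        shortest solution, or None if there is none up to max_len."""
        for n in range(max_len + 1):
            for letters in product(A, repeat=n):
                w = "".join(letters)
                if is_solution(w):
                    return w
        return None   # B0 is not solvable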
5.3 An upper bound for the complexity of solving word equations

The original question of the chapter is whether a given word equation x_1 ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, with regular constraints has a solution. We may assume that each regular language L_x ⊆ A*, x ∈ Ω, is specified by an NFA with r_x states. Define r = Σ_{x∈Ω} r_x; we are going to measure the complexity of Makanin's algorithm in terms of d and r.

First, we choose a suitable semigroup S and a homomorphism φ: A⁺ → S. By Rem. 3.1 we may assume that S satisfies |S| ≤ 2^{r²} and c(S) ≤ r!. By Thm. 3.2 choose a value e(c(S), d) ∈ c(S)·2^{O(d)} ⊆ 2^{O(d + r log r)} such that e(c(S), d) is an upper bound for the exponent of periodicity. Transform the word equation (by a non-deterministic guess) into a system of boundary equations 𝔅_0 = ((Ω_0, ¯), (I_0, ≤, φ_{I_0}), left_0, B_0) such that the word equation has a solution satisfying the regular constraints if and only if 𝔅_0 is solvable. This is possible in such a way that first, |I_0|, |Ω_0|, |B_0| ∈ O(d), and second, if 𝔅_0 is solvable, then e(𝔅_0) ≤ 2·e(c(S), d) ∈ 2^{O(d + r log r)}. More precisely, by Ex. 4.35 we can say |I_0| ≤ d − 1, |Ω_0| = |B_0| ≤ 2(d − 1), and, if 𝔅_0 is solvable, then e(𝔅_0) ≤ e(c(S), d).

Compute a value β_max ∈ 2^{2^{O(d + r log r)}} such that the search graph G satisfies Prop. 5.3 for the corresponding value γ_max. Recall that β_max is an upper bound for the number of boundary equations of each node and that γ_max is an upper bound for the number of nodes of G. The number β_max is double exponential in the input size, which is for simplicity d + r. The value β_max is large enough to perform all computations over the semigroup S, and it is small enough to solve the reachability problem in the search graph G in non-deterministic space 2^{2^{O(d + r log r)}}. Using standard knowledge from complexity theory (like Savitch's Theorem, cf. Hopcroft and Ullman (1979)), we can state:

Theorem 5.7 The satisfiability problem for word equations with regular constraints is in the following complexity classes:

DSPACE(2^{2^{O(d + r log r)}}), i.e., double exponential deterministic space,
DTIME(2^{2^{2^{O(d + r log r)}}}), i.e., triple exponential deterministic time.

The length of a shortest solution is at most four times exponential in the input size; it can be bounded by

2^{2^{2^{2^{O(d + r log r)}}}}.

The computation of a shortest solution by exhaustive search is possible in at most five-fold exponential deterministic time.

Remark 5.8 The complexity bounds given above differ slightly from other bounds published in the literature so far. In Kościelski and Pacholski (1996: Cor. 4.6) a triple exponential non-deterministic time bound for the satisfiability problem is given. Here we have triple exponential deterministic time, since in the formulation as a graph reachability problem it is the number of nodes which becomes important. The upper bound for the exponent of periodicity given in Schulz (1992a) for the situation including regular constraints is based on the techniques of the original article of Makanin; this yields a double exponential bound, whereas it is shown here that one exponential suffices. The upper bound for the exponent of periodicity is essentially optimal; the optimality is not known for the bounds mentioned in Thm. 5.7.

6 Notes

A systematic study of equations in free monoids was initiated by A. A. Markov in the late 1950s in connection with Hilbert's Tenth Problem, see Hmelevskiĭ (1971), Makanin (1981). It is not difficult to see that the matrices having non-negative integer coefficients and determinant 1 form a free monoid inside the special linear group SL_2(ℤ). The free generators are:

a = [1 1; 0 1],  b = [1 0; 1 1].

Let L = R be a word equation over {a, b} in unknowns Ω = {x_1, …, x_n}. Replace each variable x_i ∈ Ω by a matrix [δ_1 δ_2; δ_3 δ_4], where the δ_j denote variables over ℕ. Multiplying the matrices corresponding to the words L and R yields an equation of the form

[P_1 P_2; P_3 P_4] = [Q_1 Q_2; Q_3 Q_4].

The coefficients P_1, …, Q_4 are polynomials in the δ_j. It is clear that the equation L = R has a solution if and only if the following Diophantine system has a non-negative solution:

P_i = Q_i, i = 1, …, 4;  δ_1δ_4 − δ_2δ_3 = 1, one such equation for each of the n variables.
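The freeness of a and b is easy to test experimentally. A quick sanity check (our own illustration) multiplying out the generator matrices:

    # the free generators inside SL(2, Z)
    a = ((1, 1), (0, 1))
    b = ((1, 0), (1, 1))

    def mul(m, n):
        """2 x 2 integer matrix product."""
        return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2))
                           for j in range(2)) for i in range(2))

    def mat(word):
        """Image of a word over {'a', 'b'} under the embedding."""
        m = ((1, 0), (0, 1))
        for letter in word:
            m = mul(m, a if letter == "a" else b)
        return m

    print(mat("ab"))                 # ((2, 1), (1, 1))
    print(mat("ab") == mat("ba"))    # False: distinct words, distinct matrices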
The hope of Markov was to prove in this way the unsolvability of Hilbert's Tenth Problem, which was not settled at that time. This failed: the unsolvability of Hilbert's Tenth Problem was shown in 1970 by Matiyasevich using quite different methods, see Matiyasevich (1993). The solvability of word equations is, needless to say at this place, due to Makanin (1977). Before Makanin obtained the breakthrough, only partial results were known. In 1964 and 1967 Hmelevskiĭ found a positive solution for the cases with two and three variables respectively, see Hmelevskiĭ (1971). In the case of two variables a polynomial time algorithm for the satisfiability problem is given in Charatonik and Pacholski (1993). The solvability in the case where each variable occurs at most twice is due to Matiyasevich (1968). Other special cases were solved in Plotkin (1972) and Lentin (1972). After the general solution was established in 1977, other questions became central. In Makanin (1979) it is shown that the rank of an equation is computable, see also Pécuchet (1981). Makanin's algorithm was implemented in 1987 at Rouen, see Abdulrab and Pécuchet (1990). The inherent complexity of the satisfiability problem of word equations is not known. The lower bound is NP-hardness for equations without regular constraints and PSPACE-hardness for equations with regular constraints. The known upper bounds to date for Makanin's algorithm are given in Thm. 5.7. There is a double exponential gap between the lower and upper bounds for the space complexity. The original article Makanin (1977) is very technical. Subsequently, other presentations with various improvements were given; let us refer to Jaffar (1990) and Schulz (1992a, 1993). The present chapter is along this line; it is rather close to Schulz (1992a). A brief survey on equations in words can be found in Perrin (1989); more material on equations in free monoids and, especially, on equations without constants is in the Handbook of Formal Languages, see Choffrut and Karhumäki (1997). There are also two volumes in the Springer lecture notes series dedicated to word equations and related topics: Schulz (1992b) and Abdulrab and Pécuchet (1993).

Equations in free groups are defined analogously to word equations. The situation, however, becomes extremely complicated. It was Makanin himself who mastered this problem, too. In Makanin (1982), with a correction in Makanin (1984), it is shown that the satisfiability of group equations with constants is decidable. In Razborov (1984) an algorithm is presented which generates all solutions to a given equation. The inherent complexity of Makanin's algorithm for groups is investigated in Kościelski and Pacholski (1998). The authors define the notion of an abstract Makanin algorithm and show that this abstract scheme is not primitive recursive.

Another direction in which to extend Makanin's result is to include partial commutation. Let I ⊆ Σ × Σ be a relation between letters which says when letters may commute (i.e., when they are independent). The quotient monoid M(Σ, I) = Σ*/{ab = ba | (a, b) ∈ I} is called the free partially commutative monoid. It was introduced in Cartier and Foata (1969), where interesting combinatorial properties were discovered.
In computer science free partially commutative monoids are usually called trace monoids, a notion which is due to Mazurkiewicz (1977). The interest is that partial commutation expresses some basic phenomena of concurrency; let us refer to Diekert and Rozenberg (1995) for an overview. Syntactically, a system of trace equations is the same as a system of word equations, but solutions are sought in the trace monoid; this means the commutation rules ab = ba can be used for free for all (a, b) ∈ I. For example, if (a, b) ∈ I then, contrary to the situation in free monoids, the trace equation axb = bya has a solution σ(x) = σ(y) = 1. The set of all solutions of this trace equation is given by σ(x) = σ(y) and alphabetic constraints. It is shown in Matiyasevich (1997) that the satisfiability of a system of trace equations is decidable. The proof is a reduction of trace equations to word equations with regular constraints. As a byproduct of the reduction we may put arbitrary recognizable constraints on the variables without losing decidability. Another reduction, using a new result on lexicographic normal forms of traces, is presented in Diekert, Matiyasevich, and Muscholl (1997). A challenging open question to date is a generalization of Makanin's result to free partially commutative groups. But this is only one of many open questions in this area. The theory of word equations is still exciting, and many problems remain to be solved.

Problems

Section 1

1.1 Decide whether or not the solution abbababbaabab given to Ex. 1.1 is a non-singular solution of minimal length.

1.2 Show that the satisfiability problem of systems of word equations without regular constraints is NP-hard. Hint: Show that the problem is NP-complete if there is exactly one constant, A = {a}. Use the fact that linear integer programming is NP-hard, even in unary notation.

1.3 Modify the decision procedure for the case where each variable occurs at most twice to include regular constraints. Show that the underlying decision problem is PSPACE-complete if the regular constraints are specified by a list of NFAs (non-deterministic finite automata). Hint: The hardness follows directly from well-known PSPACE-complete problems on regular sets.

Section 2

2.1 Give a greedy algorithm to compute the p-stable normal form of a word w ∈ A*. Modify the algorithm by pattern-matching techniques such that it runs in linear time O(|w| + |p|).

2.2 Prove Props. 2.8, 2.9, and 2.10. Show that the results remain true when there are regular constraints.

2.3 Show that the satisfiability problem of single word equations without regular constraints is NP-hard. Hint: Compare this problem with Prob. 1.2.

2.4 Let L_x ⊆ A* be a regular language. Describe the set of all solutions σ for an equation with only one unknown x under the constraint σ(x) ∈ L_x.

Section 3

3.1 An instance of a linear integer programming problem is given by an m × n matrix D ∈ ℤ^{m×n} and a vector c ∈ ℤ^m. Let x ∈ ℕ^n be a minimal vector such that Dx = c. Assume that the sum of the squares of the coefficients in each row of D is in O(1) and that ‖c‖ ∈ O(n²). Show by elementary methods that there is a (small) constant c such that ‖x‖ ∈ O(2^{cn}). Hint: The proof is a slight modification of the standard proof which shows that linear integer programming is NP-complete. Use Hadamard's inequality for an upper bound on the maximal absolute value of the determinants of square submatrices of D.
Next, show that if x ∈ ℕ^n is a minimal solution, then there is also a minimal solution x′ ∈ ℕ^n such that first, the absolute value of at least one component can be bounded, and second, Σ_{i=1}^n x_i ≤ Σ_{i=1}^n x′_i. Freeze one variable of x′ to be a constant by an additional equation. Repeat the process until the homogeneous system Dx = 0 has only the trivial solution. Then apply Cramer's Rule. It should be noted that this method does not yield the best possible result, but it is good enough to establish e(d) ∈ 2^{O(d)}, which was used in the proof of Thm. 3.2.

Section 4

4.1 Consider the reduction in the proof of Lem. 4.14. Give an estimate of the length d of the word equation and thereby an upper bound on e(𝔅). Define another reduction where the denotational length of the resulting word equation becomes smaller; this also improves the estimate for e(𝔅). Give a third estimate for e(𝔅) based on the techniques presented in Sect. 3. Hint for the second part: If a system contains two equations x = x′ and xy = x′y′, then the second one can be replaced by y = y′.

4.2 The lower bound for e(c(S), d) given in Ex. 3.4 can be refined. Kościelski and Pacholski (1996: Thm. 4.8) consider the following equation with k = 5:

x_n a x_n b x_{n−1} b ⋯ x_2 b x_1 = a x_n x_{n−1}^k b x_{n−2}^k b ⋯ x_1^k b a.

Show that there is a unique solution. Derive from this solution a lower bound for the constant hidden in the notation e(c(S), d) ∈ c(S) · 2^{Θ(d)}. Why is k = 5 a good value? Hint: Show first that σ(x_i) ∈ a* for all 1 ≤ i ≤ n.

References

Abdulrab, H. and Pécuchet, J.-P. (1990). Solving word equations, J. Symbolic Computation, 8 (5), 499–521.

Abdulrab, H. and Pécuchet, J.-P. (Eds.). (1993). Proceedings of Word Equations and Related Topics (IWWERT '91), Vol. 677 of Lect. Notes Comp. Sci., Berlin-Heidelberg-New York. Springer-Verlag.

Cartier, P. and Foata, D. (1969). Problèmes combinatoires de commutation et réarrangements. No. 85 in Lecture Notes in Mathematics. Springer, Berlin-Heidelberg-New York.

Charatonik, W. and Pacholski, L. (1993). Word Equations with Two Variables, In Abdulrab, H. and Pécuchet, J.-P. (Eds.), Proceedings of Word Equations and Related Topics, Second International Workshop, IWWERT '91, Rouen, France, Vol. 677 of Lect. Notes Comp. Sci., pp. 43–56, Berlin-Heidelberg-New York. Springer-Verlag.

Choffrut, C. and Karhumäki, J. (1997). Combinatorics of Words, In Rozenberg, G. and Salomaa, A. (Eds.), Handbook of Formal Languages, Vol. 1, pp. 329–438. Springer, Berlin-Heidelberg-New York.

Dickson, L. E. (1913). Finiteness of the odd perfect and primitive abundant numbers with n distinct prime factors, American Journal of Math., 35, 413–422.

Diekert, V., Matiyasevich, Yu., and Muscholl, A. (1997). Solving trace equations using lexicographical normal forms, In Degano, P., Gorrieri, R., and Marchetti-Spaccamela, A. (Eds.), Proc. of the 24th ICALP, Bologna, 1997, No. 1256 in Lect. Notes Comp. Sci., pp. 336–347, Berlin-Heidelberg-New York. Springer-Verlag.

Diekert, V. and Rozenberg, G. (Eds.). (1995). The Book of Traces. World Scientific, Singapore.

Eyono Obono, S., Goralcik, P., and Maksimenko, M. (1994). Efficient Solving of the Word Equations in One Variable, In Privara, I. et al. (Eds.), 19th Symposium on Mathematical Foundations of Computer Science (MFCS '94), Košice (Slovakia) 1994, No. 841 in Lect. Notes Comp. Sci., pp. 336–341, Berlin-Heidelberg-New York. Springer.

Gathen, J. von zur and Sieveking, M. (1978). A bound on solutions of linear integer equalities and inequalities, Proceedings of the American Mathematical Society, 72 (1), 155–158.
Hmelevskiĭ, Ju. I. (1971). Equations in Free Semigroups, In Petrovskiĭ, I. G. (Ed.), Trudy Mat. Inst. Steklov. 107. (In Russian) English translation in: Proceedings of the Steklov Institute of Mathematics 107 (1976). American Mathematical Society.

Hopcroft, J. E. and Ullman, J. D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, MA.

Jaffar, J. (1990). Minimal and Complete Word Unification, J. Assoc. Comput. Mach., 37 (1), 47–85.

Kościelski, A. and Pacholski, L. (1996). Complexity of Makanin's Algorithm, J. Assoc. Comput. Mach., 43 (4), 670–684. Preliminary version in Proc. of the 31st Annual IEEE Symposium on Foundations of Computer Science, Los Alamitos (1990).

Kościelski, A. and Pacholski, L. (1998). Makanin's algorithm is not primitive recursive, Theoret. Comput. Sci., 191 (1–2), 145–156.

Lentin, A. (1972). Équations dans les monoïdes libres. Gauthier-Villars.

Lothaire, M. (1983). Combinatorics on Words, Vol. 17 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, Reading, MA.

Makanin, G. S. (1977). The problem of solvability of equations in a free semigroup, Mat. Sbornik, 103 (2), 147–236. (In Russian) English translation in: Math. USSR Sbornik 32 (1977) 129–198.

Makanin, G. S. (1979). Recognition of the rank of equations in a free semigroup, Izv. Akad. Nauk SSSR, Ser. Mat. 43. (In Russian) English translation in: Math. USSR Izvestija 14 (1980) 499–545.

Makanin, G. S. (1981). Equations in a free semigroup, American Mathematical Society Translations (2), 117, 1–6.

Makanin, G. S. (1982). Equations in a free group, Izv. Akad. Nauk SSSR, Ser. Mat. 46, 1199–1273. (In Russian) English translation in: Math. USSR Izvestija 21 (1983) 483–546.

Makanin, G. S. (1984). Decidability of the universal and positive theories of a free group, Izv. Akad. Nauk SSSR, Ser. Mat. 48, 735–749. (In Russian) English translation in: Math. USSR Izvestija 25 (1985) 75–88.

Markowsky, G. (1977). Bounds on the index and period of a binary relation on a finite set, Semigroup Forum, 13, 253–259.

Matiyasevich, Yu. (1968). A connection between systems of word and length equations and Hilbert's Tenth Problem, Sem. Mat. V. A. Steklov Math. Inst. Leningrad, 8, 132–144. (In Russian) English translation in: Seminars in Mathematics, V. A. Steklov Mathematical Institute 8 (1970) 61–67.

Matiyasevich, Yu. (1993). Hilbert's Tenth Problem. MIT Press, Cambridge, Massachusetts.

Matiyasevich, Yu. (1997). Some Decision Problems for Traces, In Adian, S. and Nerode, A. (Eds.), Proceedings of the 4th International Symposium on Logical Foundations of Computer Science (LFCS '97), Yaroslavl, Russia, July 6–12, 1997, No. 1234 in Lect. Notes Comp. Sci., pp. 248–257, Berlin-Heidelberg-New York. Springer-Verlag.

Mazurkiewicz, A. (1977). Concurrent Program Schemes and their Interpretations, DAIMI Rep. PB 78, Aarhus University, Aarhus.

Pécuchet, J.-P. (1981). Sur la détermination du rang d'une équation dans le monoïde libre, Theoret. Comput. Sci., 16, 337–340.

Perrin, D. (1989). Equations in words, In Ait-Kaci, H. and Nivat, M. (Eds.), Resolution of Equations in Algebraic Structures, Vol. 2, pp. 275–298. Academic Press.

Plotkin, G. (1972). Building in equational theories, Machine Intelligence, 7, 115–162.

Razborov, A. A. (1984). On systems of equations in a free group, Izv. Akad. Nauk SSSR, Ser. Mat. 48, 779–832. (In Russian) English translation in: Math. USSR Izvestija 25 (1985) 115–162.
Schulz, K. U. (1992a). Makanin's Algorithm for Word Equations: Two Improvements and a Generalization, In Schulz, K. U. (Ed.), Proceedings of Word Equations and Related Topics, 1st International Workshop, IWWERT '90, Tübingen, Germany, Vol. 572 of Lect. Notes Comp. Sci., pp. 85–150, Berlin-Heidelberg-New York. Springer-Verlag.

Schulz, K. U. (Ed.). (1992b). Proceedings of Word Equations and Related Topics (IWWERT '90), Vol. 572 of Lect. Notes Comp. Sci., Berlin-Heidelberg-New York. Springer-Verlag.

Schulz, K. U. (1993). Word Unification and Transformation of Generalized Equations, Journal of Automated Reasoning, 11 (2), 149–184.

Index

admissible extension, 16; basis, 22; brick, 20; clean convex chain, 21; conjugates, 3; convex chain, 21; convex chain condition, 22; critical boundary, 27; denotational length, 8; domino tower, 4; downward correctness, 26; exponent of periodicity, 5, 17; final index, 22; foundation, 22; free partially commutative monoid, 38; linear order over S, 15; model, 15; non-singular solution, 1; position, 15; primitive word, 3; refinement, 14; right boundary, 20; singular solution, 1; solution, 1, 17; stable normal form, 5; standard representation, 15; system of boundary equations, 17; system of word equations, 1; terminal node, 33; trace monoid, 38; transport position, 27; upward correctness, 26.