Universität Stuttgart
Fakultät Informatik, Institut für Informatik
Breitwiesenstraße 20-22, D-70565 Stuttgart

Makanin's Algorithm for Solving Word Equations with Regular Constraints

Volker Diekert

Report Nr. 1998/02, March 3, 1998

Abstract

We give a self-contained proof of a fundamental result of Makanin (1977), which solves the satisfiability problem of equations with constants over free monoids. Our presentation of Makanin's algorithm borrows from Schulz (1992a), where Makanin's result is extended to the case where solutions are restricted by imposing regular constraints on the variables. This report appears (with minor modifications) as a chapter of the new book of M. Lothaire, Algebraic Combinatorics on Words.

Contents

1 Introduction
2 Simple facts on words and on equations
  2.1 Notations and combinatorial properties
  2.2 Domino Towers
  2.3 Stable normal forms
  2.4 From a system to a single equation and vice versa
  2.5 A single variable
3 Linear Diophantine equations: Bounding the exponent of periodicity
4 Boundary equations
  4.1 Linear orders over S
  4.2 From word equations to boundary equations
  4.3 The convex chain condition
  4.4 Transformation rules
5 Proof of Theorem 4.12
  5.1 Decidability
  5.2 Complexity in terms of the semigroup S and the maximal number of boundary equations
  5.3 An upper bound for the complexity of solving word equations
6 Notes
Problems
References

1 Introduction

The aim of this text is to give a self-contained proof of a fundamental result of Makanin which implies that the existential theory of word equations over free monoids is decidable. In other terms, Makanin's result is the decidability of the existential theory of concatenation.

Let A be an alphabet of constants and let Ω be a set of variables. A word equation L = R is a pair (L, R) ∈ (A ∪ Ω)* × (A ∪ Ω)*, and a system of word equations is a set of equations {L₁ = R₁, …, L_k = R_k}. A solution is a homomorphism σ : (A ∪ Ω)* → A* leaving the letters of A invariant such that σ(L_i) = σ(R_i) for all 1 ≤ i ≤ k. A solution is henceforth identified with a mapping σ : Ω → A*. It is called non-singular if σ(x) ≠ 1 for all x ∈ Ω; otherwise it is called singular.
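These notions translate into a few lines of code. The following sketch (in Python; the encoding of words as strings and of σ as a dictionary is mine, not the report's) checks whether a mapping solves an equation; the data is that of Example 1.1 below.

def apply(sigma, word):
    # extend sigma : Omega -> A* to the homomorphism (A u Omega)* -> A*
    # leaving the constants fixed
    return ''.join(sigma.get(c, c) for c in word)

def solves(sigma, L, R):
    return apply(sigma, L) == apply(sigma, R)

# the non-singular solution of Example 1.1 below
sigma = {'x': 'abb', 'y': 'ab', 'z': 'ba', 'u': 'bab'}
print(solves(sigma, 'xauzau', 'yzbxaaby'))   # True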
The satisfiability problem for word equations is to decide whether a given word equation has a solution. The problem is usually stated for a single equation, but this is no loss of generality. Given a propositional formula over word equations, all negations can be eliminated; then, passing to a disjunctive normal form, the problem of satisfiability can be reduced to a single conjunction. It is therefore enough to consider a system of word equations, which in turn can be transformed into a single word equation. In this way the decidability of propositional formulas over word equations reduces to the satisfiability problem for single word equations. Makanin (1977) yields a decision procedure for the satisfiability problem of word equations; the decidability of the existential theory over free monoids follows.

Since the problem is semi-decidable by its nature, a positive answer suffices to compute (if desired) a solution σ : Ω → A* which is minimal, say with respect to Σ_{x∈Ω} |σ(x)|. Here and in the following, computation means that there is an effective procedure in the mathematical sense. We shall derive a double exponential space bound just to solve the satisfiability problem, and the length of a minimal solution will be at most four times exponential in the input size of the word equation. These upper bounds are far beyond any practical meaning, but it is not clear that this reflects the inherent complexity of the problem. In practice the algorithm seems to behave much better. For example, up to now no solvable word equation is known where the minimal solution exceeds exponential length.
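The semi-decidability remark can be made concrete. The following naive enumerator (a sketch, not Makanin's algorithm; it reuses apply from the previous sketch) searches assignments by increasing total length Σ|σ(x)| and hence returns a minimal solution whenever one exists, while it diverges on unsolvable equations.

from itertools import product

def compositions(total, parts):
    # all tuples of 'parts' non-negative integers summing to 'total'
    if parts == 1:
        yield (total,)
        return
    for head in range(total + 1):
        for rest in compositions(total - head, parts - 1):
            yield (head,) + rest

def minimal_solution(A, variables, L, R):
    # semi-decision: search sigma by increasing sum of |sigma(x)|;
    # this loop never terminates if the equation has no solution
    total = 0
    while True:
        for lengths in compositions(total, len(variables)):
            pools = [[''.join(t) for t in product(A, repeat=n)]
                     for n in lengths]
            for choice in product(*pools):
                sigma = dict(zip(variables, choice))
                if apply(sigma, L) == apply(sigma, R):
                    return sigma
        total += 1

For instance, minimal_solution('ab', 'xyzu', 'xauzau', 'yzbxaaby') finds a solution of the equation of Example 1.1 below.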
Example 1.1 Let A = {a, b} and Ω = {x, y, z, u}. Consider the equation

xauzau = yzbxaaby.

This equation is solvable; a possible non-singular solution is given by σ(x) = abb, σ(y) = ab, σ(z) = ba, σ(u) = bab. We have

abbababbaabab = σ(xauzau) = σ(yzbxaaby).

There is a rather straightforward algorithm to decide the solvability of a system of word equations where each variable occurs at most twice. This algorithm is due to Matiyasevich (1968). Since the general solution refers (implicitly) to the underlying idea, we explain it here as an introductory example. Let E = {L₁ = R₁, …, L_k = R_k} be a system of word equations where every variable x ∈ Ω occurs at most twice in E. Let ||E|| = Σ_{i=1}^{k} |L_i R_i| denote the denotational length of E. The question is whether there is a solution.

Using induction on |Ω| we describe a non-deterministic decision algorithm which works without exceeding a linear space bound in ||E||. The basis Ω = ∅ is clear, hence let Ω ≠ ∅. The first step is to guess whether there is a solution σ : Ω → A* with σ(x) = 1 for some x ∈ Ω. This is done by choosing some x ∈ Ω and replacing the occurrences of x in E by the empty word. We obtain a new system E′ over Ω ∖ {x}, and recursively we decide in non-deterministic linear space whether E′ has a solution. Thus, after this step we are looking for non-singular solutions of E only. We may assume that the first equation is either of the form x⋯ = a⋯ with x ∈ Ω, a ∈ A, or x⋯ = y⋯ with x, y ∈ Ω, x ≠ y. By symmetry (or a non-deterministic guess interchanging the roles of L₁ and R₁) we may write either x = az or x = yz, where z is a new variable. Replacing all occurrences of x by az or yz respectively, we obtain a new system where x does not occur any more and z occurs at most twice. On the left of the first equation we may cancel either a or y, and then y also occurs at most twice. Hence we end up with a new system E′ where the number of variables is the same as in E, every variable occurs at most twice, and ||E′|| ≤ ||E||. Note that E′ may have a singular solution with σ(z) = 1. However, if E′ is solvable, then E is also solvable. Now, let σ : Ω → A* be a non-singular solution of E where Σ_{x∈Ω} |σ(x)| is minimal. Then we find a solution σ′ for E′ with |σ′(z)| < |σ(x)| since σ(y) ≠ 1. Thus the length of a shortest solution has decreased, showing that the non-deterministic procedure will find a solution if there is any. Since we have a linear space bound, the procedure can be transformed into a deterministic decision algorithm using at most exponential time.

The presentation of the general case will mainly follow Schulz (1992a), thereby showing the result of Makanin in a more general setting. Assume that for every x ∈ Ω a regular language L_x ⊆ A* is given as part of the problem instance together with the equation L = R. Then we can decide whether or not there exists a solution σ : Ω → A* additionally satisfying the regular constraints σ(x) ∈ L_x for all x ∈ Ω. (For example, we can prescribe the alphabet of a solution σ(x) for all x ∈ Ω.)

In the following we do not focus very much on necessary decidable conditions which are useful to prune the search tree. A good pruning strategy is of course extremely important for an implementation, since the search tree tends to be huge. However, pruning the tree helps neither to understand the algorithm, nor does it seem to have any effect on the worst-case analysis.

2 Simple facts on words and on equations

2.1 Notations and combinatorial properties

Throughout this chapter A = {a, b, …} denotes an alphabet of constants and Ω is a set of variables (or unknowns) such that A ∩ Ω = ∅. We shall use the same symbol σ to denote a mapping σ : Ω → A* and its canonical extension to a homomorphism σ : (A ∪ Ω)* → A*. The symbol 1 denotes the empty word, the unit element in other monoids, and also the natural number 1 ∈ ℕ. The length of a word w is denoted by |w|. The prefix relation (proper prefix relation) on words is denoted by u ≤ v (u < v resp.). Recall that a word p is called primitive if it cannot be written in the form p = r^α with α ≠ 1. Lower case Greek letters α, β etc. are mostly used to denote natural numbers. The set of natural numbers is ℕ; it includes zero. The set of integers is ℤ. By log α we mean max{1, ⌈log₂ α⌉}.

Two words y, z ∈ A* are conjugates if xy = zx for some x ∈ A*. The next proposition shows that in free monoids conjugates are obtained by transposition.

Proposition 2.1 Let x, y, z ∈ A* be words, y, z ≠ 1. Then the following assertions are equivalent:
1. xy = zx;
2. ∃ r, s ∈ A*, s ≠ 1, α ≥ 0 : x = (rs)^α r, y = sr, and z = rs.

Proposition 2.2 Let p ∈ A* be primitive and p² = xpy for some x, y ∈ A*. Then we have either x = 1 or y = 1 (but not both).

Proofs of Props. 2.1 and 2.2 can be found in Lothaire (1983) or elsewhere. An overlapping of two words w₁ and w₂ is depicted by the following figure:

[Figure: w₁ and w₂ drawn as two overlapping segments.]

It says that the common border is an identical factor, i.e., w₁ = xy, w₂ = zx. Usually we mean x ≠ 1, and sometimes the figure also indicates that both y ≠ 1 and z ≠ 1; there will be no risk of confusion. For example, Prop. 2.2 can be rephrased by saying that the following picture is not possible for a primitive word p ∈ A*:

[Figure: a copy of p lying strictly inside p², overlapping both halves properly.]
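Primitive words and primitive roots are used throughout this chapter. A standard computational sketch (the trick of locating w inside ww is folklore and not taken from this report):

def primitive_root(w):
    # the first non-trivial occurrence of w in ww has an offset
    # dividing |w|, and that offset is the length of the root
    i = (w + w).find(w, 1)
    return w[:i]

def is_primitive(w):
    return w != '' and primitive_root(w) == w

print(primitive_root('abab'), is_primitive('ababa'))   # ab True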
2.2 Domino Towers

Every non-empty word w ∈ A⁺ can be written in the form w = (rs)^{h−1} r with s ≠ 1, h ≥ 2. Then it can be arranged in a figure looking like a domino tower of height h:

[Figure: h copies of w = (rs)^{h−1} r stacked as rows r s r s ⋯ r s r, each row shifted by |rs| against the one below; a vertical line marks the common boundary.]

The position of the vertical line says that the upper left boundary is not on the right of the lower right boundary. The formal definition of such an arrangement also allows a less regular shape. It reads as follows:

Definition 2.3 Let h ≥ 2. We say that a non-empty word w ∈ A⁺ can be arranged in a domino tower of height h, if there are words x₁, …, x_{h−1} ∈ A* and non-empty words y₁, …, y_{h−1}, z₂, …, z_h ∈ A⁺ such that
1. w = x_i y_i = z_{i+1} x_i for all 1 ≤ i < h,
2. |z₂ ⋯ z_h| ≤ |w|.

Note that for h = 2 the domino tower may degenerate as in the following figure.

[Figure: two copies of w placed one above the other without a proper shift.]

Definition 2.4 Let w ∈ A* be a word. The exponent of periodicity exp(w) is defined by

exp(w) = max{α ∈ ℕ | ∃ r, s, p ∈ A*, p ≠ 1 : w = r p^α s}.

Lemma 2.5 Let w ∈ A⁺ be a non-empty word which can be arranged in a domino tower of height h. Then we have exp(w) ≥ h − 1.

Proof. Choose a domino tower and words x_i, y_i, z_i as in the definition above. Let z = z_i ∈ {z₂, …, z_h} be of minimal length, x = x_{i−1}, y = y_{i−1}. Then (h − 1)|z| ≤ |w| and we have xy = zx = w. Hence y and z are conjugates and we may apply Prop. 2.1. We obtain z = rs and x = (rs)^α r for some α ≥ 0 and |r| < |z|. Hence w = z^{α+1} r and therefore

(h − 1)|z| ≤ |w| < (α + 2)|z|.

Since |z| > 0 we see that h − 1 ≤ α + 1 ≤ exp(w). □

2.3 Stable normal forms

One of the key ideas in Makanin's proof is that, given a word equation, the exponent of periodicity of a shortest solution has an effective upper bound. This relies on the notion of p-stable normal form.

Definition 2.6 Let p ∈ A⁺ be a primitive word. The p-stable normal form of the word w ∈ A* is a shortest sequence (u₀, α₁, u₁, …, α_k, u_k) such that k ≥ 0 (k is minimal), u₀, u_i ∈ A*, α_i ≥ 0 for 1 ≤ i ≤ k, and the following conditions are satisfied:
1. We have w = u₀ p^{α₁} u₁ ⋯ p^{α_k} u_k.
2. We have k = 0 if and only if p² is not a factor of w.
3. If k ≥ 1, then we have:
u₀ ∈ A*p ∖ A*p²A*,
u_i ∈ (A*p ∩ pA*) ∖ A*p²A* for 1 ≤ i < k,
u_k ∈ pA* ∖ A*p²A*.

Proposition 2.7 Let p ∈ A⁺ be primitive. The p-stable normal form of w ∈ A* is uniquely defined. This means, if (u₀, α₁, u₁, …, α_k, u_k) and (v₀, β₁, …, β_ℓ, v_ℓ) are p-stable normal forms of the same word w ∈ A*, then they are identical, i.e., we have k = ℓ, u₀ = v₀, u_i = v_i, and α_i = β_i for 1 ≤ i ≤ k.

Proof. Assume that (u₀, α₁, u₁, …, α_k, u_k) and (v₀, β₁, v₁, …, β_ℓ, v_ℓ) are both p-stable normal forms of w. Since these are shortest sequences, the indices k and ℓ are both minimal, hence k = ℓ. For k = 0 we have w = u₀ = v₀, hence let k = ℓ ≥ 1. We show first that u₀ = v₀. To see this, suppose by symmetry that |u₀| ≤ |v₀|. Since u₀p ∈ A*p² and v₀ ∈ A*p ∖ A*p²A*, we obtain u₀ ≤ v₀ < u₀p. By Prop. 2.2 this yields u₀ = v₀. Let w′ denote the word u₁ p^{α₂} u₂ ⋯ p^{α_k} u_k. A simple reflection using u₁ ≠ p, Prop. 2.2, and u₁ ∈ (A*p ∩ pA*) ∖ A*p²A* shows that p^{α₁} w′ ∈ p^{α₁+1}A* ∖ p^{α₁+2}A*. This implies α₁ = β₁ and w′ = v₁ p^{β₂} v₂ ⋯ p^{β_k} v_k. Since we have w′ ∈ pA*, we see that the first component of its p-stable normal form is in pA*. Hence (u₁, α₂, u₂, …, α_k, u_k) is the p-stable normal form of w′. By induction we conclude (u₁, α₂, u₂, …, α_k, u_k) = (v₁, β₂, v₂, …, β_k, v_k). Hence the proposition. □
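The uniqueness proof also suggests a left-to-right computation of the p-stable normal form: u₀ ends with the first copy of p of the leftmost occurrence of p², then a maximal run of further copies of p determines α_i, the last copy of that run starts the next u_i, and so on. A sketch (assuming p is primitive; the encoding is mine):

def p_stable_normal_form(w, p):
    # returns (u0, a1, u1, ..., ak, uk) with w = u0 p^a1 u1 ... p^ak uk
    # as in Definition 2.6
    k = len(p)
    t = w.find(p + p)
    if t < 0:
        return (w,)                      # k = 0 iff pp is not a factor
    out = [w[:t + k]]                    # u_0 ends with one copy of p
    while True:
        f = t + k                        # consume copies of p from here
        j = 0
        while w.startswith(p, f + j * k):
            j += 1
        b = f + (j - 1) * k              # the last copy belongs to u_i
        out.append(j - 1)                # alpha_i = (run length) - 1
        t = w.find(p + p, b)             # next junction, if any
        if t < 0:
            out.append(w[b:])            # u_k
            return tuple(out)
        out.append(w[b:t + k])           # u_i ends with one copy of p

print(p_stable_normal_form('ababaabababaababa', 'ababa'))
# ('ababa', 0, 'abababa', 0, 'ababa'), i.e. the form (p, 0, abp, 0, p)
# that reappears in the proof of Thm. 3.2 below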
2.4 From a system to a single equation and vice versa

The existential theory of equations over free monoids is decidable, i.e., the satisfiability of any propositional formula over word equations (with regular constraints) can be decided. Let us show the reduction to Makanin's result. In a first step we may assume that all negations in a given formula are of type L ≠ R. Due to the following proposition these negations can be eliminated.

Proposition 2.8 An inequality L ≠ R is equivalent with respect to satisfiability to the following formula, using x, y, and z as new variables:

∨_{a∈A} (L = Rax ∨ R = Lax) ∨ ∨_{a,b∈A, a≠b} (L = xay ∧ R = xbz).

In a second step the formula (without negations) is written in disjunctive normal form. Then, for satisfiability, it is enough to see how a system of word equations can be transformed into a single word equation. The method is given in Prop. 2.9. It relies on the trivial fact that if ua ≤ va, ub ≤ vb, u, v ∈ A*, a, b ∈ A, and a ≠ b, then we have u = v.

Proposition 2.9 Let a, b ∈ A be distinct letters (if A = {a}, then let b denote a new letter resp.) and let E = {L₁ = R₁, …, L_k = R_k} be a system of word equations. Then the set of solutions of E is identical (in canonical bijection resp.) with the set of solutions of the following equation:

L₁a ⋯ L_k a R₁ ⋯ R_k L₁b ⋯ L_k b = R₁a ⋯ R_k a L₁ ⋯ L_k R₁b ⋯ R_k b.

Sometimes it is useful to do the opposite of the proposition above and to split a single word equation into a system where all equations are of type xy = z with x, y, z ∈ A ∪ Ω. This can be derived from the next proposition. Again its (simple) proof is left to the reader.

Proposition 2.10 Let x₁ ⋯ x_g = x_{g+1} ⋯ x_d be a word equation with 1 ≤ g < d, x_i ∈ A ∪ Ω for 1 ≤ i ≤ d. Then the set of solutions of L = R is in canonical bijection with the set of solutions of the following system:

x₁ = y₁,  x_{g+1} = y_{g+1},
y₁x₂ = y₂,  y_{g+1}x_{g+2} = y_{g+2},
⋮
y_{g−1}x_g = y_g,  y_{d−1}x_d = y_d,
y_g = y_d.

In the system above y₁, …, y_d denote new variables.

2.5 A single variable

A parametric description of the set of all solutions can be computed in polynomial time if there is only one variable occurring in the equation. This serves as an example of why p-stable normal forms might be useful, but it is not used elsewhere. The reader may skip this subsection.

Let E be a set of word equations where exactly one variable x occurs, Ω = {x}. By Prop. 2.9, E is given by a single equation L = R with L, R ∈ (A ∪ {x})*. The first check is whether σ(x) = 1 yields the singular solution. It is then enough to consider only non-singular solutions.

Let us denote by 𝓛 a list of pairs (p, r) where p ∈ A⁺ is primitive and r ∈ A* is a proper prefix r < p. We say that 𝓛 is complete for the equation L = R if every non-singular solution σ has the form σ(x) = p^α r for some α ≥ 0 and (p, r) ∈ 𝓛. Since our intention here is to give an application of p-stable normal forms, assume for a moment that a finite complete list 𝓛 has already been computed in a first phase of the algorithm. Then we proceed as follows. For each pair (p, r) ∈ 𝓛 we make a first test whether σ(x) = r is a solution and a second test whether σ(x) = pr is a solution. After that we search (for this pair (p, r)) for solutions where σ(x) = p^α r with α ≥ 2. Replace all occurrences of x in the equation L = R by the expression p p^{α−2} p r, where α now denotes an integer variable. Thus, the problem is now to find solutions for α such that α ≥ 2. Using the symbolic expression we can factorize L and R in their p-stable normal forms:

L = u₀ p^{m₁α+n₁} u₁ ⋯ p^{m_kα+n_k} u_k,
R = v₀ p^{m′₁α+n′₁} v₁ ⋯ p^{m′_ℓα+n′_ℓ} v_ℓ.

Note that k, ℓ ≥ 0 and m_i, m′_j ∈ ℕ, n_i, n′_j ∈ ℤ for 1 ≤ i ≤ k and 1 ≤ j ≤ ℓ. By Prop. 2.7 we have to verify k = ℓ, u_i = v_i for 0 ≤ i ≤ k, and we have to solve a linear Diophantine system:

(m_i − m′_i) α = n′_i − n_i for 1 ≤ i ≤ k.

There are three cases: either no α ≥ 2, or exactly one α ≥ 2, or all α ≥ 2 satisfy these equations. Note that for each pair (p, r) the necessary computations can be done in linear time.
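This case analysis is easy to implement. A sketch, with the i-th equation encoded as the pair (m_i − m′_i, n′_i − n_i); it returns 'none', ('one', α), or 'all':

def alphas_ge2(eqs):
    # solve (m_i - m'_i) * alpha = n'_i - n_i simultaneously over
    # the unknown alpha >= 2
    fixed = None
    for a, b in eqs:
        if a == 0:
            if b != 0:
                return 'none'
        elif b % a != 0 or b // a < 2:
            return 'none'
        elif fixed is not None and fixed != b // a:
            return 'none'
        else:
            fixed = b // a
    return 'all' if fixed is None else ('one', fixed)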
The performance of the algorithm therefore depends on an efficient computation of a short and complete list 𝓛. We may assume that L = ux⋯ and R = xv⋯, where u ∈ A⁺, v ∈ A*, and both words u and v are of maximal length. Let p ∈ A⁺ be the primitive root of u, i.e., p is primitive and u = p^e for some e ≥ 1. If σ is a solution of L = R, then σ also solves an equation of type ux = xw for some word w ∈ A⁺. By Prop. 2.1 it is immediate that we have σ(x) = p^α r for some α ≥ 0 and r < p. Thus, the simple method is to define the list 𝓛 by all pairs (p, r) where r < p. We obtain a list 𝓛 with |p| elements. It is clear that all computations can be done in polynomial time; in fact quadratic time is enough.

There is an improvement to an O(||E|| log ||E||) algorithm due to Eyono Obono, Goralcik, and Maksimenko (1994). This improvement is a clever method to compute a complete list 𝓛 of at most logarithmic length. The method uses a finer combinatorial analysis and relies, in particular, on the following facts, which can be found in Lothaire (1983):
- Let r, s ∈ A*. If the word sr is primitive, then rs is also primitive.
- Let p, q ∈ A⁺ be primitive words and u = p^e, w = q^f for some e, f ≥ 1. If u and w are conjugates, then p and q are conjugates and there is a unique factorization p = rs, q = sr with r < p. Moreover, if ux = xw for some word x ∈ A*, then we have x = p^α r for some α ≥ 0 and the unique prefix r < p above.
- Let p, q, r ∈ A⁺ be primitive words such that p² < q² < r². Then we have |p| + |q| ≤ |r|. In particular, a word w ∈ A* of length n has at most O(log n) distinct prefixes of the form pp where p is primitive.

The aim is to compute a complete list 𝓛 for the equation L = R of length O(log |LR|). For this purpose we divide the set of non-singular solutions into two classes.

The first class contains all solutions where |σ(x)| ≥ |u| − |v|. (Of course, in the case |u| ≤ |v| all solutions satisfy this condition.) Let w be the prefix of the word vu such that |w| = |u|. If σ is a solution with |σ(x)| ≥ |u| − |v|, then we have uσ(x) = σ(x)w. Let p be the primitive root of u and let q be the primitive root of w. Then σ(x) = p^α r for some α ≥ 0 and the unique prefix r < p such that p = rs and q = sr. If p and q are not conjugates, then there is no such solution. Otherwise, if p and q are conjugates, we include the unique pair (p, r) in 𝓛. This pair covers all solutions where |σ(x)| ≥ |u| − |v|.

Now, let σ be a non-singular solution such that 0 ≠ |σ(x)| < |u| − |v|. This implies that R has the form R = xvx⋯ and that σ(x)vσ(x) < uσ(x). Hence σ(x)vσ(x) < uu and ww < vuu, where w denotes the non-empty word vσ(x). Let q be the primitive root of w; then we have qq < vuu. There is a unique factorization q = sr with s < q such that v ∈ q*s. The word rs is also primitive and we have σ(x) = (rs)^α r for some α ≥ 0. Therefore it is enough to compute the list of all primitive words q such that qq < vuu. If v = 1, then we add all pairs (q, 1) to 𝓛. Otherwise, if v ≠ 1, then we compute for each q the unique factorization q = sr with s ≠ 1 such that v ∈ q*s. We add all pairs (rs, r) to 𝓛.
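The last step asks for all primitive q with qq a proper prefix of vuu, and by the third fact above this list has only O(log n) entries. A brute-force sketch (reusing is_primitive from the sketch in Sect. 2.1; the improved method computes this list faster, which is not attempted here):

def primitive_square_prefixes(w):
    # all primitive q with qq a proper prefix of w
    return [w[:k] for k in range(1, (len(w) - 1) // 2 + 1)
            if w.startswith(w[:k] * 2) and is_primitive(w[:k])]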
3 Linear Diophantine equations: Bounding the exponent of periodicity

The input for the algorithm is an equation L = R with L, R ∈ (A ∪ Ω)* together with regular languages L_x ⊆ A* for all variables x ∈ Ω. We are looking for a solution σ : Ω → A* such that σ(L) = σ(R) and σ(x) ∈ L_x for all x ∈ Ω. For notational convenience we henceforth do not distinguish variables from constants in the equation: every constant a ∈ A is replaced by a new variable x_a with the constraint L_{x_a} = {a}. (For readability we still use constants in examples, however.) From now on the equation is given as

x₁ ⋯ x_g = x_{g+1} ⋯ x_d with x_i ∈ Ω.

In order to exclude trivial cases we shall assume 1 ≤ g < d whenever convenient. The number d is called the denotational length of the equation. It is enough to consider non-singular solutions. Hence we shall assume that 1 ∉ L_x for all x ∈ Ω.

Next we fix a finite semigroup S and a semigroup homomorphism φ : A⁺ → S such that L_x = φ⁻¹φ(L_x) for all x ∈ Ω. For later use we demand that φ is surjective. The semigroup S can be realized as the image φ(A⁺) of the canonical homomorphism to the direct product of the syntactic monoids with respect to L_x for x ∈ Ω. Sometimes it is more convenient to work with monoids instead of semigroups. We denote by S¹ the monoid obtained by adjoining a unit element 1 to S. We have S¹ ∖ {1} = S, and the homomorphism φ is extended to a monoid homomorphism φ : A* → S¹. We have φ⁻¹(1) = {1} and φ(A⁺) = S.

Given S we can compute constants t(S) ≥ 0 and q(S) > 0 such that s^{t(S)+q(S)} = s^{t(S)} for all s ∈ S¹. In the following we actually use another constant c(S), which is defined as the least multiple of q(S) such that c(S) ≥ max{2, t(S)}. Note that this implies s^{r+αc(S)} = s^{r+βc(S)} for all s ∈ S¹ and r ≥ 0, α, β ≥ 1.
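The constants t(S), q(S), and c(S) can be computed directly from the multiplication of S. A sketch, with the carrier given as a list and the multiplication as a function (this encoding is mine):

from math import lcm

def index_period(s, mul):
    # (t, q) with s^(t+q) = s^t in the cyclic subsemigroup of s
    seen, x, k = {}, s, 1
    while x not in seen:
        seen[x] = k
        x, k = mul(x, s), k + 1
    t = seen[x]
    return t, k - t

def c_of_S(elements, mul):
    # least multiple c of q(S) with c >= max(2, t(S))
    t, q = 1, 1
    for s in elements:
        ts, qs = index_period(s, mul)
        t, q = max(t, ts), lcm(q, qs)
    c = q
    while c < max(2, t):
        c += q
    return c

print(c_of_S([0, 1, 2], lambda x, y: x * y % 3))   # 2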
Remark 3.1 Assume that each regular language L_x is specified by an NFA with r_x states, x ∈ Ω. Let r = Σ_{x∈Ω} r_x. Then we may choose the semigroup S such that |S| ≤ 2^{r²} and c(S) ≤ r!. A proof of these bounds can be found in Markowsky (1977), where a more precise analysis is given.

For the moment explicit upper bounds for |S| and c(S) are not used. They are used only later (Sect. 5.3) when complexity issues are investigated. The important point here is that our knowledge from linear algebra of how to find all minimal solutions of a system of linear Diophantine equations yields an effective upper bound for the exponent of periodicity of a solution of minimal length of a given word equation (with regular constraints). Any effective upper bound would be sufficient, but one can do better. The upper bound is exponential in the input size, and this is essentially optimal. In the proof below a rather detailed analysis is given, so the proof becomes quite technical, which might hide the beautiful and simple idea behind it. Readers who are mainly interested in the pure decidability result are invited to ignore the exact values.

Theorem 3.2 Let d ≥ 1 be a natural number, φ : A* → S¹ a homomorphism, and c(S) ≥ 2 as above. There is a computable number e(c(S), d) ∈ c(S) · 2^{O(d)} satisfying the following assertion. Given as instance a word equation x₁ ⋯ x_g = x_{g+1} ⋯ x_d of denotational length d together with a solution σ₀ : Ω → A*, we can effectively find a solution σ : Ω → A* and a word w ∈ A* such that the following conditions hold:
1. φσ₀(x) = φσ(x) for all x ∈ Ω,
2. w = σ(x₁ ⋯ x_g) = σ(x_{g+1} ⋯ x_d),
3. exp(w) ≤ e(c(S), d).

Proof. For g = 0 or g = d we have exp(w) = 0, hence let 1 ≤ g < d. Testing all words of length up to |σ₀(x₁ ⋯ x_g)|, we find a solution σ and a word w such that w = σ(x₁ ⋯ x_g) = σ(x_{g+1} ⋯ x_d) is of minimal length among all solutions σ where φσ₀(x) = φσ(x) for all x ∈ Ω. Recall that x₁ ⋯ x_g = x_{g+1} ⋯ x_d is equivalent to the following system:

x₁ = y₁,  x_{g+1} = y_{g+1},
y₁x₂ = y₂,  y_{g+1}x_{g+2} = y_{g+2},
⋮
y_{g−1}x_g = y_g,  y_{d−1}x_d = y_d,
y_g = y_d.

Note also that exp(w) = exp(σ(y_g)). After an obvious elimination of variables, the system above is equivalent to a system of d − 2 equations of type

xy = z,  x, y, z ∈ Ω.

Choose a primitive word p ∈ A⁺ such that w = u p^{exp(w)} v for some u, v ∈ A*. Consider an equation xy = z from the system above and write the words σ(x), σ(y), σ(z) in their p-stable normal forms:

σ(x): (u₀, r₁ + α₁c(S), u₁, …, r_k + α_k c(S), u_k),
σ(y): (v₀, s₁ + β₁c(S), v₁, …, s_ℓ + β_ℓ c(S), v_ℓ),
σ(z): (w₀, t₁ + γ₁c(S), w₁, …, t_m + γ_m c(S), w_m).

The natural numbers r_i, s_i, t_i, α_i, β_i, and γ_i are uniquely determined by w, c(S), and the requirement 0 ≤ r_i, s_i, t_i < c(S). Since w is a solution, there are many equations among the words and among the integers. For example, for k, ℓ ≥ 2 we have u₀ = w₀, v_ℓ = w_m, r₁ = t₁, α₁ = γ₁, etc. In order to be precise, we shall use:

α₁ = γ₁, …, α_{k−1} = γ_{k−1}, β₂ = γ_{m−ℓ+2}, …, β_ℓ = γ_m.

We have no bound on k, ℓ, or m, but we have |k + ℓ − m| ≤ 2. What exactly happens depends on the p-stable normal form of the product u_k v₀. Since u_k, v₀ ∉ A*p²A*, it is enough to distinguish nine cases. Here are the nine possible p-stable normal forms of u_k v₀, where t ∈ {0, 1}, u_k, v₀ ∈ A*, and u′_k, v′₀, w′ ∈ A⁺:

(u_k v₀), (p, t, p), (p, t, pv′₀), (u′_k p, t, p), (u′_k p, t, pv′₀),
(p, 0, w′, 0, p), (p, 0, w′, 0, pv′₀), (u′_k p, 0, w′, 0, p), (u′_k p, 0, w′, 0, pv′₀).

The case (p, 0, w′, 0, p) can be produced if p has an overlap as in p = ababa. Then we might have u_k = pabab, v₀ = abap, which yields u_k v₀ = ppbap = pabpp with abp = pba. Hence the p-stable normal form of u_k v₀ is (p, 0, abp, 0, p). We may conclude w_{k+1} = abp and

t_k + γ_k c(S) = r_k + α_k c(S) + 1,
t_{k+1} + γ_{k+1} c(S) = s₁ + β₁ c(S) + 1.

In particular k + ℓ = m. If r_k < c(S) − 1, then α_k = γ_k, otherwise α_k + 1 = γ_k. Similarly, if s₁ < c(S) − 1, then β₁ = γ_{k+1}, otherwise β₁ + 1 = γ_{k+1}.

A p-stable normal form of type (u′p, 0, w′, 0, pv′) with u′, v′, w′ ∈ A⁺ leads to k + ℓ = m + 2 and 0 = γ_k = γ_{k+1}. Let us consider another example. If u_k v₀ = p³, then k + ℓ = m + 1 and we have

r_k + s₁ + 3 + (α_k + β₁) c(S) = t_k + γ_k c(S).

Since by assumption c(S) ≥ 2, the case u_k v₀ = p³ leads to the equation

γ_k − (α_k + β₁) = c with c ∈ {0, 1, 2}.

We have seen that there are various possibilities for u_k v₀. However, always the same phenomenon arises. First of all we obtain a bunch of trivial equations which can be eliminated by renaming. All equations of type γ = 0 are eliminated by substitution. Then, for each xy = z, either there are at most two equations of type γ = α + 1, or there is one equation of type γ − (α + β) = c with c ∈ {0, 1, 2}. If there are two equations of type γ = α + 1, then one of them is eliminated by substitution. So after renaming and substituting we end up with at most one non-trivial equation having at most three variables. Proceeding this way through all d − 2 word equations, we have various interactions due to renaming and substitution.
However, finally each equation xy = z leads to at most one non-trivial equation with at most three variables. The type of this equation is

ε₁γ + i₁ − ε₂α − i₂ − ε₃β − i₃ = c,

where 0 ≤ i₁, i₂, i₃ ≤ d − 2, 0 ≤ c ≤ 2, and ε₁, ε₂, ε₃ ∈ {0, 1}. This can be written as

ε₁γ − ε₂α − ε₃β = c′ with |c′| ≤ 2d − 2.

This type introduces a coefficient −2 for α = β and ε₁ = ε₂ = ε₃ = 1.

So far we have viewed the symbols α, β, … as variables ranging over natural numbers. Going back to the solution σ, the symbols α₁, …, α_k, β₁, …, β_ℓ, γ₁, …, γ_m represent concrete values which are given by the word w. Some of them might still be zero. These are eliminated now; the reason is that they cannot be replaced by other values without risk of changing the image under φ. If δ ≥ 1 is a remaining value, i.e., a number greater than zero, then we replace it by δ = 1 + Z_δ, where now Z_δ denotes a variable over ℕ. For example, an equation γ − α − β = c′ with α, β, γ ≥ 1 is transformed into a linear Diophantine equation with integer variables Z_α, Z_β, Z_γ ≥ 0 as follows:

Z_γ − Z_α − Z_β = c′ + 1 with |c′ + 1| ≤ 2d − 1.

Putting all equations of type xy = z together, we obtain a (perhaps) huge system of linear equations. After substitution and elimination of variables, we end up with a system of at most d − 2 equations and n integer variables with n ≤ 3(d − 2). The absolute values of the coefficients are bounded by 2 and those of the constants by 2d − 1. For each equation the sum of the squares of the coefficients is bounded by 5.

The linear Diophantine system is defined by w, and the word w provides a non-negative integer solution. What becomes crucial now is the converse: every solution in non-negative integers yields by backward substitution a word w′′ and a solution σ′′ : Ω → A* satisfying conditions 1 and 2 of the theorem. Therefore: since w was chosen of minimal length, the solution of the integer system given by w is a minimal solution with respect to the natural partial ordering of ℕⁿ. In this ordering we have (α₁, …, α_n) ≤ (β₁, …, β_n) if and only if α_i ≤ β_i for all 1 ≤ i ≤ n. For ᾱ = (α₁, …, α_n) ∈ ℕⁿ let ||ᾱ|| = max{α_i | 1 ≤ i ≤ n}. All we need is a recursive bound for the following value:

e(d) = max{||ᾱ|| | ᾱ is a minimal solution of a system of linear Diophantine equations with at most d − 2 equations and 3(d − 2) variables, where the absolute value of the coefficients is bounded by 2, the sum of the squares of the coefficients in each equation is bounded by 5, and the absolute value of the constants is bounded by 2d − 1}.

Obviously, there are only finitely many systems of linear Diophantine equations where the number of equations, the number of variables, and the absolute values of coefficients and constants are bounded. For each system the set of minimal solutions is finite; this is a special case of Lemma A in Dickson (1913). Moreover, the set of minimal solutions is effectively computable. Hence, the set of values ||ᾱ|| above is finite and effectively computable. Therefore e(d) is computable. Since e(d) + d − 1 ≥ α₁, …, β₁, … for the original values under consideration above, we obtain a recursive upper bound for the exponent of periodicity. A much more precise statement is possible. It follows from the work of Kościelski and Pacholski (1996) that e(d) ∈ 2^{O(d)}, see also Rem. 3.3 below. Hence we can state:

exp(w) ≤ 2 + (c(S) − 1) + (e(d) + d − 1) · c(S) ∈ c(S) · 2^{O(d)}.

This proves the theorem. □
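The finiteness argument via Dickson's lemma can be illustrated by brute force: for a small system one can enumerate solutions inside a box and keep the minimal ones. The bound box below is an ad hoc choice for illustration, not a theorem:

from itertools import product

def minimal_solutions(A, b, box):
    # all solutions of A x = b with 0 <= x_i <= box that are minimal
    # for the pointwise partial order on N^n
    sols = [v for v in product(range(box + 1), repeat=len(A[0]))
            if all(sum(c * x for c, x in zip(row, v)) == rhs
                   for row, rhs in zip(A, b))]
    return [v for v in sols
            if not any(u != v and all(ui <= vi for ui, vi in zip(u, v))
                       for u in sols)]

# Z_gamma - Z_alpha - Z_beta = 1, variables ordered (gamma, alpha, beta):
print(minimal_solutions([[1, -1, -1]], [1], box=3))   # [(1, 0, 0)]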
Remark 3.3 The analysis of Kościelski and Pacholski (1996) is more accurate than the one presented here, and it leads to linear Diophantine systems having a slightly different structure. The authors make use of the results given in the paper of von zur Gathen and Sieveking (1978). They show that the exponent of periodicity of a minimal solution of a word equation of denotational length d is in O(2^{1.07d}). The authors do not consider regular constraints, but, as shown above, this does not change the situation very much: it yields the factor c(S). Therefore the actual result including regular constraints is:

e(c(S), d) ∈ c(S) · O(2^{1.07d}).

It is rather difficult to obtain this very good bound. However, we can circumvent this difficulty. A bound which is already nice and certainly good enough to establish Thm. 3.2 is e(d) ∈ O(2^{cd}) for some constant c, say c = 4. Such a more moderate bound can be obtained without any difficulty using the present approach and some standard knowledge in linear algebra, see Prob. 3.1 below.

Example 3.4 Consider c, n ≥ 2 and let S = ℤ/cℤ be the cyclic group with c elements. We give a regular constraint for the variable x₁ by defining

L_{x₁} = {w ∈ A⁺ | |w| ≡ 0 (mod c)}.

The system is given by x₁ = a^c, x₂ = x₁², …, x_n = x_{n−1}². Its unique solution σ is:

σ(x_i) = a^{c·2^{i−1}}, 1 ≤ i ≤ n.

A transformation into a single equation according to Prop. 2.9 shows that e(c(S), d) ∈ c(S) · 2^{Ω(d)}. Thus, the assertion given in Thm. 3.2 is essentially optimal.

The following example shows that a minimal solution can be very long although the exponent of periodicity is bounded by a constant.

Example 3.5 Let n ≥ 1. Consider the following well-known system of word equations:

x₀ = a, y₀ = b, x_i = x_{i−1}y_{i−1}, y_i = y_{i−1}x_{i−1} for 1 ≤ i ≤ n.

The unique solution is the Thue-Morse word:

σ(x_n) = abbabaabbaababbabaababbaabbabaab ⋯ for n ≥ 5.

We have |σ(x_n)| = 2ⁿ, but exp(σ(x_n)) = 2.

Example 3.6 Consider the equation with regular constraints:

axyz = zxay, L_x = a²a*, L_y = {a, b}* ∖ (a* ∪ b*), L_z = {a, b}⁺.

A suitable homomorphism φ : {a, b}⁺ → S is given by the canonical homomorphism onto the quotient semigroup of {a, b}⁺ presented by the defining relations

a² = a³, b = b², ab = ba = aab.

Thus, S is a semigroup with a zero, 0 = ab, and S has four elements: S = {a, a², b, 0}. The constant c(S) = 2 fits the requirement s^{r+c(S)} = s^{r+αc(S)} for all s ∈ S¹ and r ≥ 0, α ≥ 1. It is not difficult to find a solution σ for the equation above, e.g. σ(x) = a², σ(y) = ba², and σ(z) = a³ba².

Now let α, β, γ, δ be integer variables and let u, v, and w be parametric words described by the following a-stable normal forms:

u: (a, 2α, a), v: (ba, 2β, a), w: (a, 1 + 2γ, aba, 2δ, a).

In order to derive the system of linear Diophantine equations we make a direct approach: we want to solve auvw = wuav. First we write auvw as a sequence of a-stable normal forms:

((a), (a, 2α, a), (ba, 2β, a), (a, 1 + 2γ, aba, 2δ, a)).

The resulting a-stable normal form is:

(a, 2α + 1, aba, 2β + 2γ + 3, aba, 2δ, a).

Now consider the right-hand side wuav. This yields:

(a, 2γ + 1, aba, 2α + 2δ + 3, aba, 2β, a).

We obtain the linear Diophantine system:

2α + 1 = 2γ + 1, 2β + 2γ + 3 = 2α + 2δ + 3, 2δ = 2β.

Going back to the equation we see that for all α ≥ 0 and β ≥ 0 the mapping

σ(x) = a^{2+2α}, σ(y) = ba^{2+2β}, σ(z) = a^{3+2α}ba^{2+2β}

yields a solution of the equation axyz = zxay satisfying the regular constraints.
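The parametric family just derived is easy to test mechanically; a quick check of Example 3.6 for small values of the parameters (assuming the solved system γ = α, δ = β):

for alpha in range(3):
    for beta in range(3):
        x = 'a' * (2 + 2 * alpha)                    # x in a^2 a*
        y = 'b' + 'a' * (2 + 2 * beta)               # y not in a* u b*
        z = 'a' * (3 + 2 * alpha) + 'b' + 'a' * (2 + 2 * beta)
        assert 'a' + x + y + z == z + x + 'a' + y    # axyz = zxay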
4 Boundary equations

4.1 Linear orders over S

Let us start with an informal explanation of the notions discussed in this subsection. Assume that x₁ ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, x_i ∈ Ω for 1 ≤ i ≤ d, is a solvable word equation with regular constraints and that there is a non-singular solution σ. The solution is given by some word w ∈ A⁺. The equation corresponds to two factorizations w = u₁ ⋯ u_g = u_{g+1} ⋯ u_d, yielding two sequences of non-empty words:

(u₁, …, u_g), (u_{g+1}, …, u_d).

Using the word w, these sequences can be merged into a single one such that each u_i is a product of some of the w_k:

w = (w₁, …, w_m), w_k ≠ 1 for 1 ≤ k ≤ m, m < d.

Let us see what happens if we pass via φ to the finite semigroup S. Two sequences (p₁, …, p_g) ∈ S^g and (p_{g+1}, …, p_d) ∈ S^{d−g} are merged into a single sequence (s₁, …, s_m) ∈ S^m, m < d, such that each p_i ∈ S is a product of some of the s_k. We shall say that (s₁, …, s_m) is a common refinement of (p₁, …, p_g) and (p_{g+1}, …, p_d). For a given input d there are only finitely many possibilities for sequences of the form (s₁, …, s_m), s_j ∈ S, 1 ≤ j ≤ m < d. Thus, in a non-deterministic step we can guess and fix such a sequence which is the φ-image of (w₁, …, w_m).

A basic technique for solving word equations is to split a variable. Working over the sequence (s₁, …, s_m) ∈ S^m, a splitting of a variable x = x′x′′ is accompanied by a splitting of some s_i and a guess of s′, s′′ ∈ S such that s_i = s′s′′. In this way the length of the sequences increases.

Example 4.1 Consider the equation xauzau = yzbxaaby. The solution, which was given in Ex. 1.1, leads to the sequences (abb, a, bab, ba, a, bab) and (ab, ba, b, abb, a, a, b, ab), for which (ab, b, a, b, ab, b, a, a, b, ab) is a common refinement. This is best visualized by the following figure.

[Figure: three copies of the word abbababbaabab, marked with the cuts of the two factorizations above and with the cuts of their common refinement.]

Passing to the semigroup S from Ex. 3.6, we could start to search for a solution with the sequence (0, b, a, b, 0, b, a, a, b, 0) ∈ S¹⁰.
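The common refinement of Example 4.1 is obtained by merging the cut positions of the two factorizations. A sketch:

def common_refinement(f1, f2):
    # f1, f2: two factorizations (lists of non-empty factors) of the
    # same word; merge their cut positions
    w = ''.join(f1)
    assert w == ''.join(f2)
    cuts = set()
    for f in (f1, f2):
        pos = 0
        for u in f:
            pos += len(u)
            cuts.add(pos)
    pos = sorted(cuts)
    return [w[i:j] for i, j in zip([0] + pos, pos)]

print(common_refinement(['abb', 'a', 'bab', 'ba', 'a', 'bab'],
                        ['ab', 'ba', 'b', 'abb', 'a', 'a', 'b', 'ab']))
# ['ab', 'b', 'a', 'b', 'ab', 'b', 'a', 'a', 'b', 'ab']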
We now start the formal discussion of this section. The semigroup S and the homomorphism φ : A⁺ → S are given as in the preceding section. An S-sequence is a sequence (s₁, …, s_m) ∈ S^m, m ≥ 0. A representation of (s₁, …, s_m) is a triple (I, ≤, φ_I) such that (I, ≤) is a totally ordered set of m + 1 elements and

φ_I : {(i, j) ∈ I × I | i ≤ j} → S¹

is a mapping satisfying, for some order-respecting bijection π : I → {0, …, m}, the condition

φ_I(i, j) = s_{π(i)+1} ⋯ s_{π(j)} ∈ S¹ for all i, j ∈ I, i ≤ j.

Note that we have φ_I(i, j) = 1 if and only if i = j, and φ_I(i, k) = φ_I(i, j) φ_I(j, k) for all i, j, k ∈ I, i ≤ j ≤ k. The standard representation of (s₁, …, s_m) is simply (I, ≤, φ_I) where I = {0, …, m} and φ_I(i, j) = s_{i+1} ⋯ s_j for i, j ∈ I, i ≤ j. Hence for the standard representation the bijection π is the identity. In the following, any representation (I, ≤, φ_I) of some S-sequence is called a linear order over S.

Remark 4.2 An S-sequence can be viewed as an abstraction of a linear order over S. In most cases we are interested in the abstract objects only, but when working with them we have to pass to concrete representations. When counting linear orders over S (c.f. Lem. 4.9 below) we count, by convention, only standard representations and mappings between them.

Let w = a₁ ⋯ a_m ∈ A*, a_i ∈ A for 1 ≤ i ≤ m. The set {0, …, m} is the set of positions of w, and for 0 ≤ i ≤ j ≤ m let w(i, j) denote the factor a_{i+1} ⋯ a_j. The associated S-sequence of w is defined by w_S = (φ(a₁), …, φ(a_m)). The notation w_S also refers to its standard representation w_S = ({0, …, m}, ≤, φ_w). The mapping φ_w is defined by φ_w(i, j) = φ(w(i, j)) for all 0 ≤ i ≤ j ≤ m.

Definition 4.3 Let s, s′ be S-sequences given by some representations (I, ≤, φ_I) and (I′, ≤, φ_{I′}). We say that s′ is a refinement of s (or that s matches s′) if there exists an order-respecting injective mapping μ : I → I′ such that φ_I(i, j) = φ_{I′}(μ(i), μ(j)) for all i, j ∈ I, i ≤ j. We write either s ≤ s′ or, more precisely, s ≤_μ s′ and (I, ≤, φ_I) ≤_μ (I′, ≤, φ_{I′}) in this case.

Remark 4.4 Let s, s′ be S-sequences such that s ≤ s′. Then we may choose concrete representations and a refinement (I, ≤, φ_I) ≤_μ (I′, ≤, φ_{I′}) such that μ : I → I′ is an inclusion, i.e., I ⊆ I′ and φ_I is the restriction of φ_{I′} to I.

Definition 4.5 Let s be an S-sequence and (I, ≤, φ_I) some representation. A word w ∈ A* is called a model of s (of (I, ≤, φ_I) resp.) if the associated S-sequence w_S is a refinement of s, i.e., (I, ≤, φ_I) ≤_μ w_S for some μ. If w is a model of s, then we write w ⊨ s or w ⊨ (I, ≤, φ_I). By abuse of language, we make the following convention: as soon as we have chosen a word w as a model, we are free to view the set I as a subset of positions of w, i.e., μ becomes an inclusion and therefore φ_I(i, j) = φ(w(i, j)) for all i, j ∈ I, i ≤ j.

Lemma 4.6 Every S-sequence (s₁, …, s_m) has a model w ∈ A*.

Proof. Since φ is surjective, there are non-empty words w_i ∈ A⁺ such that s_i = φ(w_i) for all 1 ≤ i ≤ m. Let w = w₁ ⋯ w_m; then we have w ⊨ (s₁, …, s_m). □
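Definition 4.5 can also be checked mechanically: search for an increasing sequence of positions realizing the S-sequence. A brute-force sketch, where phi is assumed to be a callable evaluating φ on a non-empty factor (exponential in general, for experiments only):

def is_model(w, seq, phi):
    # does some strictly increasing sequence of positions of w
    # realize the S-sequence seq?
    n, m = len(w), len(seq)
    def go(t, i):
        if t == m:
            return True
        return any(phi(w[i:j]) == seq[t] and go(t + 1, j)
                   for j in range(i + 1, n + 1))
    return any(go(0, i0) for i0 in range(n + 1))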
The lemma above will yield the positive termination step in Makanin's algorithm once there are no more variables. In the positive case we can eventually reconstruct some S-sequence such that some model w describes a solution of the word equation.

Let i, j ∈ I, i ≤ j, be positions in a linear order over S. Then [i, j] denotes the interval from i to j; this is the linear sub-order over S induced by the subset {k ∈ I | i ≤ k ≤ j}. More generally, let T ⊆ I be a subset; then we view (T, ≤, φ_T) as a linear suborder of (I, ≤, φ_I). In the following, min(T) and max(T) refer to the minimal respectively maximal element of a subset T of a linear order I.

Definition 4.7 Let (I, ≤, φ_I) be a representation of some S-sequence, T ⊆ I a non-empty subset, and ℓ*, r* ∈ I positions such that ℓ* < r*. An admissible extension of (I, ≤, φ_I) by T at [ℓ*, r*] is given by a linear order (I*, ≤, φ_{I*}) and two refinements (I, ≤, φ_I) ≤_μ (I*, ≤, φ_{I*}) and (T, ≤, φ_T) ≤_{μ*} (I*, ≤, φ_{I*}) such that the following two conditions are satisfied:
1. I* = μ(I) ∪ μ*(T),
2. min(μ*(T)) = ℓ* and max(μ*(T)) = r*.

The intuition behind the last definition should be rather clear. An admissible extension refines (I, ≤, φ_I) by defining new positions between ℓ* and r* until T matches the enlarged interval [ℓ*, r*] in such a way that all new points have a corresponding point in T, and such that min(T) is mapped to ℓ* and max(T) is mapped to r*. The other way round: let (I*, ≤, φ_{I*}) denote an admissible extension of (I, ≤, φ_I) by T at [ℓ*, r*]; then we may view I ⊆ I*, whence T ⊆ I*. There is a subset T* ⊆ I* representing the same S-sequence as T, and we have I* = I ∪ T*, min(T*) = ℓ*, and max(T*) = r*.

Example 4.8 Let (s₁, …, s₆) be some S-sequence, (I, ≤, φ_I) its standard representation, ℓ* = 4 and r* = 6. Let (I*, ≤, φ_{I*}) represent an admissible extension of (I, ≤, φ_I) by {0, 3, 4, 5} at [4, 6]. Then we may assume I* = {0, …, 6} ∪ {3*, 4*} with 0 < 1 < 2 < 3 < 4 < 5 < 6 and 4 < 3* < 4* < 6. We may or may not have 5 ∈ {3*, 4*}. Say we have 5 = 3*. Then the corresponding S-sequence has the form (s₁, s₂, s₃, s₄, s₅, s₄, s₅) such that s₅ = s₁s₂s₃ and s₆ = s₄s₅.

Lemma 4.9 Given (I, ≤, φ_I), T ⊆ I, and ℓ*, r* ∈ I, the list of all admissible extensions of (I, ≤, φ_I) by T at [ℓ*, r*] is finite and effectively computable.

Proof. Trivial, since the cardinality of an admissible extension is bounded by |I| + |T|. □

4.2 From word equations to boundary equations

Let x₁ ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, x_i ∈ Ω for 1 ≤ i ≤ d, be a word equation with regular constraints L_x ⊆ A* for all x ∈ Ω. Recall that we are only interested in non-singular solutions and that we fixed a homomorphism φ : A⁺ → S onto a finite semigroup S such that φ⁻¹φ(L_x) = L_x for x ∈ Ω. Hence, without restriction it holds that 1 ∉ L_x ≠ ∅ for all x ∈ Ω. Since the images φ(L_x) ⊆ S are finite sets, we can split into finitely many cases where in each case φ(L_x) is a singleton. Thus, it is enough to consider a situation where the input is x₁ ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, and the question is the existence of a non-singular solution σ : Ω → A⁺ satisfying ρ = φ ∘ σ for some fixed mapping ρ : Ω → S. This question will be reformulated in terms of boundary equations. A system of boundary equations is defined as follows.

Definition 4.10 Let n ≥ 0 and let φ : A⁺ → S be a homomorphism onto a finite semigroup S.
1. A system of boundary equations is specified by a tuple B = ((Γ, ¯), (I, ≤, φ_I), left, B), where Γ is a set of 2n variables, ¯ : Γ → Γ is an involution without fixed points (i.e., applying it twice is the identity and x̄ ≠ x for all x ∈ Γ), the triple (I, ≤, φ_I) is a linear order over S, left : Γ → I is a mapping, and B is a set of boundary equations. Every boundary equation b ∈ B has the form b = (x, i, x̄, j) with x ∈ Γ and i, j ∈ I such that left(x) ≤ i and left(x̄) ≤ j.
2. A solution of B is a model w ⊨ (I, ≤, φ_I), w ∈ A*, such that

w(left(x), i) = w(left(x̄), j) for all (x, i, x̄, j) ∈ B.

(Recall that if a word w ∈ A* is a model for (I, ≤, φ_I), then we view I as a subset of positions of w. Hence it makes sense to write w(p, q) for p, q ∈ I, p ≤ q.)
3. If B is solvable, then the exponent of periodicity exp(B) of B is defined by

exp(B) = min{exp(w) | w is a solution of B}.

Remark 4.11 If n = 0, then there are no variables, hence no boundary equations, and any model w ⊨ (I, ≤, φ_I) is a solution of B. In particular, if n = 0, then the system is solvable by Lem. 4.6.
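Part 2 of Definition 4.10 is a direct check once I is realized by integer positions of w; the model condition is assumed separately. A sketch in an ad hoc encoding, with left a dictionary and each brick a tuple (x, i, x̄, j):

def solves_beq(w, left, B):
    # compare the two factors of w named by each boundary equation
    return all(w[left[x]:i] == w[left[xbar]:j]
               for (x, i, xbar, j) in B)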
and the mapping left : ? ! I. To this purpose we introduce an undirected 17 graph. Let (V; E) be the undirected graph with vertex set V = f1; : : : ; dg and edge set E = f(p; q) 2 V ?V j xp = xqg. The idea is that for v = ö(x1; : : : ; xg) = ö(xg+1; : : : ; xd) and w = vv we can realize I as a subset of positions of w such that both w j= ( (x1); : : : ; (xd)) and the following equations hold: w(i0; ig) = w(ig; id); w(ip?1; ip) = w(iq?1; iq) for all (p; q) 2 E: For the first equation we shall introduce an extra variable x0 (and its dual x0) below; in the other list of equations there is some redundancy. For (p; q); (q; r) 2 E, we have by definition (p; r) 2 E, but the equations w(ip?1; ip) = w(iq?1; iq) and w(iq?1; iq) = w(ir?1; ir) already imply w(ip?1; ip) = w(ir?1; ir). Hence we don't need the edge (p; r) for the equation. To avoid this redundancy we let F ? E be a spanning forest of (V; E). This means F = F?1, F ? = E?, and (V; F ) is an acyclic undirected graph. We have jF j = 2(d ? c), where c is the number of connected components of (V; E). The elements of F are called variables and for each x = (p; q) 2 F we define its dual and two positions left(x); right(x): x = (q; p); left(x) = ip?1; right(x) = ip: Note that x 6= x and x = x for all x 2 F . Taking duals corresponds to edge reversing in (V; F ). Define two new extra variables x0 and x0 with x0 = x0 and define ? = fx0; x0g [ F and: left(x0) = i0; right(x0) = ig = left(x0); right(x0) = id: This defines the set ?, the involution without fixed points ? : ? ! ?, and the mapping left : ? ! I. The last step of the construction is to define the set B of boundary equations. It should be clear what to do. We define B = f(x; right(x); x; right(x)) j x 2 ?g: We have to verify two properties. 1. Let ö : ? ! A+ be a solution such that = ' ffi ö, and let w = vv, where v = ö(x1 ? ? ? xg) = ö(xg+1 ? ? ? xd). The word w has positions 0 = i0 < i1 < ? ? ? < id, where id is the last position and the following equations hold: w(i0; ig) = w(ig; id); w(ip?1; ip) = ö(xp) for 1 <= p <= d: In particular, w j= (I; <=; 'I) and w is a solution of B. 2. Let w j= (I; <=; 'I) be a solution of B. Without restriction we may view I as a subset of positions of w. Consider the factors w(i0; ig) and w(ig; id). The boundary equation (x0; right(x0); x0; right(x0)) 2 B implies w(i0; ig) = w(ig ; id) and it follows that w 2 A?vvA? for v = w(i0; ig). We define ö : ? ! A+ by ö(xp) = w(ip?1; ip). Since ip?1 < ip, this is a non-empty word. The elements (x; right(x); x; right(x)) 2 B for x = (p; q), x = (q; p), (p; q) 2 T imply w(ip?1; ip) = w(iq?1; iq) whenever xp = xq. Hence ö is well-defined. We have 'ö(xp) = 'w(ip?1; ip) = (xp) since w j= (I; <=; 'I ). Finally, v = w(i0; ig) = w(ig; id) implies v = ö(x1 ? ? ? xg) = ö(xg+1 ? ? ? xd). 18 Thus, the word equation with regular constraints given by the mapping has a solution if and only if the system of boundary equations is solvable. The construction of the system B above can be performed in polynomial time; more precisely, the construction yields a logspace-reduction. Due to this reduction, Makanin's result follows from the next theorem. The assertion of Thm. 4.12 is in fact equivalent to Makanin's result, see Lem. 4.14 below. Theorem 4.12 It is decidable whether a system of boundary equations has a solution. The rest of this chapter is devoted to the proof of Thm. 4.12. An important property is stated in the next proposition: We can bound the exponent of periodicity while searching for a solution. 
Thus, the word equation with regular constraints given by the mapping ρ has a solution if and only if the system of boundary equations is solvable. The construction of the system B above can be performed in polynomial time; more precisely, the construction yields a logspace reduction. Due to this reduction, Makanin's result follows from the next theorem. The assertion of Thm. 4.12 is in fact equivalent to Makanin's result, see Lem. 4.14 below.

Theorem 4.12 It is decidable whether a system of boundary equations has a solution.

The rest of this chapter is devoted to the proof of Thm. 4.12. An important property is stated in the next proposition: we can bound the exponent of periodicity while searching for a solution. Note, however, that bounding the exponent of periodicity of some word gives absolutely no bound on the length of this word.

Proposition 4.13 Given as instance a system of boundary equations B, we can compute a number e(B) having the property that if B is solvable, then we have exp(B) ≤ e(B).

The proof of Prop. 4.13 could be based on the same techniques as presented in Sect. 3. However, for our purposes we prefer to prove Prop. 4.13 via a reduction to word equations.

Lemma 4.14 There is an effective reduction of the solvability of a system of boundary equations B to some word equation with regular constraints. Moreover, there is a reduction such that if w ∈ A* is a solution of the word equation, then B is solvable and we have exp(B) ≤ exp(w).

Proof. Let B = ((Γ, ¯), (I, ≤, φ_I), left, B) be a system of boundary equations. We may assume that the linear order (I, ≤, φ_I) is the standard representation of its underlying S-sequence s = (s₁, …, s_m). Introduce new variables y₁, …, y_m with regular constraints ρ(y_p) = s_p, 1 ≤ p ≤ m. For each boundary equation b = (x, i, x̄, j) ∈ B we introduce a word equation

y_{left(x)+1} ⋯ y_i = y_{left(x̄)+1} ⋯ y_j.

This system of word equations with regular constraints is solvable if and only if B is solvable. Indeed, if w ∈ A* is a solution of B, then, by definition, we have (I, ≤, φ_I) ≤_μ w_S, and μ(I) is a subset of positions of w. All word equations w(μ(left(x)), μ(i)) = w(μ(left(x̄)), μ(j)) are satisfied for (x, i, x̄, j) ∈ B. Hence defining σ(y_p) = w(μ(p − 1), μ(p)), 1 ≤ p ≤ m, yields a solution of the system of word equations. For the other direction let σ(y_p) = v_p, 1 ≤ p ≤ m, be some solution of the system of word equations. Due to the regular constraints we have φ(v_p) = s_p and v_p ≠ 1 for all 1 ≤ p ≤ m. Therefore the word v = σ(y₁) ⋯ σ(y_m) solves B.

Next, we transform the system of word equations into a single word equation L = R using Prop. 2.9, and finally we reduce to the word equation Ly₁ ⋯ y_m = Ry₁ ⋯ y_m. The point is that if w is a solution of this equation, then some suffix v of w solves B. Hence

exp(B) ≤ exp(v) ≤ exp(w).

This yields Lem. 4.14. Now, let d be the denotational length of Ly₁ ⋯ y_m = Ry₁ ⋯ y_m and define the number e(B) = e(c(S), d), which has been given in Thm. 3.2. We can choose w such that exp(w) ≤ e(c(S), d). This proves Prop. 4.13. □
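The reduction in the proof of Lem. 4.14 is a one-line transliteration per brick. A sketch, in the same encoding as the construction sketch of Sect. 4.2:

def to_word_equations(left, B):
    # one word equation per boundary equation (x, i, xbar, j):
    # y_{left(x)+1} ... y_i  =  y_{left(xbar)+1} ... y_j
    return [(['y%d' % t for t in range(left[x] + 1, i + 1)],
             ['y%d' % t for t in range(left[xbar] + 1, j + 1)])
            for (x, i, xbar, j) in B]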
4.3 The convex chain condition

Let B = ((Γ, ¯), (I, ≤, φ_I), left, B) be a system of boundary equations. A boundary equation b = (x, i, x̄, j) ∈ B is henceforth also called a brick. The variable x is called the label of the brick b = (x, i, x̄, j). Pictorially, a brick is drawn as the segment from left(x) to i, labeled x, placed over the segment from left(x̄) to j, labeled x̄. The dual brick b̄ of b = (x, i, x̄, j) is obtained by reversing the brick; it has the label x̄:

b̄ = (x̄, j, x, i).

Henceforth we make the assumption that B is closed under duals (i.e., b ∈ B implies b̄ ∈ B) and that there is at least one brick b ∈ B having label x for every x ∈ Γ. Clearly, this is no restriction. For x ∈ Γ let B(x) ⊆ B be the subset of bricks with label x. Then

B(x) = {(x, i₁, x̄, j₁), …, (x, i_r, x̄, j_r)}

for some non-empty subset {i₁, …, i_r} ⊆ I such that left(x) ≤ i₁ ≤ ⋯ ≤ i_r. The right boundary of x is defined by right(x) = i_r.

Before we continue, we make some additional assumptions on B. All of them are necessary conditions for solvability and easily verified. Let (x, i, x̄, j), (y, i, ȳ, j), (y, i′, ȳ, j′) ∈ B. Then we assume from now on:
- left(x) ≤ left(x̄) if and only if i ≤ j,
- φ_I(left(x), i) = φ_I(left(x̄), j),
- left(x) ≤ left(y) if and only if left(x̄) ≤ left(ȳ),
- i ≤ i′ if and only if j ≤ j′.

These assumptions imply that if B(x) = {(x, i₁, x̄, j₁), …, (x, i_r, x̄, j_r)} is given such that left(x) ≤ i₁ ≤ ⋯ ≤ i_r, then we also have left(x̄) ≤ j₁ ≤ ⋯ ≤ j_r. In particular, B(x) contains the brick (x, right(x), x̄, right(x̄)).

[Figure: the bricks of B(x), with upper right boundaries i₁, i₂, …, right(x) and lower right boundaries j₁, j₂, …, right(x̄).]

In our pictures a brick (x, i, x̄, j) can be placed upon (y, j′, ȳ, k) if and only if j = j′. We obtain one out of three different shapes:

[Figure: the three possible shapes of a brick placed upon another one.]

Which one of these cases occurs is determined by the function left : Γ → I. The leftmost picture corresponds to left(x̄) < left(y), the picture in the middle corresponds to left(x̄) = left(y), and the picture on the right means left(x̄) > left(y).

Definition 4.15 Let m ≥ 1. A chain C of length m is a sequence of bricks

C = ((x₁, i₁, x̄₁, i₂), (x₂, i₂, x̄₂, i₃), …, (x_m, i_m, x̄_m, i_{m+1})),

where (x_p, i_p, x̄_p, i_{p+1}) ∈ B for all 1 ≤ p ≤ m. A chain C is called convex if for some index q with 1 ≤ q ≤ m we have:

left(x̄_p) ≥ left(x_{p+1}) for 1 ≤ p < q,
left(x̄_p) ≤ left(x_{p+1}) for q ≤ p < m.

A convex chain C is called clean if the bricks of C are pairwise distinct. A brick (x, i, x̄, j) is linked via a convex chain of length m to the brick (x′, i′, x̄′, j′) if there is a convex chain C of length m as above such that m ≥ 1, (x, i, x̄, j) = (x₁, i₁, x̄₁, i₂), and (x′, i′, x̄′, j′) = (x_m, i_m, x̄_m, i_{m+1}).
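Convexity of a chain is easily tested by trying all turning indices q. A sketch, in the brick encoding used before (left is a dictionary; b[0] and b[2] are the labels x_p and x̄_p):

def is_convex(chain, left):
    a = [left[chain[p][2]] for p in range(len(chain) - 1)]      # left(xbar_p)
    b = [left[chain[p + 1][0]] for p in range(len(chain) - 1)]  # left(x_{p+1})
    return any(all(s >= t for s, t in zip(a[:q], b[:q])) and
               all(s <= t for s, t in zip(a[q:], b[q:]))
               for q in range(len(a) + 1))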
Remark 4.16 If C = (b₁, …, b_m) is a convex chain, then its dual C̄ = (b̄_m, …, b̄₁) and all subchains (b_p, …, b_q), 1 ≤ p ≤ q ≤ m, are convex chains. If b_p = (x_p, i_p, x̄_p, i_p) for some 1 < p < m, then (b₁, …, b_{p−1}, b_{p+1}, …, b_m) is a convex chain. If b_p = b_q for some 1 < p < q ≤ m, then (b₁, …, b_{p−1}, b_q, …, b_m) is also a convex chain. In particular, if two bricks are linked via a convex chain, then they are linked via some clean convex chain: the shortest chain linking two bricks to each other is certainly clean.

A typical picture of a convex chain is the following:

[Figure: the bricks (x₁, i₁, x̄₁, i₂), …, (x_m, i_m, x̄_m, i_{m+1}) of a convex chain stacked on top of each other, their left boundaries first moving to the left up to brick q and then moving to the right.]

Definition 4.17 Let F ⊆ I be a subset. A brick (x, i, x̄, j) ∈ B is called a basis or foundation with respect to F if j ∈ F. We say that B satisfies the convex chain condition (with respect to F) if every brick b ∈ B can be linked via some convex chain to some basis. The set F is also called the set of final indices.

Note that if, in the figure above, we have {i₂, …, i_{m+1}} ∩ F ≠ ∅, then the brick (x₁, i₁, x̄₁, i₂) is linked via a convex chain to a basis.

Lemma 4.18 Let n, m, f ∈ ℕ and let B = ((Γ, ¯), (I, ≤, φ_I), left, B) be a system of boundary equations with |Γ| = 2n. Let F ⊆ I have size f. Suppose that every brick b ∈ B can be linked via a convex chain of length at most m to a basis with respect to F. Then we can bound the size of B by

|B| ≤ (2n)^m · 2f.

Proof. Consider a convex chain of length k, k ≤ m, where the last brick is a basis:

C = ((x₁, i₁, x̄₁, i₂), (x₂, i₂, x̄₂, i₃), …, (x_k, i_k, x̄_k, i_{k+1})).

Given any pair x ∈ Γ, j ∈ I, there exists at most one brick (x, i, x̄, j) ∈ B. Therefore the whole chain is uniquely determined by the sequence (x₁, x₂, …, x_k, i_{k+1}) ∈ Γ^k × F. The number of these chains is bounded by |Γ|^k · |F|. Finally, observe that every brick b ∈ B can be linked via a convex chain either of length m − 1 or of length m to some basis. Indeed, if (b, b₂, …, b_{k−1}, b_k) is a convex chain of length k, then (b, b₂, …, b_{k−1}, b_k, b̄_k, b_k) is a convex chain of length k + 2. Therefore we obtain

|B| ≤ |Γ|^{m−1} · |F| + |Γ|^m · |F| ≤ (2n)^m · 2f. □

Remark 4.19 Every system of boundary equations B trivially satisfies the convex chain condition with respect to the set I. Furthermore, if we construct B by starting from a word equation x₁ ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, then we have |I| ≤ d. The transformation rules below will neither increase the number 2n of variables nor the sum 2n + f. They will increase the sizes of I and of B. However, Lem. 4.18 says that a large number of boundary equations (i.e., a large set of bricks) forces long convex chains in order to satisfy the convex chain condition (pictorially: many bricks build skyscrapers). The next step is to show that long convex chains lead to high domino towers (pictorially: skyscrapers hide high towers) and hence to a lower bound on the exponent of periodicity of any solution.

Proposition 4.20 Let n, m ∈ ℕ and let B = ((Γ, ¯), (I, ≤, φ_I), left, B) be a solvable system of boundary equations with |Γ| = 2n. Let w ⊨ (I, ≤, φ_I) be a solution of B. Suppose that B contains a clean convex chain of length at least m. Then we have the following lower bound for the exponent of periodicity of the solution w:

m ≤ 4n² · (exp(w) + 1) − 1.
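For experiments with this bound, the quantity exp(w) of Definition 2.4 can be computed naively (cubic time, fine for small words):

def exponent_of_periodicity(w):
    # brute force over all factors p = w[i:i+k] and count how often
    # p repeats consecutively starting at i
    best, n = 0, len(w)
    for i in range(n):
        for k in range(1, n - i + 1):
            a = 1
            while w.startswith(w[i:i + k], i + a * k):
                a += 1
            best = max(best, a)
    return best

print(exponent_of_periodicity('abbabaabbaababba'))   # 2 (cf. Ex. 3.5)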
Hence there is a clean convex chain C = (b_1, …, b_ℓ) which satisfies the following properties:

ℓ ≥ h,
left(x_p) ≥ left(x_{p+1}) for 1 ≤ p < ℓ,
|x_p|_w ≥ |x_ℓ|_w for 1 ≤ p ≤ ℓ,
the label x_1 occurs exactly h times.

This is the point where we switch from the chain to the sequence of words (w(x_1), …, w(x_ℓ)). We obtain a tower of words where w(x_ℓ) has minimal length and the word w(x_1) occurs at least h times.

[Figure: the tower of words w(x_6), w(x_5), …, w(x_1), drawn one above the other.]

Define v_p ∈ A* to be the prefix of w(x_p) of length |w(x_ℓ)|, and let u_p = w(left(x_p), i_p) for 1 ≤ p ≤ ℓ. Since |u_p| ≤ |w(left(x_ℓ), i_ℓ)| ≤ |v_ℓ| = |v_p|, the word u_p is a prefix of v_p for all 1 ≤ p ≤ ℓ. The sequence (v_1, …, v_ℓ) can be arranged in a tower of words which is already in better shape: all words v_p have equal length.

[Figure: the tower of the equal-length words v_6, v_5, …, v_1, with a vertical line cutting through the tower.]

The vertical line corresponds to the factorization v_p = u_p u′_p for 1 ≤ p ≤ ℓ. Finally, let {q_1, q_2, …, q_h} be a set of h indices where the bricks have label x_1. Since the convex chain leading to this tower is clean, we see that u_{q_j} ≠ u_{q_k} for all 1 ≤ j, k ≤ h with j ≠ k. (This is the only point where it is used that the chain is clean!) We obtain:

0 ≤ |u_{q_1}| < |u_{q_2}| < ⋯ < |u_{q_h}|.

Moreover, we have v_1 = v_{q_1} = v_{q_2} = ⋯ = v_{q_h}. We omit all other words in the tower above and we see that the word v_1 can be arranged in a domino tower of height h, with h ≥ 2. Applying Lem. 2.5 we obtain h − 1 ≤ exp(v_1) ≤ exp(w). Since ⌈(m+1)/(4n²)⌉ ≤ h, the assertion of the proposition follows. □

Corollary 4.21 Let 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B) denote a solvable system of boundary equations which satisfies the convex chain condition with respect to some subset F ⊆ I. Then we have

|B| ≤ |Ω|^{|Ω|²·(exp(𝔅)+1)−1} · 2|F|.

If moreover |Ω|, |F| ∈ O(d) and exp(𝔅) ∈ 2^{O(d + log c(S))}, then we have |B| ∈ 2^{2^{O(d + log c(S))}}.

Proof. Let 2n = |Ω|, f = |F|, and let m be the maximal length of a clean convex chain in B. By Lem. 4.18 and Rem. 4.16 we have |B| ≤ (2n)^m · 2f. Choose a solution w such that exp(w) ≤ exp(𝔅). Prop. 4.20 bounds the length of clean convex chains in terms of the exponent of periodicity of any solution; hence m ≤ 4n²·(exp(w)+1) − 1. Putting things together we obtain

|B| ≤ (2n)^{4n²·(exp(w)+1)−1} · 2f ≤ (2n)^{4n²·(exp(𝔅)+1)−1} · 2f.

The result follows. □

4.4 Transformation rules

We are ready to define the (non-deterministic) transformation rules of Makanin's algorithm. If we apply a rule to a system 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B), then the new system is denoted by 𝔅′ = ((Ω′, ¯), (I′, ≤, φ_{I′}), left′, B′). The transformation rules below have the property that if 𝔅 satisfies the convex chain condition with respect to some subset F ⊆ I, then 𝔅′ satisfies the convex chain condition with respect to some subset F′ ⊆ I′ such that |Ω′| + |F′| ≤ |Ω| + |F|. Thus, if we start with a system 𝔅_0 where |Ω_0| = 2n_0 and |I_0| ≤ d, then throughout the whole procedure the size of the set of final indices is at most 2n_0 + d.

We say that a (non-deterministic) rule is downward correct if, whenever w ∈ A* is a solution of 𝔅, then (for at least one non-deterministic choice) some suffix w′ of w is a solution of 𝔅′, and moreover either |Ω′| < |Ω| or |w′| < |w|. Thus, applied to solvable systems, at least one sequence of choices of downward correct rules leads to termination. We say that a (non-deterministic) rule is upward correct if, whenever w′ ∈ A* is a solution of 𝔅′ (and 𝔅′ is the result of any non-deterministic choice), there is a word w ∈ A* which is a solution of 𝔅.
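Abstractly, the rules are plugged into a search: downward correctness guarantees that, for a solvable system, some branch of the search terminates, and upward correctness guarantees that reaching a terminal node certifies solvability. A minimal driver sketch, assuming hypothetical callbacks is_terminal, applicable_rules, apply_rule, and encode, none of which are fixed by the text:

    def solvable(system, is_terminal, applicable_rules, apply_rule, encode):
        """Explore all non-deterministic rule applications (depth-first).
        Returns True iff some sequence of choices reaches a terminal node."""
        seen = set()

        def explore(s):
            if is_terminal(s):          # no variables left (cf. Lem. 4.6)
                return True
            key = encode(s)             # canonical description of the node
            if key in seen:
                return False            # this node was already explored
            seen.add(key)
            return any(explore(t)
                       for rule in applicable_rules(s)
                       for t in apply_rule(rule, s))   # finitely many choices

        return explore(system)

The finiteness of the search space, and hence the termination of such a driver, is exactly what Sect. 5.1 establishes via the search graph G.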
Rule 1 If there is some x ∈ Ω with left(x) = right(x), then cancel both bricks (x, right(x), x̄, right(x̄)) and (x̄, right(x̄), x, right(x)) from B. Cancel x and x̄ from Ω.

Remark 4.22 Obviously Rule 1 is upward and downward correct, since we have w(i, i) = 1 for all words w and all positions i of w; hence the set of solutions is the same. In order to preserve the convex chain condition we introduce two new final indices. Let x ∈ Ω with left(x) = right(x) and assume that x and x̄ are canceled by Rule 1. Define F′ = F ∪ {left(x), left(x̄)}. Consider a convex chain C = (b_1, …, b_m) where for some 1 < p ≤ m the brick b_p has the form b_p = (x, right(x), x̄, right(x̄)); hence the brick b_p is canceled. However, the brick b_1 is linked to b_{p−1} via a convex chain, and b_{p−1} is now a basis since right(x) = left(x) ∈ F′. Thus, if 𝔅 satisfies the convex chain condition with respect to F, then the system 𝔅′ (after an application of Rule 1) satisfies the convex chain condition with respect to F′. We have |Ω′| + |F′| ≤ |Ω| + |F|.

Rule 2 If there exists some x ∈ Ω with left(x) = left(x̄), then cancel all bricks (x, j, x̄, j) and (x̄, j, x, j) from B. Cancel x and x̄ from Ω.

Remark 4.23 Recall that for (x, i, x̄, j) ∈ B we have left(x) = left(x̄) if and only if i = j. Thus, if left(x) = left(x̄), then all bricks with label x have the form (x, j, x̄, j). Again, Rule 2 is obviously upward and downward correct. For the convex chain condition consider a convex chain C = (b_1, …, b_m) where b_p = (x, j, x̄, j) for some 1 < p ≤ m. If p < m, then C′ = (b_1, …, b_{p−1}, b_{p+1}, …, b_m) is a shorter convex chain linking b_1 with a basis. For p = m we have j ∈ F; hence b_{m−1} is also a basis.

Rule 3 Let ℓ = min(I). If ℓ ∉ left(Ω), then cancel the index ℓ from I. This means we replace the linear order over S by the induced sub-order (I′, ≤, φ_{I′}) where I′ = I \ {ℓ}.

Remark 4.24 Clearly, the convex chain condition is not affected by this rule, and downward correctness is obvious, too. To see the upward correctness, let (I, ≤, φ_I) be given by the S-sequence (s_1, …, s_m) and let w′ ∈ A* be a solution of the new system after an application of Rule 3 such that min(I′) is the first position of w′. By the definition of an S-sequence there is a non-empty word u ∈ A⁺ with φ(u) = s_1. Then the first position of w′ is no longer the first position in the word uw′, and uw′ is a solution of 𝔅. For later use notice that we can choose u such that |u| ≤ |S|.

The next rule is very complex; it is the heart of the algorithm. Before we apply it to some system 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B), we apply Rules 1, 2, or 3 as often as possible. In particular, we shall assume that left(x) < right(x) and left(x) ≠ left(x̄) for all x ∈ Ω, and that there exists some x ∈ Ω with left(x) = min(I).

Rule 4 We divide Rule 4 into six steps. We need some notation. Define ℓ = min(I) and

r = max{right(x) | x ∈ Ω, left(x) = ℓ}.

Note that ℓ ∈ left(Ω); hence r ∈ I exists and we have ℓ < r. Choose (and fix) some x_o ∈ Ω with left(x_o) = ℓ and right(x_o) = r. Define ℓ* = left(x̄_o) and r* = right(x̄_o). Define the critical boundary c ∈ I by c = min{c′, r} where

c′ = min{left(x) | x ∈ Ω, r < right(x)}.

Note that since r < r* = right(x̄_o), the minimum c′ and hence the critical boundary c exist. We have ℓ < c ≤ r < r* and c ≤ ℓ* < r*, but the ordering of r and ℓ* depends on the system. Define the subset T ⊆ I of transport positions by

T = {i ∈ I | i ≤ c} ∪ {i ∈ I | there is some (x, i, x̄, j) ∈ B with left(x) < c}.

Note that min(T) = ℓ and that i ∈ T for all (x_o, i, x̄_o, j) ∈ B. Moreover, since left(x) < c implies right(x) ≤ r, we have max(T) = r.
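The quantities ℓ, r, c, and T are directly computable from the system. A small sketch under the dictionary encoding used above (left and right as dictionaries over all variables including the duals, bricks as tuples); it assumes the preconditions of Rule 4, in particular that some variable extends beyond r, so that the minimum defining c′ exists:

    def rule4_setup(I, left, right, bricks):
        """Compute l = min(I), r, the critical boundary c, and the
        transport positions T for Rule 4 (a sketch, not the rule itself)."""
        l = min(I)
        r = max(right[x] for x in left if left[x] == l)
        # c = min(c', r) with c' = min{ left(x) : r < right(x) }
        c_prime = min(left[x] for x in left if r < right[x])
        c = min(c_prime, r)
        # transport positions: everything up to c, plus the first index
        # of every brick whose label starts left of c
        T = {i for i in I if i <= c} | {i for (x, i, xb, j) in bricks if left[x] < c}
        return l, r, c, T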
Step 1 Choose some admissible extension (I*, ≤, φ_{I*}) of (I, ≤, φ_I) by T at [ℓ*, r*]. By convention we identify I with a subset of I*, whence I ⊆ I*, and there is a subset T* ⊆ I* with min(T*) = ℓ* and max(T*) = r* which is in order-respecting bijection with T. For each i ∈ T the corresponding position in T* is denoted by i*. Having these notations, we put a further restriction on the admissible extension: we consider only those admissible extensions where first, i < i* for all i ∈ T, and second,

left(x)* ≤ left(x̄) if and only if i* ≤ j,
left(x)* ≥ left(x̄) if and only if i* ≥ j,

for all (x, i, x̄, j) ∈ B with left(x) < c. Note that for all (x_o, i, x̄_o, j) ∈ B this implies i* = j. If such an admissible extension is not possible, then Step 1 cannot be completed and Rule 4 is not applicable.

Step 2 Introduce new variables x_* and x̄_* and define left(x_*) = c, left(x̄_*) = c*. For all i ∈ T such that there is some (x, i, x̄, j) ∈ B with left(x) < c ≤ i, introduce new bricks (x_*, i, x̄_*, i*) and (x̄_*, i*, x_*, i).

Step 3 As long as there is a variable x ∈ Ω with left(x) < c, replace left(x) by left′(x) = left(x)* and replace all bricks (x, i, x̄, j), (x̄, j, x, i) ∈ B by (x, i*, x̄, j) and (x̄, j, x, i*).

Remark 4.25 To have some notation, let x denote a variable before Step 3 and let x′ be the corresponding variable after Step 3. Likewise let b = (x, i, x̄, j) denote a brick before Step 3 and let b′ = (x′, i′, x̄′, j′) be the corresponding brick after Step 3. If left(x) = left′(x′), then sometimes we may still write x = x′. In particular, x_* = x′_*, x̄_* = x̄′_*, and x̄_o = x̄′_o, but x_o ≠ x′_o. For b = (x, i, x̄, j) and b′ = (x′, i′, x̄′, j′) there are four cases:

b′ = (x′, i*, x̄′, j*) if left(x) < c and left(x̄) < c,
b′ = (x′, i*, x̄, j) if left(x) < c ≤ left(x̄),
b′ = (x, i, x̄′, j*) if c ≤ left(x) and left(x̄) < c,
b′ = (x, i, x̄, j) if c ≤ left(x) and c ≤ left(x̄).

Note that after Step 3 all bricks (x_o, i, x̄_o, j) ∈ B have the form (x′_o, i*, x̄_o, i*).

Step 4 Define the new set of final indices by

F′ = {i* ∈ I* | i < c and i ∈ F} ∪ {i ∈ F | c ≤ i}.

Step 5 Cancel all bricks with label x′_o or x̄_o, i.e., cancel all bricks of the form (x′_o, i*, x̄_o, i*) or (x̄_o, i*, x′_o, i*). Then cancel the variables x′_o and x̄_o.

Step 6 Replace I* by I′ = {i ∈ I* | c ≤ i} and consider the linear order (I′, ≤, φ_{I′}) induced by I′ ⊆ I*.

After Step 6 the transformation rule is finished. The new system is denoted by 𝔅′ = ((Ω′, ¯), (I′, ≤, φ_{I′}), left′, B′). We will show in Lem. 4.30 to 4.33 below that 𝔅′ satisfies the convex chain condition with respect to F′.
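The case analysis of Remark 4.25 is mechanical and can be written out directly. A sketch reusing the tuple-and-dictionary encoding of the earlier sketches; star is the order-respecting bijection i ↦ i* of Step 1, assumed to be given as a dictionary on T:

    def step3_transport(bricks, left, c, star):
        """Step 3 of Rule 4: move every variable x with left(x) < c to
        left(x)*, and replace brick components accordingly (Rem. 4.25).
        The renaming x -> x' is left implicit; only indices change."""
        new_left = {x: (star[p] if p < c else p) for x, p in left.items()}
        new_bricks = []
        for (x, i, xb, j) in bricks:
            ii = star[i] if left[x] < c else i    # moves iff left(x) < c
            jj = star[j] if left[xb] < c else j   # moves iff left(xbar) < c
            new_bricks.append((x, ii, xb, jj))
        return new_bricks, new_left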
The first lemma is a trivial observation.

Lemma 4.26 We have |Ω′| = |Ω| and |F′| ≤ |F|.

Proof. In Step 2 the new variables x_* and x̄_* are introduced, but in Step 5 the variables x′_o and x̄_o are canceled; hence |Ω′| = |Ω|. The set of final indices is changed in Step 4, and the assertion |F′| ≤ |F| is clear from the definition of F′. □

The following lemma is crucial in order to bound the size of I during the transformation procedure. It has a rather subtle proof.

Lemma 4.27 Let β′ = |{(x′, i′, x̄′, j′) ∈ B′ | left′(x′) < i′}| and β = |{(x, i, x̄, j) ∈ B | left(x) < i}|. Then we have

2|I′| − β′ ≤ 2|I| − β.

Proof. The inequality can be destroyed either by a new position i* ∈ T* \ I or by the cancelation of bricks (x′_o, i*, x̄_o, i*), (x̄_o, i*, x′_o, i*) in Step 5, where ℓ* < i*. (Recall the definition of β and β′ and that left(x_o) = ℓ and left′(x′_o) = ℓ*.) The cancelation of these bricks involves again a position of type i* ∈ T*. Fortunately, if (x′_o, i*, x̄_o, i*) is canceled, where ℓ* < i*, then i* = j for some j ∈ I \ {ℓ}. In particular, i* is not a new position, and the two cases do not occur simultaneously. Therefore it is enough to find, for each i* ∈ T* \ {ℓ*}, either two new bricks which are introduced in Step 2 or one position which is canceled in Step 6; then the total balance is negative or zero. Let us consider the positions of type i* ∈ T* \ {ℓ*} one by one. If c* < i*, then by the definition of T and Step 2 there are two new bricks (x_*, i, x̄_*, i*), (x̄_*, i*, x_*, i) ∈ B′, and we have left(x_*) < i and left(x̄_*) < i*. Next consider i* = c*: at least one position (namely ℓ) is canceled in Step 6. Finally, let ℓ* < i* < c*, i.e., ℓ < i < c: the position i is canceled in Step 6. Hence we have the assertion of the lemma. □

Lemma 4.28 Rule 4 is downward correct.

Proof. Let w ∈ A* be a solution of 𝔅. Since w ⊨ (I, ≤, φ_I), we can view I as a subset of positions of w with ℓ = 0. Let w = vw′ where v = w(ℓ, c). The word v is a non-empty prefix of w(ℓ, r). The word w(ℓ, r) is a prefix of w and at the same time a factor of w′: we have w(ℓ, r) = w(ℓ*, r*) with ℓ < ℓ* due to the brick (x_o, r, x̄_o, r*) ∈ B. The set T is a subset of positions of w(ℓ, r); hence we find a corresponding subset T* of positions of w(ℓ*, r*). The union I ∪ T* leads to an admissible extension (I*, ≤, φ_{I*}) such that first, i < i* for all i ∈ T, and second, w(j, k) = w(j*, k*) for all j, k ∈ T with j ≤ k. A careful but easy inspection of Rule 4 then shows that w′ ⊨ (I′, ≤, φ_{I′}) and that w′ is a solution of 𝔅′. □

Lemma 4.29 Rule 4 is upward correct.

Proof. Let w′ ∈ A* be a solution of 𝔅′. Since w′ ⊨ (I′, ≤, φ_{I′}), we can view I′ as a subset of positions of w′ where c is the first position of w′. Define v = w′(ℓ*, c*) and let w = vw′. Then we have w ⊨ (I*, ≤, φ_{I*}) such that v = w(ℓ, c) = w(ℓ*, c*). With the help of the bricks (x_*, i, x̄_*, i*) we conclude that w(j, k) = w(j*, k*) for all j, k ∈ T with j ≤ k. Therefore we have w(left(x), i) = w(left(x̄), j) for all (x, i, x̄, j) ∈ B. Since I ⊆ I*, we have w ⊨ (I, ≤, φ_I), and w is a solution of 𝔅. □

Finally we show that Rule 4 preserves the convex chain condition. This is clear for Step 1; for the other steps we state lemmata.

Lemma 4.30 Step 2 preserves the convex chain condition with respect to the set F.

Proof. The new bricks in Step 2 have the form (x_*, i, x̄_*, i*) and (x̄_*, i*, x_*, i) for some (x, i, x̄, j) ∈ B with left(x) < c = left(x_*) ≤ i. Since (x, i, x̄, j) ∈ B can be linked via a convex chain to some basis, it is enough to consider the following figure:

[Figure: the new bricks (x_*, i | x̄_*, i*) and (x̄_*, i* | x_*, i) placed upon the old brick (x, i | x̄, j).] □

Lemma 4.31 Let C = (b_1, …, b_m) be a convex chain before Step 3 linking b_1 with b_m. Then after Step 3 there is a convex chain C′ linking b′_1 with b′_m.

Proof. Let us have a local look at the convex chain:

C = (…, (x, i, x̄, j), (y, j, ȳ, k), …).

By symmetry we may assume that left(x) ≥ left(y). Pictorially, this local part is the brick (x, i | x̄, j) placed upon (y, j | ȳ, k). This is the situation before Step 3. After Step 3 let us denote the corresponding bricks by (x′, i′, x̄′, j′) and (y′, j″, ȳ′, k′). The question is whether or not j′ = j″. If j′ = j* or j″ = j, then we have j′ = j″ and the chain is not broken. Hence we have to consider only the case j′ = j and j″ = j*. This case is equivalent to

left(y) < c ≤ left(x̄) ≤ j.

With the help of the brick (x_*, j, x̄_*, j*), which was introduced in Step 2, we can repair the broken chain: we have left(x_*) = c ≤ left(x̄) and left′(y′) < c* = left(x̄_*), and we obtain the sequence of bricks (x′, i′ | x̄, j), (x_*, j | x̄_*, j*), (y′, j* | ȳ′, k′). Doing this transformation wherever necessary, we construct the convex chain C′. □
Note that the chain C′ constructed in the lemma above may contain many bricks of the form (x′_o, i*, x̄_o, i*) and (x̄_o, i*, x′_o, i*). These bricks are canceled only later, in Step 5; in fact their presence is very useful again in the next lemma.

Lemma 4.32 After Step 4 the convex chain condition is satisfied with respect to the set F′.

Proof. Let b′ be a brick after Step 3 and b the corresponding brick before Step 3. The brick b is linked before Step 3 via a convex chain to some basis (x, i, x̄, j) with j ∈ F. Lem. 4.31 states that after Step 3 the brick b′ is linked via a convex chain to the corresponding brick (x′, i′, x̄′, j′). For j < c we have left(x̄) < c and j′ = j* ∈ F′; hence (x′, i′, x̄′, j*) is again a basis. For j′ = j we have c ≤ j and therefore j ∈ F′, which settles the case j′ = j. The remaining case is c ≤ j and j′ = j*. This means left(x̄) < c ≤ j. By Step 2 there is a brick (x̄_*, j*, x_*, j), and we have left′(x̄′) < c* = left(x̄_*). We may put the brick (x′, i′, x̄′, j*) upon the basis (x̄_*, j*, x_*, j); since j ∈ F ∩ F′, it is in fact a basis before and after Step 4. □

Lemma 4.33 Steps 5 and 6 preserve the convex chain condition with respect to the set F′.

Proof. Step 5 is a special case of an application of Rule 2; likewise, Step 6 is a special case of applications of Rule 3. In particular, the convex chain condition is preserved. □

The lemmata above yield the following proposition.

Proposition 4.34 Rule 4 is upward and downward correct. It preserves the convex chain condition.

Example 4.35 Let x_1 ⋯ x_g = x_{g+1} ⋯ x_d be a word equation, 1 ≤ g < d, where the regular constraints are given by a mapping ρ: Ω → S. Let 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B) be the result of the (logspace) reduction presented in Sect. 4.2. Recall that (I, ≤, φ_I) represents the S-sequence

(ρ(x_1), …, ρ(x_g), ρ(x_{g+1}), …, ρ(x_d)).

We may assume that (I, ≤, φ_I) is in its standard representation, I = {0, …, d}. According to the reduction, the set Ω contains two variables x_0 and x̄_0 such that left(x_0) = 0, right(x_0) = g = left(x̄_0), and right(x̄_0) = d. The set B contains at most d boundary equations (or bricks), among them the brick (x_0, g, x̄_0, d). We have |I| = d + 1 and |Ω| = |B| ≤ 2d. If the word equation has a non-singular solution satisfying the regular constraints, then exp(𝔅) ≤ 2·e(c(S), d).

Rules 1 to 3 are not applicable to 𝔅, but we can try Rule 4. Doing this we find:

x_o = x_0, ℓ = 0, c = g = r = ℓ*, and c* = g* = r* = d.

The set T of transport positions is T = {0, …, g}. In Step 1 we have to choose some admissible extension of (I, ≤, φ_I) by T at [g, d]. In general it is not clear that such an extension exists. Under the hypothesis that x_1 ⋯ x_g = x_{g+1} ⋯ x_d has a non-singular solution σ: Ω → A⁺ with φ ∘ σ = ρ we can continue. Let v = σ(x_1 ⋯ x_g) and assume that v has minimal length among all solutions satisfying the regular constraints given by ρ. With the help of this word, Step 1 can be completed: define w = vv; then we have

w ⊨ (ρ(x_1), …, ρ(x_d)).

The set of positions of w is {0, …, m, m+1, …, 2m} where m = |v|. The fact that w is a model of (I, ≤, φ_I) is realized by an order-respecting injective mapping π: {0, …, d} → {0, …, 2m}. Define T* = {m + π(i) | 0 ≤ i ≤ g} and I* = π(I) ∪ T*. Since I* is a subset of positions of w, this induces a linear suborder over S, which is denoted by (I*, ≤, φ_{I*}). We have |I*| ≤ d + g − 1. After renaming we may assume I* = {0, …, d} ∪ T* and T* = {0*, …, g*} such that 0* = c = g and c* = g* = d. This completes Step 1 of Rule 4. Since in reality we usually do not know v, the choice of I* is a non-deterministic guess!

The next steps of Rule 4 are deterministic. In Step 2 we introduce new variables x_* and x̄_* with left(x_*) = g = right(x_*) and left(x̄_*) = d = right(x̄_*). In Step 3 we transport the structure of the interval [0, g] to [0*, g*] = [g, d]. If we still view I* as a subset of positions of w, then this reflects a transport of the positions from the first to the second factor v in the word w = vv. The definition of F′ according to Step 4 is F′ = {i ∈ I* | g ≤ i}. In Step 5 we cancel the bricks (x′_0, d, x̄_0, d), (x̄_0, d, x′_0, d) and the variables x′_0, x̄_0. In Step 6 we replace I* by I′ = F′. Rule 4 is finished; the cardinality of I′ is bounded by d.

Let 𝔅′ denote the new system; then the word v is a solution, v ⊨ (I′, ≤, φ_{I′}). Since in the present situation left(x_*) = right(x_*) = g, Rule 1 is now applicable to 𝔅′; it cancels the superfluous bricks (x_*, g, x̄_*, d), (x̄_*, d, x_*, g) and the variables x_* and x̄_*. The new system after this application of Rule 1 is denoted by 𝔅″ = ((Ω″, ¯), (I″, ≤, φ_{I″}), left″, B″). We have |I″| ≤ d and |Ω″| = |B″| ≤ 2(d − 1). It is now the word v which is a solution of 𝔅″, hence exp(𝔅″) ≤ exp(v). Therefore we can choose e(𝔅″) = e(c(S), d).
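The fragment of the initial system that the example spells out can be fed to the rule4_setup sketch from Sect. 4.4 above to reproduce the values ℓ = 0 and c = g = r computed in the example. Only the data explicitly given in the example is encoded here; the remaining bricks of the reduction are omitted, and "x0~" stands for the dual variable x̄_0:

    # say g = 3 and d = 7, for the word equation x1 x2 x3 = x4 x5 x6 x7
    g, d = 3, 7
    I = set(range(d + 1))               # standard representation {0, ..., d}
    left  = {"x0": 0, "x0~": g}
    right = {"x0": g, "x0~": d}
    bricks = [("x0", g, "x0~", d)]      # the brick displayed above

    print(rule4_setup(I, left, right, bricks))
    # -> (0, 3, 3, {0, 1, 2, 3}), i.e. l = 0 and c = g = r, as in the example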
5 Proof of Theorem 4.12

5.1 Decidability

The proof of Thm. 4.12 is a reduction to a reachability problem in a finite directed graph. The implications for space and time bounds of Makanin's algorithm are given later. The instance is a system of boundary equations

𝔅_0 = ((Ω_0, ¯), (I_0, ≤, φ_{I_0}), left_0, B_0).

We may assume that 𝔅_0 satisfies the assumptions made at the beginning of Sect. 4.3, because otherwise 𝔅_0 is not solvable. For trivial reasons the system 𝔅_0 satisfies the convex chain condition with respect to the set F_0 = I_0. Let 2n_0 = |Ω_0| and f_0 = |F_0| = |I_0|. According to Prop. 4.13 choose a number e(𝔅_0) such that either 𝔅_0 is not solvable or exp(w) ≤ e(𝔅_0) for some solution w of 𝔅_0. Define an integer β_max by

β_max = (2n_0)^{4n_0²·(e(𝔅_0)+1)−1} · 2(2n_0 + f_0).

Note that this value is defined just so as to fit Cor. 4.21 for a set of final indices of size at most 2n_0 + f_0.

Now define a directed graph G (the search graph of Makanin's algorithm) as follows. The nodes of G are the systems of boundary equations 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B) where:

|Ω| ≤ 2n_0,  |I| ≤ ((n_0 + 2)/2)·β_max,  |B| ≤ β_max.

For systems 𝔅, 𝔅′ ∈ G we define an arc from 𝔅 to 𝔅′ whenever first, a transformation rule is applicable to 𝔅 and second, 𝔅′ is the result of the corresponding transformation. A system 𝔅 ∈ G with an empty set of variables is called a terminal node. Clearly, 𝔅_0 ∈ G, and the search graph G has only finitely many nodes.
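The bound β_max is an explicit integer; for toy parameters it can be evaluated exactly, while for realistic inputs it is astronomically large, which is precisely the point of the double exponential bounds below. A small sketch of the node conditions (our own illustration, exact integer arithmetic):

    def beta_max(n0, f0, e0):
        """beta_max = (2 n0)^(4 n0^2 (e0 + 1) - 1) * 2 (2 n0 + f0)."""
        return (2 * n0) ** (4 * n0 ** 2 * (e0 + 1) - 1) * 2 * (2 * n0 + f0)

    def is_node_of_G(n_vars, n_positions, n_bricks, n0, f0, e0):
        """The three size conditions defining the search graph G."""
        bm = beta_max(n0, f0, e0)
        return (n_vars <= 2 * n0
                and 2 * n_positions <= (n0 + 2) * bm
                and n_bricks <= bm)

    print(beta_max(1, 2, 1))   # 1024 already for the smallest parameters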
Hence it is enough to show the following claim: the system 𝔅_0 has a solution if and only if there is a directed path in G from 𝔅_0 to some terminal node. The "if" direction of the claim is trivial, since all transformation rules are upward correct and since all terminal nodes are solvable by Lem. 4.6. For the "only if" direction, let 𝔅_0 be solvable and let w_0 ⊨ (I_0, ≤, φ_{I_0}) be a solution satisfying exp(w_0) ≤ exp(𝔅_0). Let M ≥ 0 and assume that there is an inductively defined sequence of solvable systems (𝔅_0, 𝔅_1, …, 𝔅_M) such that the following properties are satisfied for all 1 ≤ k ≤ M:

- 𝔅_k = ((Ω_k, ¯), (I_k, ≤, φ_{I_k}), left_k, B_k) is the result of some transformation rule applied to 𝔅_{k−1},
- 𝔅_k has a solution w_k ⊨ (I_k, ≤, φ_{I_k}) such that w_k is a suffix of w_{k−1},
- either |Ω_k| < |Ω_{k−1}| or |w_k| < |w_{k−1}|,
- 𝔅_k satisfies the convex chain condition with respect to some subset F_k ⊆ I_k with |F_k| ≤ 2n_0 − |Ω_k| + f_0.

If 𝔅_M is a system of boundary equations without variables, then we stop. Otherwise, since 𝔅_M is solvable, a transformation rule is applicable; consequently, the sequence can be continued by some solvable system 𝔅_{M+1} satisfying all the properties above. The third property, however, implies that M ≤ n_0 + |w_0|. Hence we finally must reach a system without variables, and we may assume that this happens with 𝔅_M.

Let us show that 𝔅_k is a node of G for all 0 ≤ k ≤ M. This will imply the claim, since then there is a directed path to 𝔅_M, and 𝔅_M is a terminal node. We have to verify |Ω_k| ≤ 2n_0, |I_k| ≤ ((n_0+2)/2)·β_max, and |B_k| ≤ β_max. The assertion |Ω_k| ≤ 2n_0 is trivial. The second property of the sequence implies exp(𝔅_k) ≤ exp(w_k) ≤ exp(w_0) ≤ e(𝔅_0). By Cor. 4.21 and the fourth property we have |B_k| ≤ β_max. The next lemma yields an invariant which gives the desired bound on the size of every I_k.

Lemma 5.1 For 0 ≤ k ≤ M define β_k = |{(x, i, x̄, j) ∈ B_k | left_k(x) < i}|. Then for all 1 ≤ k ≤ M we have:

2|I_k| − β_k + (|Ω_k|/2)·β_max ≤ 2|I_{k−1}| − β_{k−1} + (|Ω_{k−1}|/2)·β_max.

Proof. Consider the rule which was applied to pass from 𝔅_{k−1} to 𝔅_k. For Rule 1 or 2 we have |Ω_k| = |Ω_{k−1}| − 2, |I_k| = |I_{k−1}|, and β_{k−1} − β_k ≤ β_max. For Rule 3 we have |Ω_k| = |Ω_{k−1}|, |I_k| = |I_{k−1}| − 1, and β_k = β_{k−1}. Finally, for Rule 4 we have |Ω_k| = |Ω_{k−1}|, and Lem. 4.27 says 2|I_k| − β_k ≤ 2|I_{k−1}| − β_{k−1}. The assertion of the lemma follows. □

A consequence of Lem. 5.1 (and of β_k ≤ β_max) is:

2|I_k| ≤ 2|I_0| + (n_0 + 1)·β_max for all 0 ≤ k ≤ M.

Since |I_0| ≤ β_max/2, we obtain |I_k| ≤ ((n_0+2)/2)·β_max. Hence 𝔅_k ∈ G for all 0 ≤ k ≤ M. This proves Thm. 4.12, hence Makanin's result.

5.2 Complexity in terms of the semigroup S and the maximal number of boundary equations

Our estimates of the upper bounds for Makanin's algorithm are given in terms of the size of the semigroup S and of the number β_max as defined in the preceding section, which bounds the number of boundary equations. A node 𝔅 = ((Ω, ¯), (I, ≤, φ_I), left, B) of the search graph G is encoded as a binary string over {0, 1} as follows. The code for (Ω, ¯) is simply the number n, written in binary, such that |Ω| = 2n; thus O(log n_0) bits are enough for this part. The linear order (I, ≤, φ_I) is encoded by its underlying S-sequence; for this part O(n_0·β_max·log|S|) bits are used. The mapping left: Ω → I is encoded using O(n_0·log(n_0·β_max)) bits. Finally, the set of bricks B can be encoded using O(β_max·log(n_0·β_max)) bits. Note that n_0 ≤ log β_max.
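Adding the four contributions gives the constant c introduced just below. A rough evaluation sketch (an assumption of ours: all constants hidden in the O-terms are set to 1, so the value is indicative only):

    import math

    def encoding_bits(n0, bmax, s_size):
        """Indicative bit count for one node of G, following the four
        components above (naming, S-sequence, left, bricks)."""
        naming    = math.log2(2 * n0)                # |Omega| in binary
        sequence  = n0 * bmax * math.log2(s_size)    # the S-sequence
        left_map  = n0 * math.log2(n0 * bmax)        # left : Omega -> I
        brick_set = bmax * math.log2(n0 * bmax)      # the set B
        return naming + sequence + left_map + brick_set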
It follows that there is effectively a constant c ∈ ℕ such that every 𝔅 ∈ G can be described by a bit string of length c·log|S|·β_max·log(β_max). Up to some calculations performed over S, this is the essential upper space bound for the non-deterministic procedure. It is at most double exponential in the input size, but we will come back to this point later. The number of bits we need for the code yields an upper bound for the size of G. Using the constant c above, define a natural number γ_max by:

γ_max = 2^{c·log|S|·β_max·log(β_max)} ∈ 2^{O(log|S|·β_max·log(β_max))}.

Lemma 5.2 The number of nodes in G is at most γ_max.

Proof. The number of nodes is at most exponential in the number of bits used in a description of a node. □

The following assertion is now clear:

Proposition 5.3 The system 𝔅_0 is solvable if and only if G contains a directed path (𝔅_0, …, 𝔅_m) to the terminal node 𝔅_m = (∅, ∅, ∅, ∅) such that m ≤ γ_max.

Proof. If there is a path to some terminal node, then this path can be prolonged by applications of Rule 3 until finally the underlying linear order is the empty set. We may assume that this path is without cycles; then Lem. 5.2 implies m ≤ γ_max. □

Corollary 5.4 Let 𝔅_0 be solvable and let w_0 ⊨ (I_0, ≤, φ_{I_0}) be a solution whose length |w_0| is minimal. Then we have

|w_0| ≤ |S| · 2^{γ_max}.

Proof. Consider a path (𝔅_0, …, 𝔅_m), m ≤ γ_max, to the terminal node 𝔅_m = (∅, ∅, ∅, ∅) as in Prop. 5.3. Going the path backwards, we inductively define solutions v_m, v_{m−1}, …, v_0 of the systems 𝔅_m, 𝔅_{m−1}, …, 𝔅_0 as follows. The initial solution is v_m = 1. Assume that v_m, …, v_k, 1 ≤ k ≤ m, are already defined. Depending on the transformation rule which links 𝔅_{k−1} to 𝔅_k, we define the solution v_{k−1} ⊨ (I_{k−1}, ≤, φ_{I_{k−1}}) of the system 𝔅_{k−1}. For Rule 1 or 2 we define v_{k−1} = v_k. For Rule 3 we define v_{k−1} = u·v_k for some suitable u ∈ A⁺; it is clear that we can choose u such that |u| ≤ |S|, hence |v_{k−1}| ≤ |S| + |v_k|. For Rule 4 we find a solution v_{k−1} = u·v_k where u is a factor of v_k, hence |v_{k−1}| ≤ 2|v_k|. We end up with a solution v_0 ⊨ (I_0, ≤, φ_{I_0}) of 𝔅_0 such that |v_0| ≤ |S| · 2^{γ_max}. Since w_0 is of minimal length, we have |w_0| ≤ |v_0|. □

Remark 5.5 Assume that 𝔅_0 is solvable and that w_0 is a solution of minimal length. The word w_0 can be used as a model for constructing a path (𝔅_0, …, 𝔅_M), M ≥ 1, to some terminal node. However, we cannot exclude that this path has many cycles. In particular, M need not be the number m which was used in the proof of Cor. 5.4. Due to the construction of the solution v_0 above, it is possible that v_0 ≠ w_0 and e(𝔅_0) < exp(v_0).

Corollary 5.6 In deterministic time 2^{O(|S|·2^{γ_max})} we can find by exhaustive search either a solution of minimal length or report that 𝔅_0 is not solvable. This upper bound is five times exponential in the input size (the number of bits used in the encoding) of 𝔅_0.

Proof. Test all strings up to length |S| · 2^{γ_max} for being a solution, and stop when the first solution is encountered. If no solution up to this length is found, then 𝔅_0 is not solvable. □
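The search of Corollary 5.6 is the obvious one. A sketch, where is_solution is a hypothetical callback testing w ⊨ (I_0, ≤, φ_{I_0}) together with the boundary equations, and max_len plays the role of |S| · 2^{γ_max}:

    from itertools import product

    def minimal_solution(A, is_solution, max_len):
        """Exhaustive search in order of increasing length; returns a
        shortest solution, or None if there is none up to max_len."""
        for n in range(max_len + 1):
            for letters in product(A, repeat=n):
                w = "".join(letters)
                if is_solution(w):
                    return w
        return None   # B0 is not solvable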
5.3 An upper bound for the complexity of solving word equations

The original question of the chapter is whether a given word equation x_1 ⋯ x_g = x_{g+1} ⋯ x_d, 1 ≤ g < d, with regular constraints has a solution. We may assume that each regular language L_x ⊆ A*, x ∈ Ω, is specified by an NFA with r_x states. Define r = Σ_{x∈Ω} r_x; we are going to measure the complexity of Makanin's algorithm in terms of d and r.

First, we choose a suitable semigroup S and a homomorphism φ: A⁺ → S. By Rem. 3.1 we may assume that S satisfies |S| ≤ 2^{r²} and c(S) ≤ r!. By Thm. 3.2 choose a value e(c(S), d) ∈ c(S)·2^{O(d)} ⊆ 2^{O(d + r log r)} such that e(c(S), d) is an upper bound for the exponent of periodicity. Transform the word equation (by a non-deterministic guess) into a system of boundary equations 𝔅_0 = ((Ω_0, ¯), (I_0, ≤, φ_{I_0}), left_0, B_0) such that the word equation has a solution satisfying the regular constraints if and only if 𝔅_0 is solvable. This is possible in such a way that first, |I_0|, |Ω_0|, |B_0| ∈ O(d), and second, if 𝔅_0 is solvable, then e(𝔅_0) ≤ 2·e(c(S), d) ∈ 2^{O(d + r log r)}. More precisely, by Ex. 4.35 we can say |I_0| ≤ d − 1, |Ω_0| = |B_0| ≤ 2(d − 1), and, if 𝔅_0 is solvable, then e(𝔅_0) ≤ e(c(S), d).

Compute a value β_max ∈ 2^{2^{O(d + r log r)}} such that the search graph G satisfies Prop. 5.3 for the corresponding value γ_max. Recall that β_max is an upper bound for the number of boundary equations of each node and that γ_max is an upper bound for the number of nodes of G. The number β_max is double exponential in the input size, which is for simplicity d + r. The value β_max is large enough to perform all computations over the semigroup S, and it is small enough to solve the reachability problem in the search graph G in non-deterministic space 2^{2^{O(d + r log r)}}. Using standard knowledge from complexity theory (like Savitch's Theorem, cf. Hopcroft and Ullman (1979)), we can state:

Theorem 5.7 The satisfiability problem for word equations with regular constraints is in the following complexity classes:

DSPACE(2^{2^{O(d + r log r)}}), i.e., double exponential deterministic space,
DTIME(2^{2^{2^{O(d + r log r)}}}), i.e., triple exponential deterministic time.

The length of a shortest solution is at most four times exponential in the input size; it can be bounded by

2^{2^{2^{2^{O(d + r log r)}}}}.

The computation of a shortest solution by exhaustive search is possible in at most five-fold exponential deterministic time.

Remark 5.8 The complexity bounds given above differ slightly from other bounds published in the literature so far. In Kościelski and Pacholski (1996: Cor. 4.6) a triple exponential non-deterministic time bound for the satisfiability problem is given. Here we have triple exponential deterministic time, since in the formulation as a graph reachability problem it is the number of nodes which becomes important. The upper bound for the exponent of periodicity given in Schulz (1992a) for the situation including regular constraints is based on the techniques of the original article of Makanin; this yields a double exponential bound, whereas it is shown here that one exponential suffices. The upper bound for the exponent of periodicity is essentially optimal; the optimality is not known for the bounds mentioned in Thm. 5.7.

6 Notes

A systematic study of equations in free monoids was initiated by A. A. Markov in the late 1950s in connection with Hilbert's Tenth Problem, see Hmelevskiĭ (1971), Makanin (1981). It is not difficult to see that the matrices having non-negative integer coefficients and determinant 1 form a free monoid inside the special linear group SL_2(ℤ). The free generators are:

a = [1 1; 0 1],  b = [1 0; 1 1].

Let L = R be a word equation over {a, b} in unknowns Ω = {x_1, …, x_n}. Replace each variable x_i ∈ Ω by a matrix [δ_1 δ_2; δ_3 δ_4], where the δ_j denote variables over ℕ. Multiplying the matrices corresponding to the words L and R yields an equation of the form

[P_1 P_2; P_3 P_4] = [Q_1 Q_2; Q_3 Q_4].

The coefficients P_1, …, Q_4 are polynomials in the δ_j. It is clear that the equation L = R has a solution if and only if the following Diophantine system has a non-negative solution:

P_i = Q_i, i = 1, …, 4;  δ_1δ_4 − δ_2δ_3 = 1, one such equation for each of the n variables.
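The freeness of a and b is easy to test experimentally. A quick sanity check (our own illustration) multiplying out the generator matrices:

    # the free generators inside SL(2, Z)
    a = ((1, 1), (0, 1))
    b = ((1, 0), (1, 1))

    def mul(m, n):
        """2 x 2 integer matrix product."""
        return tuple(tuple(sum(m[i][k] * n[k][j] for k in range(2))
                           for j in range(2)) for i in range(2))

    def mat(word):
        """Image of a word over {'a', 'b'} under the embedding."""
        m = ((1, 0), (0, 1))
        for letter in word:
            m = mul(m, a if letter == "a" else b)
        return m

    print(mat("ab"))                 # ((2, 1), (1, 1))
    print(mat("ab") == mat("ba"))    # False: distinct words, distinct matrices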
The hope of Markov was to prove in this way the unsolvability of Hilbert's Tenth Problem, which was not settled at that time. This failed: the unsolvability of Hilbert's Tenth Problem was shown in 1970 by Matiyasevich using quite different methods, see Matiyasevich (1993). The solvability of word equations is, needless to say at this place, due to Makanin (1977). Before Makanin obtained the breakthrough, only partial results were known. In 1964 and 1967 Hmelevskiĭ found a positive solution for the cases with two and three variables respectively, see Hmelevskiĭ (1971). In the case of two variables a polynomial time algorithm for the satisfiability problem is given in Charatonik and Pacholski (1993). The solvability in the case where each variable occurs at most twice is due to Matiyasevich (1968). Other special cases were solved in Plotkin (1972) and Lentin (1972). After the general solution was established in 1977, other questions became central. In Makanin (1979) it is shown that the rank of an equation is computable, see also Pécuchet (1981). Makanin's algorithm was implemented in 1987 at Rouen, see Abdulrab and Pécuchet (1990). The inherent complexity of the satisfiability problem of word equations is not known. The lower bound is NP-hardness for equations without regular constraints and PSPACE-hardness for equations with regular constraints. The known upper bounds to date for Makanin's algorithm are given in Thm. 5.7. There is a double exponential gap between the lower and upper bounds for the space complexity. The original article Makanin (1977) is very technical. Subsequently, other presentations with various improvements were given; let us refer to Jaffar (1990) and Schulz (1992a, 1993). The present chapter is along this line; it is rather close to Schulz (1992a). A brief survey on equations in words can be found in Perrin (1989); more material on equations in free monoids and, especially, on equations without constants is in the Handbook of Formal Languages, see Choffrut and Karhumäki (1997). There are also two volumes in the Springer lecture notes series dedicated to word equations and related topics: Schulz (1992b) and Abdulrab and Pécuchet (1993).

Equations in free groups are defined analogously to word equations. The situation, however, becomes extremely complicated. It was Makanin himself who mastered this problem, too. In Makanin (1982), with a correction in Makanin (1984), it is shown that the satisfiability of group equations with constants is decidable. In Razborov (1984) an algorithm is presented which generates all solutions to a given equation. The inherent complexity of Makanin's algorithm for groups is investigated in Kościelski and Pacholski (1998). The authors define the notion of an abstract Makanin algorithm and show that this abstract scheme is not primitive recursive.

Another direction in which to extend Makanin's result is to include partial commutation. Let I ⊆ Σ × Σ be a relation between letters which says when letters may commute (i.e., when they are independent). The quotient monoid M(Σ, I) = Σ*/{ab = ba | (a, b) ∈ I} is called the free partially commutative monoid. It was introduced in Cartier and Foata (1969), where interesting combinatorial properties were discovered.
In computer science free partially commutative monoids are usually called trace monoids, a notion which is due to Mazurkiewicz (1977). The interest is that partial commutation expresses some basic phenomena of concurrency; let us refer to Diekert and Rozenberg (1995) for an overview. Syntactically, a system of trace equations is the same as a system of word equations, but solutions are sought in the trace monoid; this means the commutation rules ab = ba can be used for free for all (a, b) ∈ I. For example, if (a, b) ∈ I then, contrary to the situation in free monoids, the trace equation axb = bya has a solution σ(x) = σ(y) = 1. The set of all solutions of this trace equation is given by σ(x) = σ(y) and alphabetic constraints. It is shown in Matiyasevich (1997) that the satisfiability of a system of trace equations is decidable. The proof is a reduction of trace equations to word equations with regular constraints. As a byproduct of the reduction we may put arbitrary recognizable constraints on the variables without losing decidability. Another reduction, using a new result on lexicographic normal forms of traces, is presented in Diekert, Matiyasevich, and Muscholl (1997). A challenging open question to date is a generalization of Makanin's result to free partially commutative groups. But this is only one of many open questions in this area. The theory of word equations is still exciting, and many problems remain to be solved.

Problems

Section 1

1.1 Decide whether or not the solution abbababbaabab given to Ex. 1.1 is a non-singular solution of minimal length.

1.2 Show that the satisfiability problem of systems of word equations without regular constraints is NP-hard. Hint: Show that the problem is NP-complete if there is exactly one constant, A = {a}. Use the fact that linear integer programming is NP-hard, even in unary notation.

1.3 Modify the decision procedure for the case where each variable occurs at most twice to include regular constraints. Show that the underlying decision problem is PSPACE-complete if the regular constraints are specified by a list of NFAs (non-deterministic finite automata). Hint: The hardness follows directly from well-known PSPACE-complete problems on regular sets.

Section 2

2.1 Give a greedy algorithm to compute the p-stable normal form of a word w ∈ A*. Modify the algorithm by pattern-matching techniques such that it runs in linear time O(|w| + |p|).

2.2 Prove Props. 2.8, 2.9, and 2.10. Show that the results remain true when there are regular constraints.

2.3 Show that the satisfiability problem of single word equations without regular constraints is NP-hard. Hint: Compare this problem with Prob. 1.2.

2.4 Let L_x ⊆ A* be a regular language. Describe the set of all solutions σ for an equation with only one unknown x under the constraint σ(x) ∈ L_x.

Section 3

3.1 An instance of a linear integer programming problem is given by an m × n matrix D ∈ ℤ^{m×n} and a vector c ∈ ℤ^m. Let x ∈ ℕ^n be a minimal vector such that Dx = c. Assume that the sum of the squares of the coefficients in each row of D is in O(1) and that ‖c‖ ∈ O(n²). Show by elementary methods that there is a (small) constant c such that ‖x‖ ∈ O(2^{cn}). Hint: The proof is a slight modification of the standard proof which shows that linear integer programming is NP-complete. Use Hadamard's inequality for an upper bound on the maximal absolute value of the determinants of square submatrices of D.
Next, show that if x ∈ ℕ^n is a minimal solution, then there is also a minimal solution x′ ∈ ℕ^n such that first, the absolute value of at least one component can be bounded, and second, Σ_{i=1}^n x_i ≤ Σ_{i=1}^n x′_i. Freeze one variable of x′ to be a constant by an additional equation. Repeat the process until the homogeneous system Dx = 0 has only the trivial solution. Then apply Cramer's Rule. It should be noted that this method does not yield the best possible result, but it is good enough to establish e(d) ∈ 2^{O(d)}, which was used in the proof of Thm. 3.2.

Section 4

4.1 Consider the reduction in the proof of Lem. 4.14. Give an estimate of the length d of the word equation and thereby an upper bound on e(𝔅). Define another reduction where the denotational length of the resulting word equation becomes smaller; this also improves the estimate for e(𝔅). Give a third estimate for e(𝔅) based on the techniques presented in Sect. 3. Hint for the second part: If a system contains two equations x = x′ and xy = x′y′, then the second one can be replaced by y = y′.

4.2 The lower bound for e(c(S), d) given in Ex. 3.4 can be refined. Kościelski and Pacholski (1996: Thm. 4.8) consider the following equation with k = 5:

x_n a x_n b x_{n−1} b ⋯ x_2 b x_1 = a x_n x_{n−1}^k b x_{n−2}^k b ⋯ x_1^k b a.

Show that there is a unique solution. Derive from this solution a lower bound for the constant hidden in the notation e(c(S), d) ∈ c(S) · 2^{Θ(d)}. Why is k = 5 a good value? Hint: Show first that σ(x_i) ∈ a* for all 1 ≤ i ≤ n.

References

Abdulrab, H. and Pécuchet, J.-P. (1990). Solving word equations, J. Symbolic Computation, 8 (5), 499–521.

Abdulrab, H. and Pécuchet, J.-P. (Eds.). (1993). Proceedings of Word Equations and Related Topics (IWWERT '91), Vol. 677 of Lect. Notes Comp. Sci., Berlin-Heidelberg-New York. Springer-Verlag.

Cartier, P. and Foata, D. (1969). Problèmes combinatoires de commutation et réarrangements. No. 85 in Lecture Notes in Mathematics. Springer, Berlin-Heidelberg-New York.

Charatonik, W. and Pacholski, L. (1993). Word Equations with Two Variables, In Abdulrab, H. and Pécuchet, J.-P. (Eds.), Proceedings of Word Equations and Related Topics, Second International Workshop, IWWERT '91, Rouen, France, Vol. 677 of Lect. Notes Comp. Sci., pp. 43–56, Berlin-Heidelberg-New York. Springer-Verlag.

Choffrut, C. and Karhumäki, J. (1997). Combinatorics of Words, In Rozenberg, G. and Salomaa, A. (Eds.), Handbook of Formal Languages, Vol. 1, pp. 329–438. Springer, Berlin-Heidelberg-New York.

Dickson, L. E. (1913). Finiteness of the odd perfect and primitive abundant numbers with n distinct prime factors, American Journal of Math., 35, 413–422.

Diekert, V., Matiyasevich, Yu., and Muscholl, A. (1997). Solving trace equations using lexicographical normal forms, In Degano, P., Gorrieri, R., and Marchetti-Spaccamela, A. (Eds.), Proc. of the 24th ICALP, Bologna, 1997, No. 1256 in Lect. Notes Comp. Sci., pp. 336–347, Berlin-Heidelberg-New York. Springer-Verlag.

Diekert, V. and Rozenberg, G. (Eds.). (1995). The Book of Traces. World Scientific, Singapore.

Eyono Obono, S., Goralcik, P., and Maksimenko, M. (1994). Efficient Solving of the Word Equations in One Variable, In Privara, I. et al. (Eds.), 19th Symposium on Mathematical Foundations of Computer Science (MFCS '94), Košice (Slovakia) 1994, No. 841 in Lect. Notes Comp. Sci., pp. 336–341, Berlin-Heidelberg-New York. Springer.

Gathen, J. von zur and Sieveking, M. (1978). A bound on solutions of linear integer equalities and inequalities, Proceedings of the American Mathematical Society, 72 (1), 155–158.
Hmelevskiĭ, Ju. I. (1971). Equations in Free Semigroups, In Petrovskiĭ, I. G. (Ed.), Trudy Mat. Inst. Steklov. 107. (In Russian) English translation in: Proceedings of the Steklov Institute of Mathematics 107 (1976). American Mathematical Society.

Hopcroft, J. E. and Ullman, J. D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading, MA.

Jaffar, J. (1990). Minimal and Complete Word Unification, J. Assoc. Comput. Mach., 37 (1), 47–85.

Kościelski, A. and Pacholski, L. (1996). Complexity of Makanin's Algorithm, J. Assoc. Comput. Mach., 43 (4), 670–684. Preliminary version in Proc. of the 31st Annual IEEE Symposium on Foundations of Computer Science, Los Alamitos (1990).

Kościelski, A. and Pacholski, L. (1998). Makanin's algorithm is not primitive recursive, Theoret. Comput. Sci., 191 (1–2), 145–156.

Lentin, A. (1972). Équations dans les monoïdes libres. Gauthier-Villars.

Lothaire, M. (1983). Combinatorics on Words, Vol. 17 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, Reading, MA.

Makanin, G. S. (1977). The problem of solvability of equations in a free semigroup, Mat. Sbornik, 103 (2), 147–236. (In Russian) English translation in: Math. USSR Sbornik 32 (1977) 129–198.

Makanin, G. S. (1979). Recognition of the rank of equations in a free semigroup, Izv. Akad. Nauk SSSR, Ser. Mat. 43. (In Russian) English translation in: Math. USSR Izvestija 14 (1980) 499–545.

Makanin, G. S. (1981). Equations in a free semigroup, American Mathematical Society Translations (2), 117, 1–6.

Makanin, G. S. (1982). Equations in a free group, Izv. Akad. Nauk SSSR, Ser. Mat. 46, 1199–1273. (In Russian) English translation in: Math. USSR Izvestija 21 (1983) 483–546.

Makanin, G. S. (1984). Decidability of the universal and positive theories of a free group, Izv. Akad. Nauk SSSR, Ser. Mat. 48, 735–749. (In Russian) English translation in: Math. USSR Izvestija 25 (1985) 75–88.

Markowsky, G. (1977). Bounds on the index and period of a binary relation on a finite set, Semigroup Forum, 13, 253–259.

Matiyasevich, Yu. (1968). A connection between systems of word and length equations and Hilbert's Tenth Problem, Sem. Mat. V. A. Steklov Math. Inst. Leningrad, 8, 132–144. (In Russian) English translation in: Seminars in Mathematics, V. A. Steklov Mathematical Institute 8 (1970) 61–67.

Matiyasevich, Yu. (1993). Hilbert's Tenth Problem. MIT Press, Cambridge, Massachusetts.

Matiyasevich, Yu. (1997). Some Decision Problems for Traces, In Adian, S. and Nerode, A. (Eds.), Proceedings of the 4th International Symposium on Logical Foundations of Computer Science (LFCS '97), Yaroslavl, Russia, July 6–12, 1997, No. 1234 in Lect. Notes Comp. Sci., pp. 248–257, Berlin-Heidelberg-New York. Springer-Verlag.

Mazurkiewicz, A. (1977). Concurrent Program Schemes and their Interpretations, DAIMI Rep. PB 78, Aarhus University, Aarhus.

Pécuchet, J.-P. (1981). Sur la détermination du rang d'une équation dans le monoïde libre, Theoret. Comput. Sci., 16, 337–340.

Perrin, D. (1989). Equations in words, In Ait-Kaci, H. and Nivat, M. (Eds.), Resolution of Equations in Algebraic Structures, Vol. 2, pp. 275–298. Academic Press.

Plotkin, G. (1972). Building in equational theories, Machine Intelligence, 7, 115–162.

Razborov, A. A. (1984). On systems of equations in a free group, Izv. Akad. Nauk SSSR, Ser. Mat. 48, 779–832. (In Russian) English translation in: Math. USSR Izvestija 25 (1985) 115–162.
Schulz, K. U. (1992a). Makanin's Algorithm for Word Equations: Two Improvements and a Generalization, In Schulz, K. U. (Ed.), Proceedings of Word Equations and Related Topics, 1st International Workshop, IWWERT '90, Tübingen, Germany, Vol. 572 of Lect. Notes Comp. Sci., pp. 85–150, Berlin-Heidelberg-New York. Springer-Verlag.

Schulz, K. U. (Ed.). (1992b). Proceedings of Word Equations and Related Topics (IWWERT '90), Vol. 572 of Lect. Notes Comp. Sci., Berlin-Heidelberg-New York. Springer-Verlag.

Schulz, K. U. (1993). Word Unification and Transformation of Generalized Equations, Journal of Automated Reasoning, 11 (2), 149–184.

Index

admissible extension, 16; basis, 22; brick, 20; clean convex chain, 21; conjugates, 3; convex chain, 21; convex chain condition, 22; critical boundary, 27; denotational length, 8; domino tower, 4; downward correctness, 26; exponent of periodicity, 5, 17; final index, 22; foundation, 22; free partially commutative monoid, 38; linear order over S, 15; model, 15; non-singular solution, 1; position, 15; primitive word, 3; refinement, 14; right boundary, 20; singular solution, 1; solution, 1, 17; stable normal form, 5; standard representation, 15; system of boundary equations, 17; system of word equations, 1; terminal node, 33; trace monoid, 38; transport position, 27; upward correctness, 26.