Towards a Unifying Theory of Context-Free Parsing

Thomas Schöbel-Theuer
Universität Stuttgart, Institut für Informatik
Breitwiesenstr. 20-22, 70565 Stuttgart
schoebel@informatik.uni-stuttgart.de

May 13, 1996

This is a reprint of an explorative paper from the first ASMICS workshop on parsing theory, Milano, Italy, October 1994, published as technical report 126-94 by the University of Milano. This version has some minor errors corrected. The reference [STin] has not appeared in the meantime and is therefore now marked as such; however, [STrt] is to appear and is a shortened version of the cited [STin].

Abstract

We present an approach to context-free parsing which builds up a meta-theory on this subject from scratch. From this meta-theory one can derive, in a constructive way, any parsing algorithm known to the author (e.g. Earley's, Tomita's, LL, LR(k), among many others), as well as new algorithms with promising properties. The conclusions from this theory are not only that all these algorithms stem from a single source, regardless of how they were originally invented; the theory also opens a way to derive specialized algorithms suited to particular practical demands.

This paper is an informal introduction to a theory treated in much more detail in [STin]. Formalizations, proofs and detailed discussions can be found there. Because of limited space we can present only some of the underlying ideas of a theory which unifies all context-free parsing techniques known to the author. We hope to convince the reader that practically all known parsing algorithms stem from a single source. This claim applies both to general parsing methods like Earley's or Lang's (resp. Tomita's) and to known deterministic methods like LL, LC or LR(k).

We assume the reader to be familiar with the field of context-free parsing; our notation will be similar to [HU79]. Terminal symbols are denoted by small letters, nonterminals by capital letters, and strings by Greek letters or x, y, z. For detailed definitions and formalizations, see [STin].

1 "Simple" representations of sentential forms

It is intuitively clear that nearly every existing parsing method can be characterized by a data structure representing sentential forms of the grammar. Canonical parsing methods represent left-sentential forms; more general methods may represent arbitrary sentential forms. A core of forms is derived from the start symbol of the grammar, but may be enhanced by additional ones. In contrast to parsing schemata, we focus on the meaning of a data structure, not on how something has to be computed, in which order, or how the computations are interrelated.

1.1 Recognition graph

We start with some arbitrary objects v̇ forming a nonempty countable set V̇. The objects are treated as black boxes; all we know about them is their existence. To each object, a language (i.e. a set of words) over some alphabet V is assigned. To set up this assignment, we use a function λ : V̇ → 2^(V*) called the labelling function. Next, we may have one or more binary relations on the set V̇. Assume that Ė ⊆ V̇ × V̇ is such a relation; then we call V̇ a set of nodes (vertices) and Ė a set of edges, choose some node ṡ ∈ V̇ called the root, and put it all together into a structure Ġ := (V̇, Ė, ṡ, λ), shortly called a graph. In conventional terminology, we have a labelled directed graph with a dedicated root node. Note that we write all parts belonging to a graph with a dot on them, to distinguish for example the nodes V̇ from an alphabet (vocabulary) V.
Of main interest for us is the language associated with such a graph. As the language of Ġ, we define

    L(Ġ) := ∪_{v̇ ∈ V̇} λ(v̇)

This general definition is useful whenever the labelling function λ is somehow restricted, e.g. when the number of words |λ(v̇)| ≤ k is bounded by some constant k. Otherwise we would be able to denote the whole language by one single node. We shall consider two graphs as equivalent if they represent the same language.

Since we are speaking about context-free recognition, we have to relate our labelled graphs to this problem area. We do this by defining: Ġ is a recognition graph (for some grammar G) iff L(Ġ) = F_L(G), where F_L(G) is the set of left-sentential forms of some context-free grammar G = (Σ, V, S, P) with terminal alphabet Σ, vocabulary V, start symbol S, and productions P as usual. This definition of a recognition graph which represents all derivable left-sentential forms of the grammar is the backbone of our theory, because such a graph is used (usually implicitly) in nearly every parsing algorithm. However, in many cases it is not obvious that an algorithm like LR(k) uses such a graph.

Before proceeding, we look at some special cases of graphs: A tree is a fully connected graph where each node except the root has exactly one predecessor, and the root has no predecessor. A forest is a graph where each node has at most one predecessor. A labelled graph is in normal form if for any node all sons are labelled differently.

1.2 Total labelling

First some basic terminology known from graph theory: The set of paths Π(v̇, ẇ) is the set of all finite sequences of nodes leading from v̇ to ẇ. If the first parameter of Π is the root node ṡ, we shortly write Π(v̇) := Π(ṡ, v̇). Note that in a cyclic graph, this set may contain infinitely many paths, each having finite length. By the total labelling λ^→(v̇), we denote the concatenation of the label languages along all paths to the node v̇, more formally

    λ^→(v̇) := ∪_{(ṡ,…,v̇) ∈ Π(v̇)} λ(ṡ) ⋯ λ(v̇)

This also works in the other direction, by taking the reverse path:

    λ^←(v̇) := ∪_{(ṡ,…,v̇) ∈ Π(v̇)} λ(v̇) ⋯ λ(ṡ)

In the general form, the languages of the labels are concatenated. Often the label languages consist of one single word; in this case we write one-element sets as the element itself (as far as no confusion can arise).

As one can easily see, finite graphs having single-element labels correspond to finite automata where the input symbols are assigned to the nodes (= states), as in Moore automata. Mealy automata can be obtained analogously by using edge labels and by building the total labelling over the edges of a path. Important differences to finite automata are:

- the graph may be infinite
- by the total labelling, a language is assigned to each node, not only to dedicated final states.
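To make these definitions concrete, here is a minimal sketch (our own illustration; all names are our choices, nothing here is taken from the paper) of a labelled rooted graph together with the graph language L(Ġ) and the total labelling λ^→, computed by enumerating root-to-node paths. It only works for acyclic graphs with finite label languages, whereas the definitions above also cover cyclic graphs with infinitely many paths.

```python
from itertools import product

class LabelledGraph:
    """A rooted, labelled directed graph with finite label languages."""
    def __init__(self, root, labels, edges):
        self.root = root        # the dedicated root node s
        self.labels = labels    # node -> set of label words (the function lambda)
        self.edges = edges      # node -> list of successor nodes (the relation E)

    def language(self):
        # L(G) := the union of the label languages of all nodes
        return set().union(*self.labels.values())

    def paths(self, target, node=None, prefix=()):
        # all root-to-target paths; terminates only on acyclic graphs
        node = self.root if node is None else node
        prefix = prefix + (node,)
        if node == target:
            yield prefix
        for succ in self.edges.get(node, []):
            yield from self.paths(target, succ, prefix)

    def total_labelling(self, target):
        # concatenate the label languages along every path from the root to target
        result = set()
        for path in self.paths(target):
            for words in product(*(self.labels[n] for n in path)):
                result.add("".join(words))
        return result

g = LabelledGraph(root=0,
                  labels={0: {"a"}, 1: {"b", "c"}, 2: {"d"}},
                  edges={0: [1], 1: [2]})
print(g.language())           # {'a', 'b', 'c', 'd'}
print(g.total_labelling(2))   # {'abd', 'acd'}
```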
1.3 Some further definitions

String items I_V of some vocabulary V are denoted α•β, where α, β ∈ V* and • is a new meta symbol not appearing in V. Production items I_P of some grammar G = (Σ, V, S, P) are widely known from LR(k) and Earley parsing and are written A → α•β where A → αβ ∈ P. In this paper we will mostly use production items and therefore call them shortly items. The prefix π of an item is the part before the dot, written π(A → α•β) := α. The infix ι of an item is the symbol following the dot, written ι(A → α•Xβ) := X. The suffix σ of an item is the remainder after the infix, written σ(A → α•Xβ) := β.

Left-sentential forms can also be divided uniquely into prefix, infix and suffix, where the prefix yields all terminals until the first nonterminal appears in the sentential form, the infix yields this first nonterminal (commonly used to do the next left-derivation), and the suffix yields the rest behind the infix. More formally, let xAα ∈ F_L(G). Then

    π(xAα) := x,  x ∈ Σ*
    ι(xAα) := A,  A ∈ (V − Σ) ∪ {ε}
    σ(xAα) := α,  α ∈ V*

such that A = ε ⟹ α = ε. The last condition guarantees the uniqueness of the partition, since when the left-sentential form contains only terminal symbols, both the infix and the suffix have to be empty.

1.4 The tree of possible left derivations (TPLD)

Now we look at an example graph representing all the left-sentential forms of a grammar. In this example, the graph is a tree. Some similarity with Earley's algorithm is expressed by naming the rules predictor, scanner and completor.

Definition: The tree of possible left derivations (TPLD) of some grammar G is the smallest tree with root ṡ labelled S₀ → •S⊣ (where S₀ is a new start symbol and ⊣ a new end marker) produced by the following rules:

Predictor: Let v̇ be a node with label A → α•Bβ. Then there is also a son ẇ of v̇ labelled B → •γ for each production B → γ ∈ P.

Scanner: Let v̇ be a node with label A → α•aβ. Then there is also a brother v̇′ of v̇ labelled A → αa•β.

Completor: Let v̇ be a node with label B → γ• and u̇ the father of this node, labelled A → α•Bβ. Then there is also a brother u̇′ of u̇ labelled A → αB•β.

The result of this definition is a usually infinite tree representing all left-sentential forms of the grammar by using the total labelling generated by the suffixes of the labels.

Theorem: In a TPLD, for any node v̇ the following holds:

    ∃ x ∈ Σ* : x · ι∘λ(v̇) · (σ∘λ)^←(v̇) ∈ F_L(G)

Explanation: for any node v̇, there exists a terminal string x such that x, followed by the infix ι of the label λ(v̇), followed by the total suffixes of the path from v̇ back to the root, is a left-sentential form. The proof is by easy induction over the predictor, scanner and completor rules. Moreover, the string x associated with a new node generated by some rule is either the same as, or a prolongation of, the string associated with the old node it was generated from.

Definition: A matching TPLD is a TPLD where all branches not matching some input word a₁…aₙ⊣ are cut off. Matching means that the above string x must be a prefix of the input word a₁…aₙ⊣.

Since in a matching TPLD the terminal string x is always a prefix of the input word, we can code this prefix into the tree as well, by using another labelling function O : V̇ → ℕ ∪ {⊥}. Then the whole left-sentential form is represented by

    a₁…a_{O(v̇)} · ι∘λ(v̇) · (σ∘λ)^←(v̇)

if the input position O is incremented accordingly in the scanner rule whenever the shifted-over terminal matches the input. In case O(v̇) = ⊥, the node v̇ by definition does not contribute to the language L(Ġ); this can be used to build some noise into the graph, which is required for the simulation of some sophisticated parsing algorithms.

To overcome the problems of an infinite structure, we will demonstrate later how to compress such a tree into a graph with polynomial behaviour. As suggested by the names of the rules, the matching TPLD has a close relation to Earley's algorithm: Apart from the fact that it is an infinite tree, the main difference is that the edge relation Ė is directly between items, not indirect via a pointer i and an Earley state set Sᵢ.
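As an illustration of the three rules, the following sketch (our own encoding, not taken from the paper; it uses a plain augmented start item S' → •S instead of the end marker and caps the number of nodes, since the full TPLD is usually infinite) expands a matching TPLD for the grammar S → aS | b and the input aab. Scanner branches that do not match the input are simply not expanded, which is exactly the cutting-off of the matching TPLD.

```python
# Items are triples (lhs, before_dot, after_dot); a tree node stores its item,
# its father, and the matched input position O.
GRAMMAR = {"S": [("a", "S"), ("b",)]}
INPUT = "aab"

def expand(limit=200):
    root = {"item": ("S'", (), ("S",)), "father": None, "O": 0}
    nodes, work = [root], [root]
    while work and len(nodes) < limit:
        v = work.pop(0)
        lhs, before, after = v["item"]
        new = []
        if after and after[0] in GRAMMAR:
            # predictor: a son for each production of the nonterminal after the dot
            new += [{"item": (after[0], (), rhs), "father": v, "O": v["O"]}
                    for rhs in GRAMMAR[after[0]]]
        elif after and after[0] == INPUT[v["O"]:v["O"] + 1]:
            # scanner: a brother with the dot shifted over the matching terminal
            new.append({"item": (lhs, before + (after[0],), after[1:]),
                        "father": v["father"], "O": v["O"] + 1})
        elif not after and v["father"]:
            # completor: a brother of the father; in a TPLD the father's infix
            # is always the lhs of the completed item
            u = v["father"]
            flhs, fbefore, fafter = u["item"]
            new.append({"item": (flhs, fbefore + (fafter[0],), fafter[1:]),
                        "father": u["father"], "O": v["O"]})
        nodes += new
        work += new
    return nodes

for v in expand():
    lhs, before, after = v["item"]
    print(f'{lhs} -> {"".join(before)}.{"".join(after)}   O={v["O"]}')
```

On this input the expansion terminates by itself and ends with the node S' → S• at O = 3, i.e. the whole input has been derived.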
1.5 Tree generators

The matching TPLD is just one example of a recognition graph. In order to build up other tree structures for recognition purposes, we separate the rules for building up the structure of the tree from the rules which show how the tree is labelled.

Definition: A tree generator is a set Γ ⊆ Z* × (Σ ∪ {ε}) × Z̃*, where Z is some set of states (also called labels) and Z̃ := Z × {true, false} is an association of boolean values to the states. Γ contains rules of the form ((y₁,…,y_k), X, (z̃₁,…,z̃_l)) with the following meaning: If there exists a chain v̇₁,…,v̇_k of nodes (each v̇ᵢ₊₁ a son of v̇ᵢ) labelled appropriately, λ(v̇ᵢ) = yᵢ, then there also exists another chain v̇′₁,…,v̇′_l whose first node is a brother of v̇₁, i.e. both share the common father v̇₀ (a so-called brother chain), with labels λ(v̇′ᵢ) := zᵢ where z̃ᵢ = (zᵢ, bᵢ). The boolean component bᵢ is used to influence the O-labelling in the following way:

    O(v̇′ᵢ) := O(v̇_k) + |X|   if z̃ᵢ = (zᵢ, true)
    O(v̇′ᵢ) := ⊥              if z̃ᵢ = (zᵢ, false)

In the following, we denote the generation of ⊥-nodes (i.e. bᵢ = false) by brackets.

With this model, it is rather easy to specify rules for building up trees. As can be seen, this model is very similar to (nondeterministic) pushdown automata: Just view the (unique) path from the root to a node v̇ as the contents of the stack. Then a rule ((y₁,…,y_k), X, (z̃₁,…,z̃_l)) of the tree generator can be interpreted as popping off the left-hand side y_k,…,y₁, scanning the input symbol X, and then pushing the right-hand side z₁,…,z_l. However, there are some important differences to pushdown automata, of which the most important are:

- tree generators operate in parallel, with no specified order
- "pops" do not remove old calculations

The rules for generating the TPLD can easily be expressed as a tree generator: Let G = (Σ, V, S, P) be a grammar. The TPLD tree generator Γ ⊆ I_P* × (Σ ∪ {ε}) × Ĩ_P* is defined by

    A → α•Bβ ∈ I_P ∧ B → •γ ∈ I_P   iff   ((A → α•Bβ), ε, ([A → α•Bβ], B → •γ)) ∈ Γ
    A → α•aβ ∈ I_P                  iff   ((A → α•aβ), a, (A → αa•β)) ∈ Γ
    A → α• ∈ I_P ∧ B → γ•Aδ ∈ I_P   iff   ((B → γ•Aδ, A → α•), ε, (B → γA•δ)) ∈ Γ

Note: the first line corresponds to the predictor, the second to the scanner, and the last to the completor.
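These rule schemata can be spelled out mechanically. The sketch below (our own encoding, not the paper's formalization) enumerates the TPLD tree generator Γ of a small grammar as plain triples ((y₁,…,y_k), X, ((z₁,b₁),…,(z_l,b_l))), with items encoded as (lhs, before-dot, after-dot), ε as the empty string, and b = False marking the bracketed ⊥-nodes:

```python
def items(grammar):
    # all production items A -> alpha . beta of the grammar
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            for k in range(len(rhs) + 1):
                yield (lhs, rhs[:k], rhs[k:])

def tpld_generator(grammar, terminals):
    rules = set()
    for it in items(grammar):
        lhs, before, after = it
        if after and after[0] in grammar:       # predictor rules
            for rhs in grammar[after[0]]:
                rules.add(((it,), "",
                           ((it, False), ((after[0], (), rhs), True))))
        elif after and after[0] in terminals:   # scanner rules
            shifted = (lhs, before + (after[0],), after[1:])
            rules.add(((it,), after[0], ((shifted, True),)))
        elif not after:                         # completor rules
            for it2 in items(grammar):
                l2, b2, a2 = it2
                if a2 and a2[0] == lhs:
                    rules.add(((it2, it), "",
                               (((l2, b2 + (lhs,), a2[1:]), True),)))
    return rules

G = {"S": (("a", "S"), ("b",))}
for rule in sorted(tpld_generator(G, {"a", "b"}), key=str):
    print(rule)
```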
1.6 The general compression theorem in graphs

Definition: A subgraph Ṡ(Ġ, v̇) of a graph Ġ for some node v̇ consists of all nodes and edges reachable from the new root node v̇.

Theorem: Let v̇ and v̇′ be two different nodes of a labelled recognition graph Ġ = (V̇, Ė, ṡ, λ, O). Whenever the conditions λ(v̇) = λ(v̇′), O(v̇) = O(v̇′) and L(Ṡ(Ġ, v̇)) = L(Ṡ(Ġ, v̇′)) hold for a certain representation of the graph language by total labellings, one can melt the nodes v̇ and v̇′ by uniting their predecessors in the graph, and by choosing one of the sets of successors of v̇ or v̇′ as the set of successors for the melted node. The resulting graph is equivalent to the old one, i.e. it represents the same language.

There is an interesting special case of this theorem: v̇′ may be a member of the subgraph Ṡ(Ġ, v̇) (or vice versa), and yet the infinite sublanguages of both v̇ and v̇′ may be the same (in the case of the TPLD, this may for example occur in a grammar with left recursion). Now when we melt these two nodes, we get a cycle in the graph. This cycle is correct, since it now represents the infinite nesting of sublanguages occurring in the tree, due to the lemma of Bar-Hillel. Note that the compression theorem can also be applied in reverse order, to split nodes.

Theorem: In a graph Ġ generated by a tree generator Γ, all chains with the same labels λ and O produced by the right-hand sides of some rule can be melted.

This theorem is a simplified version of the more general one found in [STin], where some technical details have to be handled in order to melt nodes which may have labels O(v̇) = ⊥. When this theorem is fully applied to the matching TPLD, one gets nearly the same parsing algorithm as Earley's. The main difference is that the back pointers are not indirect (input positions i pointing to a set of states, in which the completor has to find out which items are to be shifted over), but rather address the appropriate items directly backwards via the edges of the graph.

In [STin], a general parsing algorithm based on the above theorems is presented which can handle arbitrary tree generators. If it is fed with the tree generator for the TPLD, it runs with asymptotically the same time and space complexities as Earley's algorithm, even with unambiguous grammars or bounded state grammars (see [Ear70]).
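The chain-melting theorem suggests a straightforward implementation strategy: hash-cons nodes by their (λ, O) pair and, on a hit, merely unite the predecessor sets instead of growing the tree. The fragment below (a toy of our own, far from the general algorithm of [STin]) shows how a left-recursive prediction then closes a cycle:

```python
class MeltingGraph:
    def __init__(self):
        self.table = {}   # (label, O) -> node id
        self.preds = {}   # node id -> set of predecessor node ids

    def make_node(self, label, o_pos, pred):
        key = (label, o_pos)
        if key not in self.table:           # a genuinely new node
            self.table[key] = len(self.table)
            self.preds[self.table[key]] = set()
        node = self.table[key]              # otherwise melt with the existing node
        if pred is not None:
            self.preds[node].add(pred)      # melting unites the predecessors
        return node

g = MeltingGraph()
root = g.make_node("S' -> .S", 0, None)
v = g.make_node("S -> .Sa", 0, root)   # predicted from the root
w = g.make_node("S -> .Sa", 0, v)      # left recursion predicts the same item again
print(v == w, g.preds[v])              # True {0, 1}: the melt has closed a cycle
```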
1.7 Some other equivalence relations

Only for the purpose of completeness, we mention some other equivalence relations besides the general compression theorem which can be established on recognition graphs. With these equivalence relations, one can build up variants and improvements of generator-based parsing which cannot be presented here due to lack of space.

Removal of ε-nodes: If the relevant part of a label contributing to the total labelling (in the case of the TPLD, this is the suffix of the λ-labelling) is empty (i.e. ε), and if this node has exactly one predecessor and exactly one successor, then this node can be removed from the graph by using a shortcut edge directly connecting the predecessor to the successor. This theorem can also be applied in reverse order, by introducing an ε-node into any edge of the graph. However, this works only for 1:1 correspondences of nodes. To achieve the same in n:m correspondences, where an ε-node has n predecessors and m successors, we have to connect all n predecessor nodes to all m successors when removing the ε-node. Whenever some n nodes called sources are each connected with some m nodes called destinations, we call them together a clique. For the reversal of this theorem, we have to ensure that the n sources and m destinations form a clique with nm connections, before we can introduce a new ε-node in the middle, which then reduces the number of edges to n + m.

It is also possible to move parts of string labels from one node to another, but only for one fixed concatenation direction. Here we show the construction for the concatenation direction →. To ensure correctness, we must have a hard clique, where each of the n sources has no other successors than the m destinations, and vice versa. Then, if the sources have labels δᵢβ and the destinations have labels γⱼ, we can move the β from the sources to the destinations. As a result, the sources are now labelled δᵢ, while the destinations are labelled βγⱼ.

1.8 Shift-reduce parsing

As another example of a recognition graph, we present a variant of shift-reduce parsing which is similar both to the TPLD and to the classical stack notation. The difference to the TPLD is the following: Whenever a brother is generated in the TPLD (this can occur in the scanner and in the completor), we now generate a son instead.

However, to ensure the correctness of the total labelling, we have to change the O-label of the father to ⊥, such that this node no longer counts for the thus modified total labelling. As before, we denote the generation of ⊥-nodes (i.e. bᵢ = false) by brackets. Let G be a grammar. The shift-reduce tree generator Γ is defined by

    A → α•Bβ ∈ I_P ∧ B → •γ ∈ I_P   iff   ((A → α•Bβ), ε, ([A → α•Bβ], B → •γ)) ∈ Γ
    A → α•aβ ∈ I_P                  iff   ((A → α•aβ), a, ([A → α•aβ], A → αa•β)) ∈ Γ
    A → α• ∈ I_P ∧ B → γ•Aδ ∈ I_P   iff   α = X₁…X_k and
        ((B → γ•Aδ, A → •α, A → X₁•X₂…X_k, …, A → X₁…X_{k−1}•X_k, A → α•), ε, ([B → γ•Aδ], B → γA•δ)) ∈ Γ

The difference to the TPLD is that some old calculations are not removed from the stack, but accumulate until a sequence A → •X₁…X_p, …, A → X₁…X_p• (the so-called handle) is on the stack. Then the completor has to remove this whole sequence.

To see the connection to classical nondeterministic shift-reduce parsing, we introduce another function on items, the pre-infix ι′, which is defined by ι′(A → αX•β) := X and ι′(A → •α) := ε, i.e. the pre-infix is the one symbol before the dot. It is now easy to see that we just have to build up the total labelling over this pre-infix ι′ and to consider all nodes, even those with O = ⊥: we then have the stack contents of a classical shift-reduce stack.
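For comparison, here is a sketch (our own toy, not the paper's tree-generator formulation) of the classical nondeterministic shift-reduce recognition that this pre-infix reading reconstructs: a configuration is a plain symbol stack plus an input position, and each root-to-node path of the shift-reduce tree corresponds to such a stack.

```python
GRAMMAR = {"S": [("a", "S", "b"), ("a", "b")]}

def recognize(word):
    configs, seen = [((), 0)], {((), 0)}   # (stack of symbols, input position)
    while configs:
        stack, i = configs.pop()
        if stack == ("S",) and i == len(word):
            return True                    # the start symbol derives the input
        succs = []
        if i < len(word):                  # shift the next input symbol
            succs.append((stack + (word[i],), i + 1))
        for lhs, rhss in GRAMMAR.items():  # reduce a handle on top of the stack
            for rhs in rhss:
                if stack[len(stack) - len(rhs):] == rhs:
                    succs.append((stack[:len(stack) - len(rhs)] + (lhs,), i))
        for c in succs:                    # explore all alternatives
            if c not in seen:
                seen.add(c)
                configs.append(c)
    return False

print(recognize("aabb"), recognize("abb"))  # True False
```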
1.9 Leftmost versus rightmost derivations

When we consider the connection between the TPLD (which could be characterized as a variant of nondeterministic LL(0) parsing) and shift-reduce parsing, which is commonly called a "bottom-up" method, we easily arrive at the following:

Theorem: For all nodes v̇ ∈ V̇ in the TPLD the following holds, provided that a₁…aₙ ∈ L(G):

    a₁…a_{O(v̇)} · ι∘λ(v̇) · (σ∘λ)^←(v̇) ∈ F_L(G)   ⟹   (π∘λ)^→(v̇) · a_{O(v̇)+1}…aₙ ∈ F_R(G)

In words, if we concatenate the prefixes of the items from the root to v̇ and append the rest of the input, then we get a right-sentential form. This can be seen easily because it is nearly mirror-symmetric to the left-sentential form. Note that in classical LR theory, roughly the total prefix (π∘λ)^→(v̇) is called a viable prefix.

The consequence of this theorem is relevant to the classification of parsing techniques: Since it is simply a matter of interpretation whether one and the same TPLD represents left- or right-sentential forms, the classical distinction between LL and LR, where the second letter is supposed to denote Left resp. Right derivation, becomes questionable. In our opinion, the traditional and well-established notions LL and LR should no longer be associated with the meaning of left- or right-derivation; not only these meanings found in many textbooks, but also the traditional "top-down" and "bottom-up" characterizations are not totally appropriate. For a more detailed discussion, see [STin].

1.10 Some variants of generator-based parsing

Only a few variants can be touched on briefly here; for deeper investigations see [STin].

True bottom-up parsing: Just relax the so-called top-down restriction, and always predict all items A → •γ regardless of the context. As a result, the graph will represent some "wrong" sentential forms, but as is argued in [STin], these wrong forms do not contribute to the result of the parsing process, namely representing all parse trees. Methods with a top-down restriction should therefore be considered "natural", since relaxing it produces some trial-and-error calculations which can be avoided.

Nondirectional parsing, island-driven parsing etc. can also be done by using forests instead of trees for intermediate representations. However, we get nearly the same graph in the end as would result from the corresponding directional method.

FIRST- and FOLLOW-restrictions are essentially restrictions on the relation Ė. Introducing lookahead can be described either as remembering input symbols by additional labelling functions and then delaying parsing decisions, or as remembering more context about expected terminalisations of the total suffix and matching this against the input.

2 "More complex" representations

In this main section we briefly sketch a general method for precomputing information out of the grammar, such that at run time fewer operations are necessary to build up a recognition graph.

2.1 Macro graphs

Figure 1: Micro graph

Figure 2: Micro graph with macros

The basic idea is relatively simple: We divide the set of nodes of a graph into chunks, called partitions or macros. For each partition, we create a macro node, and assign to it the partition of the underlying micro graph. Then, we build up another (usually less complex) graph structure at the macro level by adding macro edges which contain a representation of the underlying micro edges of the micro graph. In figures 1 and 2, the labels of the micro graph are left out; instead, the micro nodes are numbered. In figure 2, the dotted boxes indicate which micro nodes and edges are packed into one macro. As can be seen in this example, one cannot conclude from the dotted edge (macro edge) alone which micro edges connect the two macros.

A formalization of this rather simple idea is much more complicated and can be found in [STin]. To show a few of the problems, here are some characterizations. Elements of a macro node are:

- a partition of the micro graph
- dedicated input/output nodes
- a numbering of all micro nodes

Elements of a macro edge are:

- a set of pairs (a, b) of numbers representing connections between the numbered input/output nodes of the connected macro nodes.

In the macro graph, a labelling function assigns partitions of the micro graph to the macro nodes, and another labelling function assigns sets of pairs of numbers to the macro edges. In order to get a finite set of such labels for the macro graph, we have to renumber all micro nodes in a partition beginning from 1. Also, the macro edges must be labelled uniquely; that means the set of pairs must follow in a unique way from the labels of the involved macro nodes. General problem: Graph isomorphism. Solution: Coding algorithm.

Due to the complexity of the graph isomorphism problem, the precomputing is rather expensive. However, since the partitions are relatively small in practice and the micro nodes also carry rich label information, there should be less room for permutations and the problem should therefore be tractable.

A general algorithm for precomputing macros is given in [STin, Chapter 4]. As parameters of this algorithm, one can supply an extremely wide variety of heuristics about what should be packed into the macros. The generality of this algorithm lies in the fact that it produces a new tree generator Γ′ from an old one Γ, whose macros contain subgraphs of what would be in the parsing graph at run time when parsing with Γ (modulo some splits).
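The bookkeeping for macro nodes and edges might look as follows (our own sketch; nothing here is the paper's coding algorithm). Renumbering the micro nodes of a partition from 1 yields the label of the macro node, and a macro edge is labelled by the pairs (a, b) of renumbered endpoints of the crossing micro edges. Sorting by node label is only a stand-in for a real canonization: with repeated labels it is not sound, which is exactly the graph isomorphism problem mentioned above.

```python
def encode_macro(micro_nodes, micro_edges):
    # micro_nodes: {node: label}; micro_edges: set of (node, node) inside the macro
    order = sorted(micro_nodes, key=lambda n: (micro_nodes[n], n))
    num = {n: i + 1 for i, n in enumerate(order)}      # renumber from 1
    label = (tuple(micro_nodes[n] for n in order),
             tuple(sorted((num[u], num[v]) for u, v in micro_edges)))
    return label, num

def encode_macro_edge(cross_edges, num_src, num_dst):
    # cross_edges: micro edges (u, v) with u in the source and v in the target macro
    return tuple(sorted((num_src[u], num_dst[v]) for u, v in cross_edges))

m1, num1 = encode_macro({"x": "A -> a.B", "y": "B -> .c"}, {("x", "y")})
m2, num2 = encode_macro({"p": "B -> c."}, set())
print(m1)                                           # canonical label of the macro
print(encode_macro_edge({("y", "p")}, num1, num2))  # ((2, 1),)
```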
2.2 LR(0) simulation by macro graphs

We now show in an intuitive manner that there is a special packing heuristic for producing macros which can simulate classical LR(0) parsing.

The basic idea is to use the shift-reduce tree generator of section 1.8 and to pack exactly those items into a macro that would be packed into an item set by the classical subset algorithm of LR(k). In this way, we get a true simulation of LR(0). (The reader may take an example from [ASU86, page 235], where the lookahead information has to be wiped out. Imagine that the circles show the (static) macro borders. Add (static) micro edges inside the circles, and between the circles along the depicted arcs, which can be interpreted as (static) macro arcs. The whole has to be instantiated at run time.)

Since the structure of the underlying shift-reduce micro graph is reconstructible from the precomputed macros, it follows that (nondeterministic) LR(0) parsing is nearly the same as a variant of Earley parsing. The main difference is that operations at run time are avoided by precomputing them into macros, such that instead of instantiating many micro nodes, we can instantiate a whole macro in one step. In the deterministic special case, when the relevant portion of a recognition is limited to a linear chain, the recognition power in terms of language classes is much bigger than with LL, since the step relation on the instantiated macro nodes is denser, because different paths or even cycles of the micro graph are packed into a macro. As one can also see, the macroization technique is much more powerful than the classical construction with sets of items, since a macro can contain more complicated structures denoting complex representations.
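To make the packing heuristic concrete, here is the classical LR(0) closure/goto construction for a toy grammar (our own sketch); it decides statically which items end up in the same macro:

```python
GRAMMAR = {"S": [("a", "S", "b"), ("a", "b")]}

def closure(items):
    items, changed = set(items), True
    while changed:
        changed = False
        for lhs, before, after in list(items):
            if after and after[0] in GRAMMAR:     # predict inside the item set
                for rhs in GRAMMAR[after[0]]:
                    if (after[0], (), rhs) not in items:
                        items.add((after[0], (), rhs))
                        changed = True
    return frozenset(items)

def goto(items, symbol):
    # shift the dot over `symbol`, then close again
    moved = {(l, b + (symbol,), a[1:]) for l, b, a in items if a and a[0] == symbol}
    return closure(moved)

start = closure({("S'", (), ("S",))})
for state in (start, goto(start, "a")):
    print({f'{l} -> {"".join(b)}.{"".join(a)}' for l, b, a in state})
```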
2.3 Conclusion: More powerful macros

Precomputing of macros can be done by computing a step relation on macros in an incremental way: apply all effects of the micro tree generator to a macro, thereby enlarging it by the newly generated chains. Then split the macro somewhere into as many pieces as desired (or don't split it at all). By applying all rules in combination with any possible input symbols, one can get a macroization of the tree generator which is itself again a tree generator, operating on larger chunks of graphs. By various heuristics, one can decide which pieces are to be packed into which macros.

As a general idea, splitting of macros may be delayed, yielding complex macros. For example, when applying a macroization to the TPLD, one can transform a regular sublanguage of the grammar into a tree generator which does no push and pop operations during the parsing of this regular sublanguage, by always keeping the boundedly growing stack (which must be achieved by elimination of ε-nodes from right recursion) in one macro, only changing the macro contents according to the input progress. It has turned out that nearly all known optimizations of LR techniques (e.g. elimination of unit productions, "stack controlling" LR) can be simulated by macros.

When deriving new algorithms via this macro technique by applying heuristics, we have a general problem: the number of "different" (in the sense of graph isomorphism) macros may become unbounded if care is not taken. The solution is to force a split whenever a macro exceeds a maximal size. It follows that there exist macroizations which are better than previously known algorithms, at least with respect to the number of operations needed at run time. However, this may come at the expense of the space needed, since the number of macros may grow exponentially.

On the other hand, macroization may also be used to derive algorithms which lie somewhere within the bandwidth between LL and LR, with respect to macro size, "deterministicness", runtime behaviour, or other aspects. Other macroizations may extend far beyond the degree of determinism found in LR parsing, or may be incomparable to previously known algorithms.

Current research topics are to find good heuristics in the dilemma between space and time. Some of the questions which have to be solved are: Where and when should nodes be split for best ("optimal") results? Does there exist a globally optimal macroization of arbitrary grammars for arbitrary input words (the author believes "no")? What are the deterministic grammar classes for the deterministic special case of various macroizations?

The overall lesson from this work is that any existing context-free parsing technique known to the author can be simulated by the sketched theory (plus some minor enhancements not presented here), mostly relying on only two main principles: the compression theorem as a theoretical foundation of dynamic programming, and the macroization as a model for precomputing.

3 Further remarks

3.1 Example: deterministic parsing of some ambiguous grammars

This example sheds some light on the general usefulness of the macro construction: Let G be an LR(0) grammar. Construct from it a new grammar G′ having the same productions, plus new ones: For each nonterminal A, add a new nonterminal Ā and productions A → Ā and Ā → A. For each production A → X₁X₂…X_l, add a new production Ā → X̄₁X̄₂…X̄_l where all nonterminals are in the barred form. As is intuitively clear, this modified grammar is no longer an LR(0) grammar. Since it is cyclic, the number of syntax trees is unbounded. Although the generated languages are the same for both grammars, we get lots of reduce-reduce conflicts in G′: Any time an item A → X₁…X_l• is complete, there also exists an item Ā → X̄₁…X̄_l• in the same LR(0) state. It is therefore clear that deterministic parsing with classical LR(0) is impossible.
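The transformation from G to G′ is easy to state as code; the sketch below (our own, spelling the bar as a _bar suffix) constructs G′, whose item sets then exhibit the doubled complete items described above:

```python
def barred(grammar):
    # bar only the nonterminals; terminals stay unchanged
    bar = lambda x: x + "_bar" if x in grammar else x
    g2 = {}
    for lhs, rhss in grammar.items():
        g2[lhs] = list(rhss) + [(bar(lhs),)]                   # A -> A_bar
        g2[bar(lhs)] = [(lhs,)] + [tuple(bar(x) for x in rhs)  # A_bar -> A
                                   for rhs in rhss]            # plus barred copies
    return g2

print(barred({"S": [("a", "S", "b"), ("a", "b")]}))
# {'S': [('a', 'S', 'b'), ('a', 'b'), ('S_bar',)],
#  'S_bar': [('S',), ('a', 'S_bar', 'b'), ('a', 'b')]}
```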
However, our macro construction can do it deterministically: Just pack not only the LR sets of G, but also their corresponding barred forms into a macro, i.e. we simply keep the doubled items together with the A → Ā and Ā → A cycles in a macro.

This rather contrived example can be applied analogously in practice. It shows that some "local" ambiguity can be handled by the macro construction. As long as ambiguous branches of the recognition tree are always at nearly the same depth, they can be packed into a macro and parsed deterministically. Also, noncanonical "bottom-up" parsing, i.e. making reductions inside a parse stack at a limited depth, can be simulated by macros, just by keeping the limited part of the stack in one macro. This is further evidence for the generality of the macro method.

3.2 The problems in Tomita's algorithm

The above LR(0) simulation by macros is described in [STin] as a special heuristic which allows the transformation of a shift-reduce tree generator Γ into an LR(0) tree generator Γ′. When running Γ′ with the general algorithm described in [STin], all the problems of Tomita's algorithm [Tom86] (cyclic grammars, hidden left recursion, complexity worse than O(n³)) are solved. The interesting point is that the same algorithm that can simulate Earley parsing is also able to simulate nondeterministic LR (Tomita-like) parsing. One only has to feed a different tree generator into this algorithm.

3.3 Representation of parse trees

Another consequence of the macro construction concerns the underlying micro graph: When simulating, for example, nondeterministic LR, the underlying micro graph is (modulo some splits) the same as in the above nondeterministic shift-reduce parsing, which in turn is very similar to Earley parsing. Since the edges of the BMPL micro graph roughly correspond to Earley items (remember that an edge can be viewed as a direct replacement of the indirect Earley back pointer, which goes indirectly through the input position), there is no need to represent parse forests as described by Tomita. This is because it is already known (see e.g. [AU72, Algorithm 4.6, page 328], some general issues also in [Ruz79]) that reconstructions of parse trees from Earley's structure are possible. As can be seen from a variant of Earley's structure presented in [STin], O(n²) space suffices to represent all parse trees in such a way that the reconstruction of one single tree requires only a constantly bounded number of instructions per node of the tree, whereas the previously known algorithms may take more time.

3.4 Concluding remarks

A general theoretical framework for deriving new algorithms is presented in [STin]. For a number of questions, solutions are sketched there, while some others remain open.

References

[ASU86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.

[AU72] Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation and Compiling, volume 1. Prentice-Hall, 1972.

[Ear70] J. Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94-102, Feb 1970.

[HU79] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.

[Ruz79] W. L. Ruzzo. On the complexity of general context-free language parsing and recognition. In Automata, Languages and Programming, Lecture Notes in Computer Science 71, pages 489-497, 1979.

[STrt] Thomas Schöbel-Theuer. Ein Ansatz für eine allgemeine Theorie kontextfreier Spracherkennung. Dissertation (in German), submitted 19 April 1996 to the Fakultät Informatik, Universität Stuttgart.

[STin] Thomas Schöbel-Theuer. Eine allgemeine Theorie kontextfreier Spracherkennung. Enlarged version of [STrt], unpublished manuscript in German; appearance is uncertain.

[Tom86] Masaru Tomita. Efficient Parsing for Natural Language. Kluwer Academic Publishers, Boston, 1986.