perlmeditation
demerphq
Recently I have been participating in the Parse::RecDescent mailing list. One of the subjects that came up was that of <p>
<em>What type of parser is Parse::RecDescent, LL or LR and whats the difference?</em><p>
In answering this question (twice, once poorly, once not so poorly ;-) I made the observation that since P::RD is a topdown parser it can only handle LL grammars, and that these grammars have the property that they cannot contain left recursion. (All of these points are made in the documentation.) <p>
I went on to explain what this meant and to provide a simple method by which this issue can be resolved. [merlyn] responded back by saying that he had had an interesting day or two trying to resolve a left recursion problem that he encountered while writing one of his [http://www.stonehenge.com/merlyn/LinuxMag/col29.html|columns]. Well it occured to me that if left recursion left (no pun intended) [merlyn] banging his head on a table for a while then maybe there were a few more of our community that would appreciate an overview of this matter. So here goes...<p>
<readmore>
(Caveat, this is extracted from Red Dragon.. Use the force read the source...)<br>
<strong>UPDATE</strong> See [abstracts] node below for a link to the Red Dragon aka the Dragon Book. An excellent resource.
<p>
<b>What is Left Recursion?</b><p>
<tt><sub><super>(No its not when the commies get re-elected.)</super></sub></tt><br>
Left recusion is when you have a rule that calls itself as its very first subrule. Ie:
<code>
expr: expr '+' term | term
</code>
When Parse::RecDescent encounters something like this it should error (its documented that it will, but I haven't checked) as it will go into an endless loop on the 'expr' rule. <p>
<Strong>UPDATE</strong><p>
Changed the original paragraph that was here as it originally asserted that [TheDamian] had not implemented multilevel left recursion checks. He has. What he has not done is provided automatic checking for more subtle cases of left-recursion. He lists this as a bug/irritation in the documentation. See that for a better explanation.<p>
<b>Eliminating Left Recursion</b><p>
Now there is a very simple technique for eliminating left recursion like this. Lets consider an abstract rule
<code>
A : A x | y
</code>
Now <b>A</b> is an arbitrary rule. <b>x</b> and <b>y</b> are other rules that DO NOT begin with <b>A</b>. This maps quite nicely onto our earlier example, <b>A</b> being "expr" <b>x</b> being "'+' term" and <b>y</b> being "term".<p>
We can convert this rule into a non left recursive equivelent by changing it to
<code>
A : y R
R : x R | e
</code>
<b>R</b> is a new rule that we will synthesize and <b>e</b> is the empty rule. (It always matches.)<p>
So lets apply this to the earlier example: <font size=1>(Updated as per Anonny and [Abigail-II]'s correction)</font>
<code>
expr : term expr_tail
expr_tail : '+' term expr_tail | {1}
</code>
I use the {1} here to emulate the behaviour of an e match. Perhaps theres a better way, if so let me know.<p>
Anyway, hopefully somebody out there saves themselves some table/head banging because they read this.<p>
Any comments and or corrections are welcome indeed.<p>
Oh, it turns out that there is an algorithmic way of eliminating _all_ left recursion, at whatever depth from an unambiguous grammer. If anyone is interested ill post it in pseudo code, but I do think it would be interesting to write a helper module for P::RD that would be able to this automatically. So if anyone is looking for an interesting project....
<p>
Yves / DeMerphq<br>
--- <BR>
Writing a good benchmark isnt as easy as it might look.