Eliminating Left Recursion in Parse::RecDescent

Recently I have been participating in the Parse::RecDescent mailing list. One of the subjects that came up was that of

What type of parser is Parse::RecDescent, LL or LR and whats the difference?

In answering this question (twice, once poorly, once not so poorly ;-) I made the observation that since P::RD is a topdown parser it can only handle LL grammars, and that these grammars have the property that they cannot contain left recursion. (All of these points are made in the documentation.)

I went on to explain what this meant and to provide a simple method by which this issue can be resolved. merlyn responded back by saying that he had had an interesting day or two trying to resolve a left recursion problem that he encountered while writing one of his columns. Well it occured to me that if left recursion left (no pun intended) merlyn banging his head on a table for a while then maybe there were a few more of our community that would appreciate an overview of this matter. So here goes...

(Caveat, this is extracted from Red Dragon.. Use the force read the source...)
UPDATE See abstracts node below for a link to the Red Dragon aka the Dragon Book. An excellent resource.

What is Left Recursion?

_{<super>(No its not when the commies get re-elected.)</super>}
Left recusion is when you have a rule that calls itself as its very first subrule. Ie:

expr: expr '+' term | term
[download]

When Parse::RecDescent encounters something like this it should error (its documented that it will, but I haven't checked) as it will go into an endless loop on the 'expr' rule.

UPDATE

Changed the original paragraph that was here as it originally asserted that TheDamian had not implemented multilevel left recursion checks. He has. What he has not done is provided automatic checking for more subtle cases of left-recursion. He lists this as a bug/irritation in the documentation. See that for a better explanation.

Eliminating Left Recursion

Now there is a very simple technique for eliminating left recursion like this. Lets consider an abstract rule

A : A x | y
[download]

Now A is an arbitrary rule. x and y are other rules that DO NOT begin with A. This maps quite nicely onto our earlier example, A being "expr" x being "'+' term" and y being "term".

We can convert this rule into a non left recursive equivelent by changing it to

A : y R
R : x R | e
[download]

R is a new rule that we will synthesize and e is the empty rule. (It always matches.)

So lets apply this to the earlier example: (Updated as per Anonny and Abigail-II's correction)

expr      :     term expr_tail
expr_tail : '+' term expr_tail | {1}
[download]

I use the {1} here to emulate the behaviour of an e match. Perhaps theres a better way, if so let me know.

Anyway, hopefully somebody out there saves themselves some table/head banging because they read this.

Any comments and or corrections are welcome indeed.

Oh, it turns out that there is an algorithmic way of eliminating _all_ left recursion, at whatever depth from an unambiguous grammer. If anyone is interested ill post it in pseudo code, but I do think it would be interesting to write a helper module for P::RD that would be able to this automatically. So if anyone is looking for an interesting project....

Yves / DeMerphq
---
Writing a good benchmark isnt as easy as it might look.

Back to Meditations