Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent

Synopsis

I have found documentation on eliminating left-recursion (such as Eliminating Left Recursion in Parse::RecDescent) to be unsatisfactory. Left-recursion is usually eliminated at the expense of associativity. This tutorial shows how to eliminate it while preserving associativity.

The document provides two implementations for every topic covered. The first shows how the topic applies when evaluating the text at parse time. The second shows how the topic applies when building a parse tree. It is probably best to ignore the latter (parse tree creation) until the former (parse-time eval) is understood.

Feedback and criticisms are welcome.

1. What is Operator Associativity?
2. Parsers and Associativity
- 2.a. Parse-time Evaluation with Associativity
- 2.b. Building a Parse Tree with Associativity
3. Eliminating Left-Recursion
4. Improving Right-Recursion
5. Working Code
- 5.a. ...to Evaluate the Text at Parse-time
- 5.b. ...to Build and Evaluate a Parse Tree

1. What is Operator Associativity?

The Perl binary operators + and - have the same precedence, but that doesn't mean they can be evaluated in any order. For example, consider 4 - 5 + 6.

If executed from left-to-right,
4 - 5 + 6 = (4 - 5) + 6 = 5

If executed from right-to-left,
4 - 5 + 6 = 4 - (5 + 6) = -7

Similarly,

If executed from left-to-right,
4 ** 3 ** 2 = (4 ** 3) ** 2 = 4096

If executed from right-to-left,
4 ** 3 ** 2 = 4 ** (3 ** 2) = 262144

Operators which are evaluated from left-to-right are left-associative.

Operators which are evaluated from right-to-left are right-associative.

In Perl, binary operators + and - are left-associative, and binary operator ** is right-associative. (Refer to Operator Precedence and Associativity in perlop for the associativity of other operators.)
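
Perl itself can confirm these rules. The following is a quick check, not part of any parser: it compares each unparenthesized expression with both explicit groupings. The variable names are made up for this illustration.

```perl
use strict;
use warnings;

# Left-associative: the unparenthesized form matches the left grouping.
my $sub_default = 4 - 5 + 6;
my $sub_left    = (4 - 5) + 6;
my $sub_right   = 4 - (5 + 6);

# Right-associative: the unparenthesized form matches the right grouping.
my $pow_default = 4 ** 3 ** 2;
my $pow_left    = (4 ** 3) ** 2;
my $pow_right   = 4 ** (3 ** 2);

print "4 - 5 + 6   = $sub_default (left: $sub_left, right: $sub_right)\n";
print "4 ** 3 ** 2 = $pow_default (left: $pow_left, right: $pow_right)\n";
```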

2. Parsers and Associativity

Grammars do not specify associativity. A grammar simply defines whether a given string is valid in the language represented by the grammar, and associativity is not needed for that purpose.

However, we're rarely interested in just checking validity. Parsers that return a parse tree representing the text being parsed, and those that evaluate the text being parsed, are much more useful. Because Parse::RecDescent processes rules from left to right, grammars can be written in a form that lends itself well to these tasks.

Left-associative:

sum : sum /[+-]/ NUM
    | NUM

Right-associative:

pow : NUM '**' pow
    | NUM

The following subsections will enrich these grammars with code to build a parse tree and to evaluate the expression at parse-time. As you will see, no changes will be needed to the grammar.

2.a. Parse-time Evaluation with Associativity

Left-associative:

sum : sum '+' NUM { $item[1] + $item[3] }
    | sum '-' NUM { $item[1] - $item[3] }
    | NUM         { $item[1]            }

Right-associative:

pow : NUM '**' pow { $item[1] ** $item[3] }
    | NUM          { $item[1]             }

2.b. Building a Parse Tree with Associativity

Left-associative:

sum : sum /[+-]/ NUM { [ @item[2,1,3] ] }
    | NUM            { [ $item[1]     ] }

Right-associative:

pow : NUM '**' pow { [ @item[2,1,3] ] }
    | NUM          { [ $item[1]     ] }
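
To make the shape of these trees concrete, here is a small sketch in plain Perl. The two trees are written out by hand, following what the actions above would return for 4 - 5 + 6 and 4 ** 3 ** 2; no parser is involved. The helper eval_tree and the %apply table are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;

# Trees the 2.b actions would build (written out by hand here).
# Interior nodes are [ operator, left, right ]; the NUM-only production
# wraps its number as [ NUM ].
my $sum_tree = [ '+',  [ '-', [4], 5 ], 6 ];      # 4 - 5 + 6
my $pow_tree = [ '**', 4, [ '**', 3, [2] ] ];     # 4 ** 3 ** 2

my %apply = (
    '+'  => sub { $_[0] +  $_[1] },
    '-'  => sub { $_[0] -  $_[1] },
    '**' => sub { $_[0] ** $_[1] },
);

# Hypothetical helper: evaluate a tree bottom-up.
sub eval_tree {
    my ($node) = @_;
    return $node unless ref $node;                  # bare number
    return eval_tree($node->[0]) if @$node == 1;    # [ NUM ] leaf
    my ($op, $left, $right) = @$node;
    return $apply{$op}->( eval_tree($left), eval_tree($right) );
}

my $sum_value = eval_tree($sum_tree);
my $pow_value = eval_tree($pow_tree);
print "$sum_value\n";   # 5
print "$pow_value\n";   # 262144
```

The associativity lives entirely in the nesting: the left-associative tree is deeper on the left, the right-associative tree is deeper on the right.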

3. Eliminating Left-Recursion

There is a catch. The theory is solid, but parsers have limitations.

Productions of the form a : a b are called left-recursive. An entire class of parser generators cannot process left-recursive grammars, and Parse::RecDescent belongs to that class. Unfortunately, the left-associative rules presented so far are left-recursive. The remainder of this section will show methods of removing left-recursion from grammars for Parse::RecDescent.
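
To see why left-recursion is a problem for this kind of parser, consider a naive hand-written model of the left-recursive sum rule. This is only a sketch of the general recursive-descent idea, not Parse::RecDescent's actual code; parse_sum and its $depth guard are inventions for this demonstration.

```perl
use strict;
use warnings;

# Hand-written model of:   sum : sum /[+-]/ NUM
#                              | NUM
# Each rule becomes a function taking the input text and a position.
# $depth exists only so this demo can stop itself.
sub parse_sum {
    my ($text, $pos, $depth) = @_;
    die "gave up at depth $depth, still at position $pos\n" if $depth >= 50;

    # The first production begins with the subrule "sum", so the function
    # calls itself before consuming any input. $pos never advances.
    my $left = parse_sum($text, $pos, $depth + 1);

    # Never reached: matching /[+-]/ and NUM would go here.
    return $left;
}

my $ok    = eval { parse_sum('4-5+6', 0, 0); 1 };
my $error = $@;
print "parse failed: $error" unless $ok;
```

Without the artificial depth limit, the function would keep calling itself at position 0 until Perl ran out of memory.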

3.a. Method 1: Create a Flat List, and Reconstruct

It's easy to parse 4 - 5 + 6 into the list '4', '-', '5', '+', '6'. The following snippet does so:

sum  : NUM sum_        { [ $item[1], @{$item[2]} ] }
sum_ : /[+-]/ NUM sum_ { [ $item[1], $item[2], @{$item[3]} ] }
     |                 { [] }

If we are evaluating at parse-time, we have little choice but to process the sum as a list rather than a binary operator. When building a parse tree, we have two options. We could leave it as is, or we could convert the list into a tree.

The following subsections show how to evaluate the list and how to treeify it.

3.a.i. ...to Evaluate the Text at Parse-time

{
   sub eval_sum {
      my $acc = shift(@_);
      while (@_) {
         my $op = shift(@_);
         if    ($op eq '+') { $acc += shift(@_); }
         elsif ($op eq '-') { $acc -= shift(@_); }
      }
      return $acc;
   }
}

sum  : NUM sum_         { eval_sum($item[1], @{$item[2]}) }
sum_ : /[+-]/ NUM sum_  { [ $item[1], $item[2], @{$item[3]} ] }
     |                  { [] }

3.a.ii. ...to Build a Parse Tree

{
   sub treeify {
      my $t = shift(@_);
      $t = [ shift(@_), $t, shift(@_) ]
         while @_;
      return $t;
   }
}

sum  : NUM sum_         { treeify($item[1], @{$item[2]}) }
sum_ : /[+-]/ NUM sum_  { [ $item[1], $item[2], @{$item[3]} ] }
     |                  { [] }
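
Both helpers can be tried outside of the parser by handing them the flat list directly. The sketch below copies eval_sum and treeify from this section and adds show, a hypothetical helper that renders a tree as a fully parenthesized string so the grouping is visible.

```perl
use strict;
use warnings;

# eval_sum and treeify as in this section.
sub eval_sum {
    my $acc = shift(@_);
    while (@_) {
        my $op = shift(@_);
        if    ($op eq '+') { $acc += shift(@_); }
        elsif ($op eq '-') { $acc -= shift(@_); }
    }
    return $acc;
}

sub treeify {
    my $t = shift(@_);
    $t = [ shift(@_), $t, shift(@_) ]
        while @_;
    return $t;
}

# Hypothetical helper: render a tree as a fully parenthesized string.
sub show {
    my ($node) = @_;
    return $node unless ref $node;
    my ($op, $left, $right) = @$node;
    return '(' . show($left) . " $op " . show($right) . ')';
}

my @flat = (4, '-', 5, '+', 6);    # the list the sum rule hands to the helper

my $value     = eval_sum(@flat);
my $tree_text = show(treeify(@flat));
print "$value\n";        # 5
print "$tree_text\n";    # ((4 - 5) + 6)
```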

3.b. Method 2: Create a Flat List Using `<leftop>`, and Reconstruct

This method is the same as Method 1, but takes advantage of a Parse::RecDescent feature to improve readability. Parse::RecDescent has a pair of directives to help build lists. <leftop> is designed to build left-associative lists, and <rightop> is designed to build right-associative lists.

3.b.i. ...to Evaluate the Text at Parse-time

{
   sub eval_sum {
      my $acc = shift(@_);
      while (@_) {
         my $op = shift(@_);
         if    ($op eq '+') { $acc += shift(@_); }
         elsif ($op eq '-') { $acc -= shift(@_); }
      }
      return $acc;
   }
}

sum : <leftop: NUM /[+-]/ NUM> { eval_sum(@{$item[1]}) }

3.b.ii. ...to Build a Parse Tree

{
   sub treeify {
      my $t = shift(@_);
      $t = [ shift(@_), $t, shift(@_) ]
         while @_;
      return $t;
   }
}

sum : <leftop: NUM /[+-]/ NUM> { treeify(@{$item[1]}) }

3.c. Method 3: Using a Subrule Argument

Normally, information passes from subrule to superrule. For example, in the following code, rule2 receives the result of rule3. In turn, rule1 receives the result of rule2.

rule1: token rule2
rule2: token rule3
rule3: token

The deeper a rule is, the sooner its action is executed. In a list, that means the last (right-most) element encountered is processed first. Left-associative lists need the opposite: information must flow from the superrule to the subrule. Fortunately, Parse::RecDescent provides a means of passing information to subrules: subrule argument lists.

Think of each rule as a function, and of each reference to that rule as a function call. (In fact, this is how the compiled grammars are implemented.) Just as functions can take arguments, so can subrules.

3.c.i. ...to Evaluate the Text at Parse-time

sum  : NUM sum_[ $item[1] ]
sum_ : '+' NUM sum_[ $arg[0] + $item[2] ]
     | '-' NUM sum_[ $arg[0] - $item[2] ]
     | { $arg[0] }
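
Taking the "rules are functions" analogy literally, the grammar above can be mimicked by hand in plain Perl. This is only a sketch of the idea, not what Parse::RecDescent generates: the functions num, sum and sum_, and the convention of consuming text through a scalar reference, are assumptions made for this example.

```perl
use strict;
use warnings;

# Hand-written equivalent of:
#   sum  : NUM sum_[ $item[1] ]
#   sum_ : '+' NUM sum_[ $arg[0] + $item[2] ]
#        | '-' NUM sum_[ $arg[0] - $item[2] ]
#        | { $arg[0] }
# $text_ref points at the remaining input; a match consumes from its front.

sub num {
    my ($text_ref) = @_;
    return $$text_ref =~ s/^\s*(\d+)// ? $1 : undef;
}

sub sum {
    my ($text_ref) = @_;
    my $first = num($text_ref);
    return undef unless defined $first;
    return sum_($text_ref, $first);          # pass the left operand down
}

sub sum_ {
    my ($text_ref, $acc) = @_;               # $acc plays the role of $arg[0]
    my $saved = $$text_ref;
    if ($$text_ref =~ s/^\s*([+-])//) {
        my $op  = $1;
        my $rhs = num($text_ref);
        if (defined $rhs) {
            # Combine with the left operand first, then continue rightwards.
            return sum_($text_ref, $op eq '+' ? $acc + $rhs : $acc - $rhs);
        }
        $$text_ref = $saved;                 # backtrack
    }
    return $acc;                             # empty production
}

my $input  = '4 - 5 + 6';
my $result = sum(\$input);
print "$result\n";   # 5
```

Each call to sum_ receives everything to its left already combined, which is exactly what left-associativity requires.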

3.c.ii. ...to Build a Parse Tree

sum  : NUM sum_[ $item[1] ]
sum_ : '+' NUM sum_[ [ $item[1], $arg[0], $item[2] ] ]
     | '-' NUM sum_[ [ $item[1], $arg[0], $item[2] ] ]
     | { $arg[0] }

4. Improving Right-Recursion

Earlier, we ended up with the following rules for right-recursive binary operators:

pow : NUM '**' pow
    | NUM

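Unlike left-recursion, right-recursion poses no problem for Parse::RecDescent. However, Parse::RecDescent handles rules whose productions share a common prefix very inefficiently: when the first production fails after matching NUM, the second production has to match that same NUM again.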

Just like in algebra, we can factor out the common prefix into another rule.

pow   : NUM pow_
pow_  : '**' pow
      |
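
The cost of a shared prefix is easiest to see once operands can nest. The sketch below is a hand-written backtracking model, not Parse::RecDescent itself, and it assumes an extra rule, atom : NUM | '(' pow ')', that does not appear in the grammars above. All function names are hypothetical. It counts how often the NUM matcher is tried for each form of the grammar.

```perl
use strict;
use warnings;

my $num_calls = 0;    # how many times the NUM matcher is tried

# Every matcher takes (text, position) and returns the position after
# the match, or undef on failure.
sub match_num {
    my ($text, $pos) = @_;
    $num_calls++;
    return substr($text, $pos) =~ /^(\d+)/ ? $pos + length($1) : undef;
}

sub match_lit {
    my ($text, $pos, $lit) = @_;
    return substr($text, $pos, length($lit)) eq $lit ? $pos + length($lit) : undef;
}

# atom : NUM | '(' pow ')'     ($pow is whichever version of pow is under test)
sub atom {
    my ($pow, $text, $pos) = @_;
    my $end = match_num($text, $pos);
    return $end if defined $end;
    $end = match_lit($text, $pos, '(');
    $end = $pow->($text, $end)          if defined $end;
    $end = match_lit($text, $end, ')')  if defined $end;
    return $end;
}

# Unfactored:  pow : atom '**' pow | atom
sub pow_unfactored {
    my ($text, $pos) = @_;
    my $end = atom(\&pow_unfactored, $text, $pos);
    $end = match_lit($text, $end, '**')  if defined $end;
    $end = pow_unfactored($text, $end)   if defined $end;
    return $end if defined $end;
    return atom(\&pow_unfactored, $text, $pos);   # second production starts over
}

# Factored:  pow : atom pow_      pow_ : '**' pow |
sub pow_factored {
    my ($text, $pos) = @_;
    my $end = atom(\&pow_factored, $text, $pos);
    return undef unless defined $end;
    my $rest = match_lit($text, $end, '**');
    $rest = pow_factored($text, $rest) if defined $rest;
    return defined $rest ? $rest : $end;
}

my $input = '((((4))))';
my %calls;
for my $version ([ unfactored => \&pow_unfactored ], [ factored => \&pow_factored ]) {
    my ($name, $parser) = @$version;
    $num_calls = 0;
    my $end = $parser->($input, 0);
    die "parse failed\n" unless defined $end && $end == length $input;
    $calls{$name} = $num_calls;
    print "$name: NUM tried $num_calls times\n";
}
```

In this model the unfactored grammar tries NUM 62 times and the factored one 5 times. Every extra level of parentheses roughly doubles the work for the unfactored form, because each level parses its operand twice.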

The complicated part is how to evaluate the expression or build the parse tree when one of the operands is matched by one rule, and the other is matched by a different rule. It turns out that doing this is very similar to eliminating left-recursion.

4.a. Method 1: Create a Flat List, and Reconstruct

Just like when eliminating left-recursion, we can build a flat list of the whole chain of powers, and work with that. The difference is that the list will be processed from right to left.

4.a.i. ...to Evaluate the Text at Parse-time

{
   sub eval_pow {
      my $acc = pop(@_);
      while (@_) {
         my $op = pop(@_);
         $acc = pop(@_) ** $acc;
      }
      return $acc;
   }
}

pow  : NUM pow_      { eval_pow($item[1], @{$item[2]}) }
pow_ : '**' NUM pow_ { [ $item[1], $item[2], @{$item[3]} ] }
     |               { [] }
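
The direction of the fold is what matters here. As a quick check outside of the parser, the sketch below runs eval_pow on the flat list and, for contrast only, a deliberately wrong variant that folds from the left. The name eval_pow_wrong is made up for this comparison.

```perl
use strict;
use warnings;

# eval_pow as in this section: folds from the right-hand end of the list.
sub eval_pow {
    my $acc = pop(@_);
    while (@_) {
        my $op = pop(@_);          # always '**' here
        $acc = pop(@_) ** $acc;
    }
    return $acc;
}

# For contrast only: the same list folded from the left.
sub eval_pow_wrong {
    my $acc = shift(@_);
    while (@_) {
        my $op = shift(@_);
        $acc = $acc ** shift(@_);
    }
    return $acc;
}

my @flat = (4, '**', 3, '**', 2);

my $right = eval_pow(@flat);
my $left  = eval_pow_wrong(@flat);
print "right-to-left: $right\n";   # 262144
print "left-to-right: $left\n";    # 4096
```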

4.a.ii. ...to Build a Parse Tree

{
   sub treeify_r {
      my $t = pop;
      $t = [ pop, pop, $t ]
         while @_;
      return $t;
   }
}

pow  : NUM pow_      { treeify_r($item[1], @{$item[2]}) }
pow_ : '**' NUM pow_ { [ $item[1], $item[2], @{$item[3]} ] }
     |               { [] }
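
As with the left-associative case, the helper can be tried on the flat list directly. The sketch below uses treeify_r from this section (with @_ spelled out in each pop) plus show, a hypothetical helper that renders a tree as a fully parenthesized string.

```perl
use strict;
use warnings;

# treeify_r as in this section: builds the tree from the right-hand end.
sub treeify_r {
    my $t = pop(@_);
    $t = [ pop(@_), pop(@_), $t ]    # [ operator, left operand, tree so far ]
        while @_;
    return $t;
}

# Hypothetical helper: render a tree as a fully parenthesized string.
sub show {
    my ($node) = @_;
    return $node unless ref $node;
    my ($op, $left, $right) = @$node;
    return '(' . show($left) . " $op " . show($right) . ')';
}

my @flat = (4, '**', 3, '**', 2);

my $tree_text = show(treeify_r(@flat));
print "$tree_text\n";    # (4 ** (3 ** 2))
```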

4.b. Method 2: Create a Flat List Using `<rightop>`, and Reconstruct

Just like Parse::RecDescent has a directive for creating a flat list for a left-associative operator (<leftop>), it has one to create a flat list for a right-associative operator (<rightop>).

4.b.i. ...to Evaluate the Text at Parse-time

{
   sub eval_pow {
      my $acc = pop(@_);
      while (@_) {
         my $op = pop(@_);
         $acc = pop(@_) ** $acc;
      }
      return $acc;
   }
}

pow : <rightop: NUM /(\*\*)/ NUM> { eval_pow(@{$item[1]}) }

4.b.ii. ...to Build a Parse Tree

{
   sub treeify_r {
      my $t = pop;
      $t = [ pop, pop, $t ]
         while @_;
      return $t;
   }
}

pow : <rightop: NUM /(\*\*)/ NUM> { treeify_r(@{$item[1]}) }

4.c. Method 3: Using a Subrule Argument

Let's look at the algebra again. We can change

pow : NUM '**' pow { $item[1] ** $item[3] }
    | NUM          { $item[1] }

into

pow   : NUM pow_
pow_  : '**' pow { <<pow's $item[1]>> ** $item[2] }
      |          { <<pow's $item[1]>> }

The problem is that we have to pass $item[1] from pow to pow_. We've already seen that we can pass data from one rule to another using subrule arguments. When eliminating left-recursion, we used the subrule argument to form a stack. When improving right-recursion, we simply pass from the main rule to the helper rule.

4.c.i. ...to Evaluate the Text at Parse-time

pow  : NUM pow_[ $item[1] ]
pow_ : '**' pow { $arg[0] ** $item[2] }
     |          { $arg[0]             }
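
Written by hand as plain Perl functions, the same hand-off looks like the sketch below. It is an illustration of the idea rather than Parse::RecDescent's generated code; num, pow_rule and pow_ are hypothetical names, and the input is consumed through a scalar reference.

```perl
use strict;
use warnings;

# Hand-written equivalent of:
#   pow  : NUM pow_[ $item[1] ]
#   pow_ : '**' pow { $arg[0] ** $item[2] }
#        |          { $arg[0]             }

sub num {
    my ($text_ref) = @_;
    return $$text_ref =~ s/^\s*(\d+)// ? $1 : undef;
}

sub pow_rule {
    my ($text_ref) = @_;
    my $base = num($text_ref);
    return undef unless defined $base;
    return pow_($text_ref, $base);           # hand the left operand to the helper
}

sub pow_ {
    my ($text_ref, $base) = @_;              # $base plays the role of $arg[0]
    my $saved = $$text_ref;
    if ($$text_ref =~ s/^\s*\*\*//) {
        # Evaluate everything to the right first, then apply this operator.
        my $exponent = pow_rule($text_ref);
        return $base ** $exponent if defined $exponent;
        $$text_ref = $saved;                 # backtrack
    }
    return $base;                            # empty production
}

my $input  = '4 ** 3 ** 2';
my $result = pow_rule(\$input);
print "$result\n";   # 262144
```

Note the contrast with the left-associative version: there the combined value was passed down as the argument, while here only the left operand is passed down and the combining happens on the way back up.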

4.c.ii. ...to Build a Parse Tree

pow  : NUM pow_[ $item[1] ]
pow_ : '**' pow { [ $item[1], $arg[0], $item[2] ] }
     |          { $arg[0]                         }

5. Working Code

The following subsections contain complete, working code to parse expressions formed of the +, - and ** binary operators using the Subrule Argument methods. Parentheses are also supported to produce more meaningful results.

In order to support parentheses and to give the operators their proper precedence, the rules used in the upcoming code are slightly different from those seen earlier. Where NUM used to be in the productions, you will now find term (in sum/sum_) and sum (in pow/pow_).

Both programs produce the same output, shown below with headings and column alignment added for readability:

Demonstrates left-associativity
4-5+6   =  5  got  5
(4-5)+6 =  5  got  5
4-(5+6) = -7  got -7

Demonstrates right-associativity
4**3**2   = 262144  got 262144
(4**3)**2 =   4096  got   4096
4**(3**2) = 262144  got 262144

5.a. ...to Evaluate the Text at Parse-time

use strict;
use warnings;

use Parse::RecDescent ();

my $grammar = <<'__END_OF_GRAMMAR__';

   {
      use strict;
      use warnings;
   }

   parse : expr /^\Z/ { $item[1] }

   # Just an alias
   expr  : pow

   # vvv lowest precedence

   # pow : sum '**' pow
   #     | sum

   pow   : sum pow_[ $item[1] ]
   pow_  : '**' pow { $arg[0] ** $item[2] }
         |          { $arg[0]             }

   # sum : sum /[+-]/ term
   #     | term

   sum   : term sum_[ $item[1] ]
   sum_  : '+' term sum_[ $arg[0] + $item[2] ]
         | '-' term sum_[ $arg[0] - $item[2] ]
         | { $arg[0] }

   # ^^^ highest precedence

   term  : '(' expr ')' { $item[2] }
         | /\d+/

__END_OF_GRAMMAR__

my $parser = Parse::RecDescent->new($grammar)
   or die("Bad grammar\n");

foreach my $expr (
   '4-5+6',      # Demonstrates left-associativity
   '(4-5)+6',
   '4-(5+6)',

   '4**3**2',    # Demonstrates right-associativity
   '(4**3)**2',
   '4**(3**2)',
) {
   my $expected = eval $expr;
   my $got = $parser->parse($expr);
   print("$expr = $expected  got $got\n");
}

5.b. ...to Build and Evaluate a Parse Tree

use strict;
use warnings;

use Parse::RecDescent ();

my $grammar = <<'__END_OF_GRAMMAR__';

   {
      use strict;
      use warnings;
   }

   parse : expr /^\Z/ { $item[1] }

   # Just an alias
   expr  : pow

   # vvv lowest precedence

   # pow : sum '**' pow
   #     | sum

   pow   : sum pow_[ $item[1] ]
   pow_  : '**' pow { [ $item[1], $arg[0], $item[2] ] }
         |          { $arg[0] }

   # sum : sum /[+-]/ term
   #     | term

   sum   : term sum_[ $item[1] ]
   sum_  : /[+-]/ term sum_[ [ $item[1], $arg[0], $item[2] ] ]
         | { $arg[0] }

   # ^^^ highest precedence

   term  : '(' expr ')' { $item[2] }
         | /\d+/ { [ @item ] }

__END_OF_GRAMMAR__

my $parser = Parse::RecDescent->new($grammar)
   or die("Bad grammar\n");

my %eval = (
   term => sub { $_[1] },
   '+'  => sub { eval_node($_[1]) +  eval_node($_[2]) },
   '-'  => sub { eval_node($_[1]) -  eval_node($_[2]) },
   '**' => sub { eval_node($_[1]) ** eval_node($_[2]) },
);

sub eval_node {
   my ($node) = @_;
   $eval{$node->[0]}->(@$node);
}

foreach my $expr (
   '4-5+6',      # Demonstrates left-associativity
   '(4-5)+6',
   '4-(5+6)',

   '4**3**2',    # Demonstrates right-associativity
   '(4**3)**2',
   '4**(3**2)',
) {
   my $expected = eval $expr;
   my $tree = $parser->parse($expr);
   my $got = eval_node($tree);
   print("$expr = $expected  got $got\n");
}

Update Aug 13, 2006: The examples have been simplified. A right-associative operator is used for the right-associative examples. Parse-time eval was placed before parse tree building. Added section on simplifying right-recursion. Small additions were made here and there to improve clarity. It still needs to link to a tutorial on precedence.

Update Jun 13, 2014: Fixed spelling and grammar mistakes identified by hexcoder.

Update Oct 3, 2016: Fixed indexing problem raised by an anonymous monk.
