for over a year. It was intended to be a companion to
. Since I doubt I will finish it before another year has past, I am publishing it now. I have made reference to it a number of times over the last year and someone may benefit from it despite the incompleteness. As a result, there may be some grammatical inconsistencies (mostly tense).
I took on a new project of evaluating mathematical expressions where I broke at least 2 rules.
 1. Don't reinvent the wheel
 2. If you must, use the right tool for the job
My goal is to show what I learned by not following the rules. I fully expect to find that the rules are there for a reason. That is ok since a lesson learned by being bitten is much more likely to stick then one from admonishment.
Originally, I planned just 3 variations on the project. I was going to roll my own, create a new one using Parse::RecDescent, and just test the existing wheels. I am deviating from this plan because I have learned there is a lot more options than I had anticipated and that some are better than others. This list now includes:
 Hand rolled
 Parse::RecDescent
 Manual stratification and leftrecursion elimination
 Using the leftop/rightop shortcuts
 Using rightrecursion and reversing the parsetree
 Existing wheels
 Explore alternatives to Parse::RecDescent
 Explore the Red Dragon Book for new ideas.
While I expect the lessons I learn to mostly be negative ones that reinforce the case for following the rules, I won't know for sure until I have tried. I initially thought I could guess what I would encounter based off my experience and knowledge as well as what others had warned me about. I was dead wrong.
My first attempt was handrolling. Evaluating a mathematical expression only requires a handful of steps:
 Breaking the expression up into tokens or atoms (known as lexing)
 Process tokens according to hierachial rules (known as precedence)
 When tokens have the same precedence, process them according to sequencing rules (known as associativity)
This is just a fancy way of saying that when you evaluate a mathematical expression you do parens before exponents which you do before multiplication and division which comes before addition and subtraction and you do all this from left to right.
While the following parser is not finished or polished, it accomplishes the task I set out to do. It takes a multpass approach to overcome the difficulty of precedence and associativity.
package Math::Expression::Evaluator;
@ISA = qw(Exporter);
@EXPORT = qw(evaluate);
$VERSION = '0.01';
use strict;
use warnings;
use Carp;
my %op_eval = (
'^' => sub {$_[0] ** $_[1]},
'+' => sub {$_[0] + $_[1]},
'' => sub {$_[0]  $_[1]},
'*' => sub {$_[0] * $_[1]},
'/' => sub {$_[0] / $_[1]},
);
my %func_eval = (
abs => sub { abs $_[0] },
int => sub { int $_[0] },
sqrt => sub { sqrt $_[0] },
);
my ($func_re) = map qr{$_}, join '', keys %func_eval;
my $oper_re = qr{[()/*^+]};
my $numb_re = qr{[+]?(?:\d+(?:\.\d*)?\.\d+)};
my $parser = qr{($func_re$numb_re$oper_re)};
sub evaluate {
my @stack = @_ == 1 ? parse(@_) : @_;
return $_[0] if @stack == 1;
0 while fix_op(\@stack);
0 while reduce_func(\@stack);
0 while reduce_paren(\@stack);
for my $op (qw[^ * / + ]) {
0 while reduce_op($op, \@stack);
}
croak "Unable to reduce to a number: '@stack'" if @stack != 1;
return evaluate($stack[0]);
}
sub parse {
my $expr = shift @_;
my @part = $expr =~ /$parser/g;
parse_error_check($expr, \@part);
return @part;
}
sub parse_error_check {
my ($expr, $part) = @_;
$expr =~ s/$parser//g;
croak "Unparseable parts: '$expr'" if $expr !~ /^\s*$/;
croak "Not a number: '$part>[0]'" if @$part == 1 && ! is_num($par
+t>[0]);
}
sub is_num { return $_[0] =~ /$numb_re/; }
sub fix_op {
my $stack = shift @_;
for (1 .. $#$stack) {
my $atom = $stack>[$_];
next if ! is_num($atom);
if ($atom =~ s/^([+])//) {
my $op = $1;
next if $stack>[$_  1] =~ m{[(*/+^]};
splice(@$stack, $_, 1, $op, $atom);
return 1;
}
}
return;
}
sub reduce_func {
my $stack = shift @_;
for (0 .. $#$stack) {
my $atom = $stack>[$_];
next if ! is_func($atom);
croak "Function $atom require parens" if $stack>[$_ + 1] ne '
+(';
reduce_paren($stack, $_ + 1);
splice(@$stack, $_, 2, calculate($atom, $stack>[$_ + 1]));
return 1;
}
}
sub is_func { return exists $func_eval{$_[0]}; }
sub calculate {
my ($key, $x, $y, $val) = @_;
eval { $val = is_func($key)
? $func_eval{$key}>($x)
: $op_eval{$key}>($x, $y)
};
croak "Error: $@" if $@;
return $val;
}
sub reduce_paren {
my ($stack, $start) = @_;
$start = 0;
my ($beg, $open);
for ($start .. $#$stack) {
my $atom = $stack>[$_];
next if $atom ne '(' && $atom ne ')';
$open += $atom eq ')' ? 1 : 1;
$beg = $_ if ! defined $beg && $atom eq '(';
next if $open;
my $len = $_  $beg + 1;
splice(@$stack, $beg, $len, evaluate(@{$stack}[$beg + 1 .. $_
+ 1]));
return 1;
}
croak "Unbalanced Parens" if $open;
}
sub reduce_op {
my ($op, $stack) = @_;
return if @$stack < 3;
for (0 .. $#$stack  2) {
my ($prev, $curr, $next) = @{$stack}[$_ .. $_ + 2];
next if $curr ne $op;
croak "Error: '$prev $op $next'" if ! is_num($prev)  ! is_nu
+m($next);
splice(@$stack, $_, 3, calculate($op, $prev, $next));
return 1;
}
return;
}
'This statement is false';
Next I decided to tackle Parse::RecDescent. Admittedly, I had no experience with it before starting this project. In fact, I purposely waited until after I had rolled my own to try it to ensure it would not influence my solution in any way.
Attempt 1:
My first naive attempt failed because I assumed that by placing one production before another in alternation, that the entire string would be processed finding that production before attempting the next production. This is not how Parse::RecDescent works. It does not move further on in the string until it has tried everything it knows about first. This resulted in precedence not being followed.
Attempt 2:
I discovered Re: Leftassociative binary operators in Parse::RecDescent by blokhead which said that one way to solve this problem was to manually introduce stratification. In a nutshell, you start out by looking for the lowest precedence which includes the next highest precedence as part of its production on up to the highest precedence which includes the lowest precedence production. The following is the minimal test case I used trying manual stratification:
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;
my $grammar = q{
evaluate : ADD_SUB
ADD_SUB : MULT_DIV_MOD ADD_SUB_OP ADD_SUB
{ [@item[1,2,3]] }
 MULT_DIV_MOD
ADD_SUB_OP : '+'  ''
MULT_DIV_MOD : GROUP MULT_DIV_MOD_OP MULT_DIV_MOD { [@item[1,2,
+3]] }
 GROUP
MULT_DIV_MOD_OP : '*'  '/'  '%'
GROUP : '(' ADD_SUB ')'
{ $item[2] }
 NUMBER
NUMBER : INTEGER  FLOAT  NAN
INTEGER : /[+]?\d+/
FLOAT : /([+]?)(?=\d\.\d)\d*(\.\d*)?([Ee]([+]?\d+))?/
NAN : /(Inf(inity)?NaN)/i
};
my $parser = new Parse::RecDescent $grammar;
print Dumper $parser>evaluate('42  5 + 1');
# Sees the result as 42  (5 + 1)
As you can see. It does not adhere to the necessary left associativity.
Attempt 3:
I tried to compensate for this by adding leftrecursion. Left recursion is defined when the first part of a production calls itself. Parse::RecDescent does not support leftrecursion but eliminating it (once added) is trivial. I found Eliminating Left Recursion in Parse::RecDescent by demerphq and produced the following:
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;
my $grammar = q{
evaluate : ADD_SUB
#ADD_SUB : MULT_DIV_MOD ADD_SUB_OP ADD_SUB
# { [@item[1,2,3]] }
#  MULT_DIV_MOD
# Reverse to ADD_SUB : ADD_SUB ADD_SUB_OP MULT_DIV_MOD  MULT_DIV_
+MOD
# Use following formula to eliminate left recursion
# A : A x  y > A : y R, R : x R  e
# Let A = ADD_SUB
# Let x = ADD_SUB_OP MULT_DIV_MOD
# let y = MULT_DIV_MOD
ADD_SUB : MULT_DIV_MOD ADD_SUB_TAIL
{ [@item[1,2]] }
ADD_SUB_TAIL : ADD_SUB_OP MULT_DIV_MOD ADD_SUB_TAIL
{ [@item[1..3]] }

ADD_SUB_OP : '+'  ''
# Same as above
MULT_DIV_MOD : GROUP MULT_DIV_MOD_TAIL
{ [@item[1,2]] }
MULT_DIV_MOD_TAIL : MULT_DIV_MOD_OP GROUP MULT_DIV_MOD_TAIL
{ [@item[1..3]] }

MULT_DIV_MOD_OP : '*'  '/'  '%'
GROUP : '(' ADD_SUB ')'
{ $item[2] }
 NUMBER
NUMBER : INTEGER  FLOAT  NAN
INTEGER : /[+]?\d+/
FLOAT : /([+]?)(?=\d\.\d)\d*(\.\d*)?([Ee]([+]?\d+))?/
NAN : /(Inf(inity)?NaN)/i
};
my $parser = new Parse::RecDescent $grammar;
print Dumper $parser>evaluate('42  5 + 1');
Unfortunately, the parse tree is so complex, I would need to write a parse tree parser to be able to evaluate the expression. While I did try very hard to make this work, I ultimately abandoned it.
Attempt 4:
I found Re: Order of Precedence in Parse::RecDescent grammar by ikegami which claimed to handle precedence and associativity. Unfortunately, it had a few bugs that I worked with ikegami to correct. Additionally, it only parsed  it did not evaluate. After several exchanges with ikegami, lots of experimentation, and a bit of wallhead exchanges  I finally got the following to evaluate as it parsed:
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;
my %dispatch = (
'+' => sub { $_[0] + $_[1] },
'' => sub { $_[0]  $_[1] },
'*' => sub { $_[0] * $_[1] },
'/' => sub { $_[0] / $_[1] },
'^' => sub { $_[0] ** $_[1] },
'abs' => sub { abs $_[0] },
'sqrt' => sub { sqrt $_[0] },
);
sub calculate {
my $rule = shift @_;
if ($rule eq 'FUNCTION') {
my ($func, $x) = @_;
my $val = eval { $dispatch{$func}>($x); };
die $@ if $@;
return $val;
}
my @atom = @{ shift @_ };
my $val = shift @atom;
while (@atom) {
my ($op, $num) = splice(@atom, 0, 2);
eval { $val = $dispatch{$op}>($val, $num); };
die $@ if $@;
}
return $val;
}
my $grammar = <<'__GRAMMAR__';
evaluate : EXPR /\Z/ { $
+item[1] }
EXPR : ADD_SUB { $
+item[1] }
ADD_SUB : <leftop: MUL_DIV_MOD ADD_SUB_OP MUL_DIV_MOD> { m
+ain::calculate( @item ) }
ADD_SUB_OP : '+'  ''
MUL_DIV_MOD : <leftop: POW MUL_DIV_MOD_OP POW> { m
+ain::calculate( @item ) }
MUL_DIV_MOD_OP : '*'  '/'  '%'
POW : <leftop: FUNCTION POW_OP FUNCTION> { m
+ain::calculate( @item ) }
POW_OP : '^'
FUNCTION : FUNC_NAME GROUP { m
+ain::calculate( @item ) }
 GROUP
FUNC_NAME : 'abs'  'sqrt'
GROUP : '(' EXPR ')' { $
+item[2] }
 NUMBER
NUMBER : FLOAT  INTEGER  NAN
INTEGER : /[+]?\d+/
FLOAT : /([+]?)(?=\d\.\d)\d*(\.\d*)?([Ee]([+]?\d+))?/
NAN : /(Inf(inity)?NaN)/i
__GRAMMAR__
my $parser = Parse::RecDescent>new($grammar) or die("Bad grammar\n");
my $answer = $parser>evaluate('11  (4 + 4)^3 * sqrt(5 * (6  1)) + a
+bs(3)');
print defined $answer ? $answer : 'Invalid expression';
Further Attempts:
While working with ikegami, he pointed out that it was possible to create a sane parse tree and save the evaluation until the end. While having a sane parse tree as a sideeffect was not something I set out to do, I figured it worth investigating. I condensed his example to the following:
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;
my %eval = (
disp => \&eval_dispatch,
term => sub { $_[0] > [1] },
'+' => sub { $_[0] + $_[1] },
'' => sub { $_[0]  $_[1] },
'*' => sub { $_[0] * $_[1] },
'/' => sub { $_[0] / $_[1] },
'%' => sub { $_[0] % $_[1] },
);
sub eval_node {
local *_ = \$_[0];
return $eval{disp}>($_>[0], $_);
}
sub eval_dispatch {
my ($op, $node) = @_;
return $eval{$op}>($node) if $op eq 'term';
my $x = eval_node($node>[1]);
my $y = eval_node($node>[2]);
return $eval{$op}>($x, $y);
}
sub treeify {
my $t = shift @_;
$t = [ shift @_, $t, shift @_ ] while @_;
return $t;
}
my $grammar = <<'__END_OF_GRAMMAR__';
build : expr /\Z/ { $item[1] }
# Just an alias
expr : sum
# vvv lowest precedence
sum : <leftop: prod SUM prod> { main::treeify(@{$item[1]}) }
prod : <leftop: term PROD term> { main::treeify(@{$item[1]}) }
# ^^^ highest precedence
term : '(' <commit> expr ')' { $item[3] }
 UNSIGN_INT { [ @item ] }
# Tokens
UNSIGN_INT : /\d+/
SUM : '+'  ''
PROD : '*'  '/'  '%'
__END_OF_GRAMMAR__
my $parser = Parse::RecDescent>new($grammar) or die("Bad grammar\n");
my $tree = $parser>build('11  6 + 4');
my $eval = eval_node($tree);
print "$eval\n";
I believe I could adapt this technique to evaluate the unruly parse tree that I abandoned earlier but do not feel it is worth the effort. We (
ikegami and I) also discussed returning blessed nodes which would greatly simplify the evaluation of the parse tree. Enough abstraction might make using
Parse::RecDescent for this task tolerable. I can't help thinking that a different parsing method would be straight forward and pleasant without all the distractions.
I am not sure I am even going to attempt rightrecursion and reversing the parse tree. This is a technique tye and I discussed in the chatterbox but my gut is telling me that Parse::RecDescent requires too much submission to be considered the right tool for this job. This contradicts what others I perceived to be more knowledgeable than myself told me. This is not to say that I am not completely blown away by the power and versatility of Parse::RecDescent. In fact, ikegami was inspired to write Operator Associativity and Eliminating LeftRecursion in Parse::RecDescent as a result of all my questions. It would have been nice if it existed before I dug in.
Next I tried Parse::Yapp. I had read that Parse::RecDescent and Parse::Yapp suffer from different sides of the same problem (rightrecursive vs leftrecursive). I do not believe this to be the case at all. Since all my math operators are left associative, except exponents, Parse::Yapp should make my task easier. This is an understatement! Parse::Yapp has precedence AND associativity built in. This code could obviously be abstracted further but it demonstrates how simple it is:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Parse::Yapp;
my $grammar = join '', <DATA>;
my $parser = Parse::Yapp>new(input => $grammar);
my $yapptxt = $parser>Output(classname => 'Calc');
eval $yapptxt; # normally written to a file
my $calc = Calc>new();
$calc>Ingest("11  (4 + 4)^3 * sqrt(5 * (6  1)) + abs(3)\n");
my $output = $calc>YYParse(yylex => \&Calc::Lexer);
print $output;
__DATA__
%left '' '+'
%left '*' '/' '%'
%right '^'
%nonassoc 'sqrt' 'abs'
%%
stack :  stack expr '\n' { push @{$_[1]}, $_[2]; $_[1][0] };
expr : add  del  mul  div  mod  pow  grp  sqrt  abs  NUM;
add : expr '+' expr { $_[1] + $_[3] };
del : expr '' expr { $_[1]  $_[3] };
mul : expr '*' expr { $_[1] * $_[3] };
div : expr '/' expr { $_[1] / $_[3] };
mod : expr '%' expr { $_[1] % $_[3] };
pow : expr '^' expr { $_[1] ** $_[3] };
grp : '(' expr ')' { $_[2] };
abs : 'abs' grp { abs($_[2]) };
sqrt : 'sqrt' grp { sqrt($_[2]) };
%%
sub Lexer {
my $parser = shift @_;
local *_ = \$parser>YYData>{INPUT};
s/^[ \t]+//; # leading nonnewline whitespace
if (s/^(([+]?)(?=\d\.\d)\d*(\.\d*)?([Ee]([+]?\d+))?)//) {
return ('NUM', $1); # borrowed from Scalar::Util
}
return ($1, $1) if s/^(sqrtabs)//;
return ($1, $1) if s/^(.)//s;
}
sub Ingest {
my $self = shift @_;
$self>YYData>{INPUT} = $_[0];
}
While I didn't experiment exhaustively, it appears that it doesn't matter what order I put anything in  the precedence and associativity declaration at the top makes it "just work".
Parse::Yapp is not all good. You have to write your own lexer routine. Additionally, the documentation is not nearly as comprehensive as
Parse::RecDescent. After discussing writing your own lexer in the CB with
Corion, I am not as convinced having an autogenerated lexer is necessary. Since this is a context free grammar, the only problems you might run into is ordering things such that tokens that contain other tokens as a substring come first. I was going to see if
perl::byacc might buy me an autogenerated lexer but I am not sure it is necessary now.
I tried to get into HOP::Parser next but the documentation refers you to 40 pages from the book. I own the book and have even skim read most of it to include chapter 8. I contacted Ovid to try and get a simple example showing precedence and associtivity that I could expand. He is interested but also busy. I think I am going to shelf HOP::Parser until I can either find the time to read the book indepth or Ovid finds the time to provide an example.
Parse::Earley is Luke Palmer's implementation of Jay Earley's efficient contextfree parsing algorithm (topdown). Even though it does not provide builtin precedence like Parse::Yapp (bottomup), it handles right and left associativity without resorting to tricks. Additionally, it produces all possible parse trees when more than one are possible. This may or may not be a good thing if you are only after the "best".
While I came short of actually evaluating the expression, I am quite impressed with what Luke was able to put together in 1 week (literally). I did have some hangups with the POD and figuring out how to just tell the parser to consume all input as well as a few other things. I ended up abandoning it as there is no support for actions to either build your own tree or evaluate on the fly. The resulting tree may be easy to walk for evaluation but it seemed over my head.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Parse::Earley;
my $parser = Parse::Earley>new();
my $grammar = <<'__GRAMMAR__';
input: expr
expr: expr '+' mul_term
 expr '' mul_term
 mul_term
mul_term: mul_term '*' exp
 mul_term '/' exp
 exp
exp: term '^' exp
 term
term: '(' expr ')'
 /\d+/ { $_ < 256 }
__GRAMMAR__
my $str = '1 + 2  3';
$parser>grammar($grammar);
$parser>start('input');
$parser>advance($str) for 1..6;
my ($tree) = $parser>matches_all($str, 'input');
print Dumper($tree);
Here is where I will talk about the other parsers, theory of the dragon book, and my lessons learned. If I don't run out of steam by now I want to see if I can write my own config language. The purpose of which would be to describe your operators functions precedence levels etc. The first pass would be to parse this config language and constructs the grammar to parse the language itself. This way, adding operators or precedence levels would only involve updating a config file.
Here is where I will draw my conclusions.
Ok, so I didn't actually finish the project but I did draw some conclusions. In no particular order:
People giving advice don't always listen:
When I indicated I was going to be writing a mathematical expression evaluator, I was told quite emphatically that Parse::RecDescent was the right tool for the job by several monks I respect. No warnings were given that handling precedence and associativity was tricky. Additionally, they seemed to be focused on the parsing part and not on the evaluation part. Creating a parse tree is one thing, using it is quite a different thing. I am quite convinced that Parse::RecDescent was the wrong tool for the job.
People giving advice are not always well informed:
I was told that Parse::RecDescent and Parse::Yapp were pretty much equal despite tackling the same problem in different ways (rightrecursion vs leftrecursion). From my limited experience, this is just plain wrong. If you need to handle precedence and/or associativity, Parse::Yapp is far superior.
Existing wheels don't always fit your car:
I did not try all the existing wheels I listed above but the ones I did try were clumsy and cumbersome. My handrolled version did exactly what I wanted. It took me less than an hour to write while the others took much longer.
You can learn a lot from breaking the rules: