Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine

Breaking The Rules II

by Limbic~Region (Chancellor)
on Jul 02, 2007 at 13:48 UTC ( [id://624445] : perlmeditation . print w/replies, xml ) Need Help??

This meditation has been sitting, unfinished, on my scratch pad for over a year. It was intended to be a companion to Breaking The Rules. Since I doubt I will finish it before another year has past, I am publishing it now. I have made reference to it a number of times over the last year and someone may benefit from it despite the incompleteness. As a result, there may be some grammatical inconsistencies (mostly tense).

I took on a new project of evaluating mathematical expressions where I broke at least 2 rules.

  • 1. Don't reinvent the wheel
  • 2. If you must, use the right tool for the job

My goal is to show what I learned by not following the rules. I fully expect to find that the rules are there for a reason. That is ok since a lesson learned by being bitten is much more likely to stick then one from admonishment.

Originally, I planned just 3 variations on the project. I was going to roll my own, create a new one using Parse::RecDescent, and just test the existing wheels. I am deviating from this plan because I have learned there is a lot more options than I had anticipated and that some are better than others. This list now includes:

While I expect the lessons I learn to mostly be negative ones that re-inforce the case for following the rules, I won't know for sure until I have tried. I initially thought I could guess what I would encounter based off my experience and knowledge as well as what others had warned me about. I was dead wrong.

My first attempt was hand-rolling. Evaluating a mathematical expression only requires a handful of steps:

  • Breaking the expression up into tokens or atoms (known as lexing)
  • Process tokens according to hierachial rules (known as precedence)
  • When tokens have the same precedence, process them according to sequencing rules (known as associativity)
This is just a fancy way of saying that when you evaluate a mathematical expression you do parens before exponents which you do before multiplication and division which comes before addition and subtraction and you do all this from left to right.

While the following parser is not finished or polished, it accomplishes the task I set out to do. It takes a mult-pass approach to overcome the difficulty of precedence and associativity.

package Math::Expression::Evaluator; @ISA = qw(Exporter); @EXPORT = qw(evaluate); $VERSION = '0.01'; use strict; use warnings; use Carp; my %op_eval = ( '^' => sub {$_[0] ** $_[1]}, '+' => sub {$_[0] + $_[1]}, '-' => sub {$_[0] - $_[1]}, '*' => sub {$_[0] * $_[1]}, '/' => sub {$_[0] / $_[1]}, ); my %func_eval = ( abs => sub { abs $_[0] }, int => sub { int $_[0] }, sqrt => sub { sqrt $_[0] }, ); my ($func_re) = map qr{$_}, join '|', keys %func_eval; my $oper_re = qr{[()/*^+-]}; my $numb_re = qr{[+-]?(?:\d+(?:\.\d*)?|\.\d+)}; my $parser = qr{($func_re|$numb_re|$oper_re)}; sub evaluate { my @stack = @_ == 1 ? parse(@_) : @_; return $_[0] if @stack == 1; 0 while fix_op(\@stack); 0 while reduce_func(\@stack); 0 while reduce_paren(\@stack); for my $op (qw[^ * / + -]) { 0 while reduce_op($op, \@stack); } croak "Unable to reduce to a number: '@stack'" if @stack != 1; return evaluate($stack[0]); } sub parse { my $expr = shift @_; my @part = $expr =~ /$parser/g; parse_error_check($expr, \@part); return @part; } sub parse_error_check { my ($expr, $part) = @_; $expr =~ s/$parser//g; croak "Unparseable parts: '$expr'" if $expr !~ /^\s*$/; croak "Not a number: '$part->[0]'" if @$part == 1 && ! is_num($par +t->[0]); } sub is_num { return $_[0] =~ /$numb_re/; } sub fix_op { my $stack = shift @_; for (1 .. $#$stack) { my $atom = $stack->[$_]; next if ! is_num($atom); if ($atom =~ s/^([+-])//) { my $op = $1; next if $stack->[$_ - 1] =~ m{[(*/+^-]}; splice(@$stack, $_, 1, $op, $atom); return 1; } } return; } sub reduce_func { my $stack = shift @_; for (0 .. $#$stack) { my $atom = $stack->[$_]; next if ! is_func($atom); croak "Function $atom require parens" if $stack->[$_ + 1] ne ' +('; reduce_paren($stack, $_ + 1); splice(@$stack, $_, 2, calculate($atom, $stack->[$_ + 1])); return 1; } } sub is_func { return exists $func_eval{$_[0]}; } sub calculate { my ($key, $x, $y, $val) = @_; eval { $val = is_func($key) ? $func_eval{$key}->($x) : $op_eval{$key}->($x, $y) }; croak "Error: $@" if $@; return $val; } sub reduce_paren { my ($stack, $start) = @_; $start ||= 0; my ($beg, $open); for ($start .. $#$stack) { my $atom = $stack->[$_]; next if $atom ne '(' && $atom ne ')'; $open += $atom eq ')' ? -1 : 1; $beg = $_ if ! defined $beg && $atom eq '('; next if $open; my $len = $_ - $beg + 1; splice(@$stack, $beg, $len, evaluate(@{$stack}[$beg + 1 .. $_ +- 1])); return 1; } croak "Unbalanced Parens" if $open; } sub reduce_op { my ($op, $stack) = @_; return if @$stack < 3; for (0 .. $#$stack - 2) { my ($prev, $curr, $next) = @{$stack}[$_ .. $_ + 2]; next if $curr ne $op; croak "Error: '$prev $op $next'" if ! is_num($prev) || ! is_nu +m($next); splice(@$stack, $_, 3, calculate($op, $prev, $next)); return 1; } return; } 'This statement is false';

Next I decided to tackle Parse::RecDescent. Admittedly, I had no experience with it before starting this project. In fact, I purposely waited until after I had rolled my own to try it to ensure it would not influence my solution in any way.

Attempt 1:
My first naive attempt failed because I assumed that by placing one production before another in alternation, that the entire string would be processed finding that production before attempting the next production. This is not how Parse::RecDescent works. It does not move further on in the string until it has tried everything it knows about first. This resulted in precedence not being followed.

Attempt 2:
I discovered Re: Left-associative binary operators in Parse::RecDescent by blokhead which said that one way to solve this problem was to manually introduce stratification. In a nutshell, you start out by looking for the lowest precedence which includes the next highest precedence as part of its production on up to the highest precedence which includes the lowest precedence production. The following is the minimal test case I used trying manual stratification:

#!/usr/bin/perl use strict; use warnings; use Parse::RecDescent; use Data::Dumper; my $grammar = q{ evaluate : ADD_SUB ADD_SUB : MULT_DIV_MOD ADD_SUB_OP ADD_SUB { [@item[1,2,3]] } | MULT_DIV_MOD ADD_SUB_OP : '+' | '-' MULT_DIV_MOD : GROUP MULT_DIV_MOD_OP MULT_DIV_MOD { [@item[1,2, +3]] } | GROUP MULT_DIV_MOD_OP : '*' | '/' | '%' GROUP : '(' ADD_SUB ')' { $item[2] } | NUMBER NUMBER : INTEGER | FLOAT | NAN INTEGER : /[+-]?\d+/ FLOAT : /([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?/ NAN : /(Inf(inity)?|NaN)/i }; my $parser = new Parse::RecDescent $grammar; print Dumper $parser->evaluate('42 - 5 + 1'); # Sees the result as 42 - (5 + 1)
As you can see. It does not adhere to the necessary left associativity.

Attempt 3:
I tried to compensate for this by adding left-recursion. Left recursion is defined when the first part of a production calls itself. Parse::RecDescent does not support left-recursion but eliminating it (once added) is trivial. I found Eliminating Left Recursion in Parse::RecDescent by demerphq and produced the following:

#!/usr/bin/perl use strict; use warnings; use Parse::RecDescent; use Data::Dumper; my $grammar = q{ evaluate : ADD_SUB #ADD_SUB : MULT_DIV_MOD ADD_SUB_OP ADD_SUB # { [@item[1,2,3]] } # | MULT_DIV_MOD # Reverse to ADD_SUB : ADD_SUB ADD_SUB_OP MULT_DIV_MOD | MULT_DIV_ +MOD # Use following formula to eliminate left recursion # A : A x | y -> A : y R, R : x R | e # Let A = ADD_SUB # Let x = ADD_SUB_OP MULT_DIV_MOD # let y = MULT_DIV_MOD ADD_SUB : MULT_DIV_MOD ADD_SUB_TAIL { [@item[1,2]] } ADD_SUB_TAIL : ADD_SUB_OP MULT_DIV_MOD ADD_SUB_TAIL { [@item[1..3]] } | ADD_SUB_OP : '+' | '-' # Same as above MULT_DIV_MOD : GROUP MULT_DIV_MOD_TAIL { [@item[1,2]] } MULT_DIV_MOD_TAIL : MULT_DIV_MOD_OP GROUP MULT_DIV_MOD_TAIL { [@item[1..3]] } | MULT_DIV_MOD_OP : '*' | '/' | '%' GROUP : '(' ADD_SUB ')' { $item[2] } | NUMBER NUMBER : INTEGER | FLOAT | NAN INTEGER : /[+-]?\d+/ FLOAT : /([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?/ NAN : /(Inf(inity)?|NaN)/i }; my $parser = new Parse::RecDescent $grammar; print Dumper $parser->evaluate('42 - 5 + 1');
Unfortunately, the parse tree is so complex, I would need to write a parse tree parser to be able to evaluate the expression. While I did try very hard to make this work, I ultimately abandoned it.

Attempt 4:
I found Re: Order of Precedence in Parse::RecDescent grammar by ikegami which claimed to handle precedence and associativity. Unfortunately, it had a few bugs that I worked with ikegami to correct. Additionally, it only parsed - it did not evaluate. After several exchanges with ikegami, lots of experimentation, and a bit of wall-head exchanges - I finally got the following to evaluate as it parsed:

#!/usr/bin/perl use strict; use warnings; use Parse::RecDescent; my %dispatch = ( '+' => sub { $_[0] + $_[1] }, '-' => sub { $_[0] - $_[1] }, '*' => sub { $_[0] * $_[1] }, '/' => sub { $_[0] / $_[1] }, '^' => sub { $_[0] ** $_[1] }, 'abs' => sub { abs $_[0] }, 'sqrt' => sub { sqrt $_[0] }, ); sub calculate { my $rule = shift @_; if ($rule eq 'FUNCTION') { my ($func, $x) = @_; my $val = eval { $dispatch{$func}->($x); }; die $@ if $@; return $val; } my @atom = @{ shift @_ }; my $val = shift @atom; while (@atom) { my ($op, $num) = splice(@atom, 0, 2); eval { $val = $dispatch{$op}->($val, $num); }; die $@ if $@; } return $val; } my $grammar = <<'__GRAMMAR__'; evaluate : EXPR /\Z/ { $ +item[1] } EXPR : ADD_SUB { $ +item[1] } ADD_SUB : <leftop: MUL_DIV_MOD ADD_SUB_OP MUL_DIV_MOD> { m +ain::calculate( @item ) } ADD_SUB_OP : '+' | '-' MUL_DIV_MOD : <leftop: POW MUL_DIV_MOD_OP POW> { m +ain::calculate( @item ) } MUL_DIV_MOD_OP : '*' | '/' | '%' POW : <leftop: FUNCTION POW_OP FUNCTION> { m +ain::calculate( @item ) } POW_OP : '^' FUNCTION : FUNC_NAME GROUP { m +ain::calculate( @item ) } | GROUP FUNC_NAME : 'abs' | 'sqrt' GROUP : '(' EXPR ')' { $ +item[2] } | NUMBER NUMBER : FLOAT | INTEGER | NAN INTEGER : /[+-]?\d+/ FLOAT : /([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?/ NAN : /(Inf(inity)?|NaN)/i __GRAMMAR__ my $parser = Parse::RecDescent->new($grammar) or die("Bad grammar\n"); my $answer = $parser->evaluate('11 - (4 + 4)^3 * sqrt(5 * (6 - 1)) + a +bs(-3)'); print defined $answer ? $answer : 'Invalid expression';

Further Attempts:
While working with ikegami, he pointed out that it was possible to create a sane parse tree and save the evaluation until the end. While having a sane parse tree as a side-effect was not something I set out to do, I figured it worth investigating. I condensed his example to the following:

#!/usr/bin/perl use strict; use warnings; use Parse::RecDescent; my %eval = ( disp => \&eval_dispatch, term => sub { $_[0] -> [1] }, '+' => sub { $_[0] + $_[1] }, '-' => sub { $_[0] - $_[1] }, '*' => sub { $_[0] * $_[1] }, '/' => sub { $_[0] / $_[1] }, '%' => sub { $_[0] % $_[1] }, ); sub eval_node { local *_ = \$_[0]; return $eval{disp}->($_->[0], $_); } sub eval_dispatch { my ($op, $node) = @_; return $eval{$op}->($node) if $op eq 'term'; my $x = eval_node($node->[1]); my $y = eval_node($node->[2]); return $eval{$op}->($x, $y); } sub treeify { my $t = shift @_; $t = [ shift @_, $t, shift @_ ] while @_; return $t; } my $grammar = <<'__END_OF_GRAMMAR__'; build : expr /\Z/ { $item[1] } # Just an alias expr : sum # vvv lowest precedence sum : <leftop: prod SUM prod> { main::treeify(@{$item[1]}) } prod : <leftop: term PROD term> { main::treeify(@{$item[1]}) } # ^^^ highest precedence term : '(' <commit> expr ')' { $item[3] } | UNSIGN_INT { [ @item ] } # Tokens UNSIGN_INT : /\d+/ SUM : '+' | '-' PROD : '*' | '/' | '%' __END_OF_GRAMMAR__ my $parser = Parse::RecDescent->new($grammar) or die("Bad grammar\n"); my $tree = $parser->build('11 - 6 + 4'); my $eval = eval_node($tree); print "$eval\n";
I believe I could adapt this technique to evaluate the unruly parse tree that I abandoned earlier but do not feel it is worth the effort. We (ikegami and I) also discussed returning blessed nodes which would greatly simplify the evaluation of the parse tree. Enough abstraction might make using Parse::RecDescent for this task tolerable. I can't help thinking that a different parsing method would be straight forward and pleasant without all the distractions.

I am not sure I am even going to attempt right-recursion and reversing the parse tree. This is a technique tye and I discussed in the chatterbox but my gut is telling me that Parse::RecDescent requires too much submission to be considered the right tool for this job. This contradicts what others I perceived to be more knowledgeable than myself told me. This is not to say that I am not completely blown away by the power and versatility of Parse::RecDescent. In fact, ikegami was inspired to write Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent as a result of all my questions. It would have been nice if it existed before I dug in.

Next I tried Parse::Yapp. I had read that Parse::RecDescent and Parse::Yapp suffer from different sides of the same problem (right-recursive vs left-recursive). I do not believe this to be the case at all. Since all my math operators are left associative, except exponents, Parse::Yapp should make my task easier. This is an understatement! Parse::Yapp has precedence AND associativity built in. This code could obviously be abstracted further but it demonstrates how simple it is:

#!/usr/bin/perl use strict; use warnings; use Data::Dumper; use Parse::Yapp; my $grammar = join '', <DATA>; my $parser = Parse::Yapp->new(input => $grammar); my $yapptxt = $parser->Output(classname => 'Calc'); eval $yapptxt; # normally written to a file my $calc = Calc->new(); $calc->Ingest("11 - (4 + 4)^3 * sqrt(5 * (6 - 1)) + abs(-3)\n"); my $output = $calc->YYParse(yylex => \&Calc::Lexer); print $output; __DATA__ %left '-' '+' %left '*' '/' '%' %right '^' %nonassoc 'sqrt' 'abs' %% stack : | stack expr '\n' { push @{$_[1]}, $_[2]; $_[1][0] }; expr : add | del | mul | div | mod | pow | grp | sqrt | abs | NUM; add : expr '+' expr { $_[1] + $_[3] }; del : expr '-' expr { $_[1] - $_[3] }; mul : expr '*' expr { $_[1] * $_[3] }; div : expr '/' expr { $_[1] / $_[3] }; mod : expr '%' expr { $_[1] % $_[3] }; pow : expr '^' expr { $_[1] ** $_[3] }; grp : '(' expr ')' { $_[2] }; abs : 'abs' grp { abs($_[2]) }; sqrt : 'sqrt' grp { sqrt($_[2]) }; %% sub Lexer { my $parser = shift @_; local *_ = \$parser->YYData->{INPUT}; s/^[ \t]+//; # leading non-newline whitespace if (s/^(([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?)//) { return ('NUM', $1); # borrowed from Scalar::Util } return ($1, $1) if s/^(sqrt|abs)//; return ($1, $1) if s/^(.)//s; } sub Ingest { my $self = shift @_; $self->YYData->{INPUT} = $_[0]; }
While I didn't experiment exhaustively, it appears that it doesn't matter what order I put anything in - the precedence and associativity declaration at the top makes it "just work". Parse::Yapp is not all good. You have to write your own lexer routine. Additionally, the documentation is not nearly as comprehensive as Parse::RecDescent. After discussing writing your own lexer in the CB with Corion, I am not as convinced having an auto-generated lexer is necessary. Since this is a context free grammar, the only problems you might run into is ordering things such that tokens that contain other tokens as a substring come first. I was going to see if perl::byacc might buy me an auto-generated lexer but I am not sure it is necessary now.

I tried to get into HOP::Parser next but the documentation refers you to 40 pages from the book. I own the book and have even skim read most of it to include chapter 8. I contacted Ovid to try and get a simple example showing precedence and associtivity that I could expand. He is interested but also busy. I think I am going to shelf HOP::Parser until I can either find the time to read the book in-depth or Ovid finds the time to provide an example.

Parse::Earley is Luke Palmer's implementation of Jay Earley's efficient context-free parsing algorithm (top-down). Even though it does not provide built-in precedence like Parse::Yapp (bottom-up), it handles right and left associativity without resorting to tricks. Additionally, it produces all possible parse trees when more than one are possible. This may or may not be a good thing if you are only after the "best".

While I came short of actually evaluating the expression, I am quite impressed with what Luke was able to put together in 1 week (literally). I did have some hangups with the POD and figuring out how to just tell the parser to consume all input as well as a few other things. I ended up abandoning it as there is no support for actions to either build your own tree or evaluate on the fly. The resulting tree may be easy to walk for evaluation but it seemed over my head.

#!/usr/bin/perl use strict; use warnings; use Data::Dumper; use Parse::Earley; my $parser = Parse::Earley->new(); my $grammar = <<'__GRAMMAR__'; input: expr expr: expr '+' mul_term | expr '-' mul_term | mul_term mul_term: mul_term '*' exp | mul_term '/' exp | exp exp: term '^' exp | term term: '(' expr ')' | /\d+/ { $_ < 256 } __GRAMMAR__ my $str = '1 + 2 - 3'; $parser->grammar($grammar); $parser->start('input'); $parser->advance($str) for 1..6; my ($tree) = $parser->matches_all($str, 'input'); print Dumper($tree);

Here is where I will talk about the other parsers, theory of the dragon book, and my lessons learned. If I don't run out of steam by now I want to see if I can write my own config language. The purpose of which would be to describe your operators functions precedence levels etc. The first pass would be to parse this config language and constructs the grammar to parse the language itself. This way, adding operators or precedence levels would only involve updating a config file.

Here is where I will draw my conclusions.

Ok, so I didn't actually finish the project but I did draw some conclusions. In no particular order:

People giving advice don't always listen:
When I indicated I was going to be writing a mathematical expression evaluator, I was told quite emphatically that Parse::RecDescent was the right tool for the job by several monks I respect. No warnings were given that handling precedence and associativity was tricky. Additionally, they seemed to be focused on the parsing part and not on the evaluation part. Creating a parse tree is one thing, using it is quite a different thing. I am quite convinced that Parse::RecDescent was the wrong tool for the job.

People giving advice are not always well informed:
I was told that Parse::RecDescent and Parse::Yapp were pretty much equal despite tackling the same problem in different ways (right-recursion vs left-recursion). From my limited experience, this is just plain wrong. If you need to handle precedence and/or associativity, Parse::Yapp is far superior.

Existing wheels don't always fit your car:
I did not try all the existing wheels I listed above but the ones I did try were clumsy and cumbersome. My handrolled version did exactly what I wanted. It took me less than an hour to write while the others took much longer.

You can learn a lot from breaking the rules:

Readmore tags added per jdporter's good suggestion.

Cheers - L~R

Replies are listed 'Best First'.
Re: Breaking The Rules II
by Ovid (Cardinal) on Jul 02, 2007 at 15:14 UTC

    Sorry I never got back to you with a HOP-Parser example. Since Dominus said he was going to make the book available for free, I figured that eventually I could just point people to that for documentation since there's a heck of a lot to document. I do have a parsing example at Converting words to numbers with HOP::Parser. Maybe that could be a start?

    Oh, and the AI::Prolog math prepocessor is pretty much "for internal use only" :)

    You could also look at the code for Object::Relation::Parser. It uses the HOP::Parser internally. There's a brief explanation of it at Object::Relation::Parser::Overview. The latter documentation explains the grammar and some of the tests with that can help. Of course, I didn't write that as a tutorial, so it might not help that much :/


    New address of my CGI Course.

      Thank you. I am not going to put this back on my to-do list. I posted unfinished because I didn't really see myself pursuing all the things to complete the meditation. On the other hand, the next time I need a parser for something that Parse::Yapp doesn't solve for me very easily, I will look back into HOP::Parser and the links you have provided.

      Cheers - L~R

Re: Breaking The Rules II
by diotalevi (Canon) on Jul 02, 2007 at 14:27 UTC

    That's interesting. I've "used" Parse::RecDescent but never stressed it. I got a basic intro to Parrot's compiler tools last week. I'm curious now how that would fare. You get to write a grammar in Perl 6, annotate your productions with associativity information and other stuff. This goes right to an AST which you can register events against. From there, you produce an opcode syntax tree and from there you get byte code.

    This is a tool aimed at this kind of task, y'know? It ought to be decent.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      The problems I encountered with Parse::RecDescent have been solved and then some with Perl 6 rules. Unfortunately, this meditation is more than a year old and the state of things then was far worse than it is now.

      Cheers - L~R

Re: Breaking The Rules II
by Rhandom (Curate) on Jul 02, 2007 at 15:36 UTC
    There are a few other options to place under existing wheels. Each of the mini-language based templating languages have expression parsing. Unfortunately most of the expression parsing isn't abstracted. There is at least one though, Template::Alloy (formerly known as CGI::Ex::Template) that does abstract the parsing of expressions, functions and operators. It does allow adding variables, functions, and custom operators.

    It seems that the point of your node was to show various attempts to get a working grammar going - more as an exercise. In that case, Template::Alloy is sort of besides the point. But if you were just wanting to find an existing parser that would work for your use case, then maybe it would help.

    I'm looking forward to Perl 6 and being able to use the existing grammar to parse expressions. I'll be interested to see how easy it will be to translate the parse tree into an easier to use AST.

    #!/usr/bin/perl use Template::Alloy; my $ta = Template::Alloy->new; $ta->_vars->{'func'} = sub { local $" = ', '; "You passed me (@_)" }; $ta->_vars->{'a'} = 3; $ta->_vars->{'b'} = 4; $ta->define_operator({ type => 'left', # actually associativiy precedence => 85, # same as '+' in $Template::Alloy::OPERATOR +S table symbols => ['+++'], play_sub => sub { int($_[0]) + int($_[1]) }, }); for my $expr ( '1 + 2', '1 + 2 * 3', '(1 + 2) * 3', '2.7e+10 * 3.0e-9', # scientific 'a + b', # variables 'func(1 + 2, 3)', # functions '1.2 +++ 3.4 * 2', # custom ops '1 ? 2 : 3', 'a + (b', '3 + 4 foobar', ) { my $copy = $expr; print "--------------------\n"; print "- expr: $expr\n"; print "- outp: ".eval{ $ta->play_expr($ta->parse_expr(\$copy)) }." +\n"; if ($@) { print "- err : $@\n"; } else { print "- err : Failed to consume entire string\n" if pos $copy + != length $expr; print "- tree: ".Template::Alloy->dump_parse_expr(\$expr)."\n" +; } }

    -------------------- - expr: 1 + 2 - outp: 3 - tree: [[undef, '+', 1, 2], 0] -------------------- - expr: 1 + 2 * 3 - outp: 7 - tree: [[undef, '+', 1, [[undef, '*', 2, 3], 0]], 0] -------------------- - expr: (1 + 2) * 3 - outp: 9 - tree: [[undef, '*', [[undef, '+', 1, 2], 0], 3], 0] -------------------- - expr: 2.7e+10 * 3.0e-9 - outp: 81 - tree: [[undef, '*', 27000000000, '3e-09'], 0] -------------------- - expr: a + b - outp: 7 - tree: [[undef, '+', ['a', 0], ['b', 0]], 0] -------------------- - expr: func(1 + 2, 3) - outp: You passed me (3, 3) - tree: ['func', [[[undef, '+', 1, 2], 0], 3]] -------------------- - expr: 1.2 +++ 3.4 * 2 - outp: 7 - tree: [[undef, '+++', '1.2', [[undef, '*', '3.4', 2], 0]], 0] -------------------- - expr: 1 ? 2 : 3 - outp: 2 - tree: [[undef, '?', 1, 2, 3], 0] -------------------- - expr: a + (b - outp: - err : parse.missing.paren error - Missing close ) in group (At char +6) -------------------- - expr: 3 + 4 foobar - outp: 7 - err : Failed to consume entire string - tree: [[undef, '+', 3, 4], 0]

    my @a=qw(random brilliant braindead); print $a[rand(@a)];
Re: Breaking The Rules II
by zentara (Archbishop) on Jul 02, 2007 at 16:12 UTC
    I for one, enjoy reading stuff that goes way over my head. It reminds of how deep the water actually is. :-)

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: Breaking The Rules II
by tsee (Curate) on Jul 02, 2007 at 15:16 UTC

    Quite interesting!

    On a side note: When I wrote Math::Symbolic::Parser(::Yapp), I stumbled upon a similar essay at . The author focuses more on the performance aspects of the available parsers or rather parser generators, but he also talks about the amount of work required to get going.


      I forgot to mention how you'd actually use that specific "wheel" to evaluate expressions. I know it's not the point of this meditation to just use the existing tools, but comparison might still be sensible.

      #!/usr/bin/perl use strict; use warnings; use Math::Symbolic; my $parser = Math::Symbolic::Parser->new(); while (1) { print "\n> "; my $line = <STDIN>; my $tree = $parser->parse($line); print("Parse error.\n"), next if not defined $tree; print "Entered: $tree\n"; my $result = $tree->value(); print("Evaluation error.\n"), next if not defined $result; print "Result: $result\n"; }

      The output would look like this:

      > 3*4.2e3+5^(3.1+2) Entered: (3 * 4.2e3) + (5 ^ (3.1 + 2)) Result: 16270.6841971501 > a*b+c Entered: (a * b) + c Evaluation error.

      That last bit shows that the parser is actually intended to be used with variables, too. So it's not a parse error but an evaluation error.

      Switching to using the YAPP based parser is rather straightforward, too. Just add "implementation => 'Yapp'" to the call to the constructor.


Re: Breaking The Rules II
by BrowserUk (Patriarch) on Jul 02, 2007 at 15:51 UTC

    I am so glad that you have posted this, even in its incomplete state.

    Maybe. Just maybe. It will stop the "Ew. The post contains the words 'parse', so it must be a job for Parse::RecDescent!" crowd from wobbling their collective gums quite so frequently.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      To be fair, while it's not a panacea, Parse::RecDescent is actually a pretty good tool for exactly this kind of job. When I recently tried a similar exercise myself, my first readthrough of the manual left me sceptical, but in practice I was pleasantly surprised at how easy it turned out to be to parse expressions of this precise sort with Parse:RecDescent. Whereas I've never managed to get my head round LALR(1) parsing of the sort that Limbic~Region found so intuitive.

      Though I suppose that just reinforces the point that one shouldn't assume that a tool is the right tool for one's own project just because other people prefer it...

      (I do find myself wondering whether the love affair with Yapp would have continued beyond the first shift/reduce conflict...)

        I wasn't targetting P::RD. It is a perfectly fine module for those situations where you need to extract full semantic information from the language you are analysing. But even for this, it's certainly not the only game in town, nor necessarily the best choice for any given application.

        With respect to shift/reduce conflicts and Parse::YAPP: It's possible to construct ambiguous grammars regardless of which type of parser one targets, and equally possible to resolve them.

        My main point was that parsers in general aren't an easy to learn and use, alternative to regex. Especially when a lot of the time when people say: I want to parse ...; they often don't want to parse at all. They simply want to extract some information, that may happen to be embedded within some other information.

        For example, for the vast majority of screen scraping applications, the user has no interest whatsoever in extracting any semantic or syntactic information from the surrounding text. Even if that surrounding text happens to be in a form that may or may not comply with one of the myriad variations of some gml-like markup.

        Their only interest is locating a specific piece of text that happens to be embedded within a lot of other text. There may be some clues in that other text that they will need to locate the text they are after, but they couldn't give two hoots whether that other text is self-consistant with some gml/html/xhtml standard.

        For this type of application, not only does parsing the surrounding html require a considerable amount of effort and time--both programmer time and processor time--given the flexibility of browsers to DWIM with badly written HTML/XTML, it would often set the programmer on a hiding to nothing to even try. Luckily, HTML::Parser and freinds are pragmatically and specifically written to gloss over the finer points of those standards and operate in a manner that DWTALTPWAMs (Do What The Average, Less Than Perfect, Web Author Means).

        Even so, after 5 years, I have still to see any convincing argument against the opinions I expressed when I wrote Being a heretic and going against the party line.. I still find it far quicker to use a 'bunch of regex' to extract the data I want from the average, subject-to-change, website than to work out which combination of modules and methods are required to 'do it properly'. And when things change, I find it easier to adjust the regex than figure out which other module or modules and methods I now require.

        I think that there is an element of 'laziness gone to far' in the dogma that regex is "unreadable, unmaintainable and hard". It is a complex tool with complex rules, just as every parser out there. You have to learn to use it, just as with every other parsing tool out there. It has limitations just like every other parser out there.

        And there are several significant advantages of learning to use regex, over every other parsing tool out there.

        1. It's always available.
        2. It is applicable to every situation.

          Left recursive; right recursive; top down; bottom up; nibbling; lookahead; maximal chunk; whatever.

        3. You have complete control.

          Need to perform some program logic part way through a parse? No problem, use /gc and while.

          Need to parse a datstream on the fly. No problem, same technique applies.

          Want to just skip over stuff that doesn't matter to your application. No problem. Parse what you need to, skip over what you don't. You don't have to cater for all eventualities, nor restrict yourself to dealing with data that complies to some formalised, published set of rules.

        4. It's fast.

        Mostly, take the time to learn to use regex well and you'll not need to run off to cpan to grab, and spend time learning to use, one of ten new modules, each of which purport to do what you need, but each of which has its own set of limitations and caveats.

        I have a regex based parser for math expressions, with precedence and identifiers, assignment and variadic functions. It's all of 60 lines including the comprehensive test suite! One day I'll get around to cleaning it up and posting it somewhere.

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Breaking The Rules II
by casiano (Pilgrim) on Dec 02, 2007 at 13:55 UTC
    Thanks for this meditation. Very useful to me. Try also Parse::Eyapp. It fixes some of the limitations of Parse::Yapp