Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Re: Operator Precedence Parser

by hv (Parson)
on Jun 10, 2006 at 08:51 UTC ( #554597=note: print w/replies, xml ) Need Help??

in reply to Operator Precedence Parser

I don't know if I triggered L~R's hunt, but I was asking in the CB about modules for expression parsing and he expressed an interest.

I was looking for something that would be supplied with a) the string to parse, and b) the parse rules - a set of functions, binary and unary operators, and atoms. Something like this:

Expr::Parse->new({ function => [ { name => 'p', type => 'bool', args => [ 'int' ], test => sub { _is_prime($_[1]) }, }, { name => 'rev', type => 'int', args => [ 'int' ], test => sub { scalar reverse $_[1] }, }, ], binop => [ { name => '+', type => 'int', args => [ 'int', 'int' ], prec => 4, test => sub { $_[1] + $_[2] }, }, { name => '*', type => 'int', args => [ 'int', 'int' ], prec => 3, test => sub { $_[1] * $_[2] }, }, { name => '=', type => 'bool', args => [ 'int', 'int' ], prec => 6, test => sub { $_[1] == $_[2] }, }, unop => [ { name => '!', type => 'bool', args => [ 'bool' ], prec => 1, test => sub { $_[1] ? 0 : 1 }, }, ], atom => [ { name => 'const', pat => '\d+', type => 'int', test => sub { $_[1] }, }, ], });

Nobody in the CB could suggest anything at the time, so I wrote my own constructing a Tree::Simple tree as output. I found it surprisingly hard to write, even treating '(', ',' and ')' as builtins (for the function(arg, list) support), and ignoring the type information - I wrote and threw away hundreds of lines of code in at least half a dozen trial implementations before finally coming up with something workable if not particularly pretty.

If your parser can easily be extended to handle function calls I'll have a go at adapting it for my application.


Replies are listed 'Best First'.
Re^2: Operator Precedence Parser
by bart (Canon) on Jun 11, 2006 at 16:28 UTC
    If your parser can easily be extended to handle function calls I'll have a go at adapting it for my application.
    I said it would be easy, didn't I? Well I've got to put my money (er, code) where my mouth is, so I adapted it to process function calls. With this global spec, which defines 2 functions:
    my %function = ( sumsq => sub { my $sum = 0; foreach(@_) { $sum += $_*$_; } return $ +sum; }, # sum of squares sqrt => sub { return sqrt shift; }, );
    I added this piece in parse_value(), just in front of the code to handle variables:
    if(/\G((?i:[a-z]\w*))\s*\(/gc) { # function '(' my $function = $1; $function{$function} or die sprintf "Undefined function '$func +tion' called at: \"%s\"", where(); my @arg; unless(/\G\s*(?=\))/gc) { while(1){ my($value) = parse_expr() or die sprintf "Expression e +xpected at: \"%s\"", where(); push @arg, $value; /\G\s*,/gc or last; } } /\G\s+/gc; /\G\)/gc or die sprintf "Parse error: ')' expected at: \"%s\"" +, where(); trace(sprintf "function '$function' called with %d argument%s" +, scalar @arg, @arg==1 ? "" : "s"); return $function{$function}->(@arg); }
    and with the data string "sumsq(3,2+2)*sqrt(36)", the output is:
    Line 29 "·sumsq(3,2+2)*sqrt(36)" Line 29 "sumsq(·3,2+2)*sqrt(36)" Line 31 "sumsq(3·,2+2)*sqrt(36)" value=3 Line 29 "sumsq(3,·2+2)*sqrt(36)" Line 31 "sumsq(3,2·+2)*sqrt(36)" value=2 Line 36 "sumsq(3,2+·2)*sqrt(36)" op=+ Line 29 "sumsq(3,2+·2)*sqrt(36)" Line 31 "sumsq(3,2+2·)*sqrt(36)" value=2 Line 43 "sumsq(3,2+2·)*sqrt(36)" popping 2 + Line 45 "sumsq(3,2+2·)*sqrt(36)" result = 4 Line 85 "sumsq(3,2+2)·*sqrt(36)" function 'sumsq' called with 2 argume +nts Line 31 "sumsq(3,2+2)·*sqrt(36)" value=25 Line 36 "sumsq(3,2+2)*·sqrt(36)" op=* Line 29 "sumsq(3,2+2)*·sqrt(36)" Line 29 "sumsq(3,2+2)*sqrt(·36)" Line 31 "sumsq(3,2+2)*sqrt(36·)" value=36 Line 85 "sumsq(3,2+2)*sqrt(36)·" function 'sqrt' called with 1 argumen +t Line 31 "sumsq(3,2+2)*sqrt(36)·" value=6 Line 43 "sumsq(3,2+2)*sqrt(36)·" popping 25 * Line 45 "sumsq(3,2+2)*sqrt(36)·" result = 150 150 Stack: This value is never affected

    It seems to work fine, was finished in something like 1/4 hour, and it definitely doesn't take hundreds of lines of code. :)

Re^2: Operator Precedence Parser
by bart (Canon) on Jun 13, 2006 at 11:17 UTC
    I wrote my own constructing a Tree::Simple tree as output.
    For completeness sake, I've rewritten the parser so it produces a parse tree. I took a quick look at Tree::Simple, and I found it too hard to my taste for the little benefit it would give me, so I'm using a handrolled function object instead — yes I'm converting the infix operators into prefix function calls. As an extra benefit, I can use overload to return a symbolic representation of the parse tree when used as a string, or actually evaluate it, when used as a number.

    I think it clearly demontrates its viability as a parser for real work.

    update Now with symbolic (postponed) variables, meaning: if you change the value of the variable in the %var hash, the value used in an evaluation by using a parsed expression in a numerical context, will change accordingly.


      Very nice code, much simpler & shorter than I expected. I tried to add an = (Perl or C internal-to-expression assignment) to the operators, though, lowest precedence & right-associative:

      '='=> {prec=> 5,assoc=>'R',exec=>sub { print "DEBUG $_[0]=$_[1]\n"; $_[1]; }, function=>'asg'},

      I added it to the operator pattern:

      if (/\G\s*(\*\*|[=+\-*\/%\\])/gc)

      and kind of defined the LHS variable


      but it fails because Function::evaluate tries to coerce the value, which I set to the variable name, into a number:

      return $code->(map 0+$_,@{$self->{arguments}});

      I think I have a workaround, but it looks like '=' is a special case because the LHS variable should not be evaluated before calling the assignment function.

Re^2: Operator Precedence Parser
by Rhandom (Curate) on Jun 12, 2006 at 13:03 UTC
    CGI::Ex::Template does all of this.

    You can add arbitrary entries to the $OPERATORS table and then use the built in functions to rebuild the global qr's helping in the parsing.

    The parse_expr method takes a reference to a string (which it will consume) and returns a parsed optree.

    The play_expr method takes the optree and actually executes it.

    The parser allows for TT2 style nested variables, function calls, number, double and single quoted strings, arrays and hashes. It also has proper precedence and associativity parsing.

    The entry in the $OPERATORS table should contain the following:
    type: prefix, postfix, left, right, none, ternary, or assign (right +) precedence: the relative precedence value symbols or names: an arrayref of symbols or operators for that oper +ation function: a code ref to run when that operator is found.

    It isn't entirely general as the variable names are TT2'ish - but it certainly could be used in other applications.

    my @a=qw(random brilliant braindead); print $a[rand(@a)];

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://554597]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (5)
As of 2017-08-18 07:08 GMT
Find Nodes?
    Voting Booth?
    Who is your favorite scientist and why?

    Results (295 votes). Check out past polls.