Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Re: Perl Parsing Based on Supplied Precedence

by wirito (Acolyte)
on Nov 06, 2012 at 16:01 UTC ( #1002510=note: print w/replies, xml ) Need Help??

in reply to Perl Parsing Based on Supplied Precedence

I think this is not treating 'equal preference' operators right. The expresion 16/4*2 should be considered as (16/4)*2, but your code produces 16/(4*2):
$ perl 16/4*2 { '/' => [ '16', { '*' => [ '4', '2' ] } ] }
A simple tweak of the precedence regex:
my $precedence=[qr/(?:&&|\|\|)/,qr/(?:\+|-)/,qr/(?:\/|\*)/];
Will produce a better output:
{ '*' => [ { '/' => [ '16', '4' ] }, '2' ] }
Edit: a further reading at perlop show me that '&&' precedes over '||', so regex should be:
my $precedence=[qr/(?:\/|\*)/,qr/(?:\+|-)/,qr/&&/,qr/|\|\|/];

Replies are listed 'Best First'.
Re^2: Perl Parsing Based on Supplied Precedence
by protist (Monk) on Nov 07, 2012 at 03:33 UTC

    Your change in the math operator precedence looks good. I did not test the logic of the math operators so much as that the precedence supplied was correctly represented in the tree. My "&&", as described by precedence, is closer to "and" (in Perl). Keep in mind, that the idea here was not to properly parse Perl, but to properly parse grammars (in general) by defined operator precedence.

    Thank you for your feedback, I'm happy to see that someone was interested enough to play with it. :)

      In fact, I find this pretty interesting:
      #!/usr/bin/perl use warnings; use strict; use Data::Dumper; $Data::Dumper::Terse=1; my $precedence_perlop=[ qr/(?:\/|\*|\%|x)/, qr/(?:\+|-|\.)/, qr/(?:<=|>=|<|>lt|gt|le|ge)/, qr/&/, qr/(?:\||\^)/, qr/&&/, qr/(?:\|\||\/\/)/, qr/(not)/, qr/(and)/, qr/(or|xor)/ ]; sub parse{ my ($regex,$input)=@_; $input=~s/\s//g; for(reverse @$regex){ if($input=~m/(.+)($_)(.+)/){ my ($before,$op,$after,$node)=($1,$2,$3); $node->{$op}=[parse($regex,$before),parse($regex,$after)]; return $node; } } return $input; } sub evaluate { my $tree = shift; return $tree unless ref $tree; foreach my $op ( keys %{$tree} ) { my @terms = map { evaluate($_) } @{ $tree->{$op} }; my $result; eval "\$result = $terms[0] $op $terms[1]"; return undef if $@; return $result; } } while(<>){ my $tree = parse($precedence_perlop,$_); print Dumper($tree); my $ev = evaluate( $tree ); print "Evaluate to: " . $ev . "\n" if defined $ev; }


        That was an amusing diversion. I played around with it and added a couple of features for fun:

        • Parenthesis now work for grouping
        • Now we have variables
        • It emits the RPN version of the expression

        I tried monkeying with regexes to add parenthesis, and then remembered Text::Balanced, so I used that to extract the subexpressions. The parsed subexpressions are then stored in the %temps hash. The expression is then rewritten, replacing the subexpression with the temp name. Finally, when we wind up with a single token (variable, value or temp name), we check the temps hash, and if we get a match, we return the parse tree we stowed away there.

        In order to prevent collisions between temp names and variable names, I map the expression to lower case before parsing. So I split the parse function into two bits.

        I built a to_RPN() function by whacking on your original evaluate() routine. Then I made an evaluator for the RPN string. I didn't update your evaluate() function, so I removed it for now. It's a simple modification to bring it up-to-date with the variable assignments, but I gotta get back to work.

        Finally, I handle variables by simply looking at any scalar I pop from the stack to see if it has any alpha characters and if it's in the %vars hash. If so, I look up the appropriate value.

        There's still plenty of room for cleanup, simplification and such, but I thought I'd stop for now. I hope someone finds it amusing...

        A sample run gives:

        $ ./ a:=1*(2+(3/5+2)) RPN: a 1 2 3 5 / 2 + + * := (a set to <4.6>) ====> 4.6 Variables: a:4.6 b:=a+15 RPN: b a 15 + := (b set to <19.6>) ====> 19.6 Variables: a:4.6, b:19.6 c:=(b-a)*(5+3+(9-6)*3) RPN: c b a - 5 3 + 9 6 - 3 * + * := (c set to <255>) ====> 255 Variables: a:4.6, b:19.6, c:255


        When your only tool is a hammer, all problems look like your thumb.

        That is cool. :) I may come back to this parser and make something "smarter" or easier to play with.

      With boolean operators there is the same issue. '&&' and 'and' operator has higher precedence that '||' and 'or' (check perlop link on my first post). My approach gives accurate results.
      Then 'COND1&&COND2||COND3' is the same as '(COND1&&COND2)||COND3':
      $ perl COND1&&COND2||COND3 ORIG: { '&&' => [ 'COND1', { '||' => [ 'COND2', 'COND3' ] } ] } NEW: { '||' => [ { '&&' => [ 'COND1', 'COND2' ] }, 'COND3' ] }

        "&&" and "and" have differing precedences. "and" is lower precedence than "&&". The same goes for "||" and "or"; "or" has lower precedence than "||". "and" actually has lower precedence than "||", as I will demonstrate.

        code with "&&":

        perl -e 'if(0&& 0||1){print"hello\n"}'



        code with "and":

        perl -e 'if(0and 0||1){print"hello\n"}'

        (outputs nothing)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1002510]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (3)
As of 2018-04-22 07:22 GMT
Find Nodes?
    Voting Booth?