Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation

Re^3: Perl Parsing Based on Supplied Precedence

by wirito (Acolyte)
on Nov 07, 2012 at 10:52 UTC ( #1002653=note: print w/ replies, xml ) Need Help??

in reply to Re^2: Perl Parsing Based on Supplied Precedence
in thread Perl Parsing Based on Supplied Precedence

In fact, I find this pretty interesting:

#!/usr/bin/perl use warnings; use strict; use Data::Dumper; $Data::Dumper::Terse=1; my $precedence_perlop=[ qr/(?:\/|\*|\%|x)/, qr/(?:\+|-|\.)/, qr/(?:<=|>=|<|>lt|gt|le|ge)/, qr/&/, qr/(?:\||\^)/, qr/&&/, qr/(?:\|\||\/\/)/, qr/(not)/, qr/(and)/, qr/(or|xor)/ ]; sub parse{ my ($regex,$input)=@_; $input=~s/\s//g; for(reverse @$regex){ if($input=~m/(.+)($_)(.+)/){ my ($before,$op,$after,$node)=($1,$2,$3); $node->{$op}=[parse($regex,$before),parse($regex,$after)]; return $node; } } return $input; } sub evaluate { my $tree = shift; return $tree unless ref $tree; foreach my $op ( keys %{$tree} ) { my @terms = map { evaluate($_) } @{ $tree->{$op} }; my $result; eval "\$result = $terms[0] $op $terms[1]"; return undef if $@; return $result; } } while(<>){ my $tree = parse($precedence_perlop,$_); print Dumper($tree); my $ev = evaluate( $tree ); print "Evaluate to: " . $ev . "\n" if defined $ev; }

Comment on Re^3: Perl Parsing Based on Supplied Precedence
Download Code
Re^4: Perl Parsing Based on Supplied Precedence
by protist (Monk) on Nov 07, 2012 at 11:55 UTC

    That is cool. :) I may come back to this parser and make something "smarter" or easier to play with.

Re^4: Perl Parsing Based on Supplied Precedence
by roboticus (Canon) on Nov 07, 2012 at 16:23 UTC


    That was an amusing diversion. I played around with it and added a couple of features for fun:

    • Parenthesis now work for grouping
    • Now we have variables
    • It emits the RPN version of the expression

    I tried monkeying with regexes to add parenthesis, and then remembered Text::Balanced, so I used that to extract the subexpressions. The parsed subexpressions are then stored in the %temps hash. The expression is then rewritten, replacing the subexpression with the temp name. Finally, when we wind up with a single token (variable, value or temp name), we check the temps hash, and if we get a match, we return the parse tree we stowed away there.

    In order to prevent collisions between temp names and variable names, I map the expression to lower case before parsing. So I split the parse function into two bits.

    I built a to_RPN() function by whacking on your original evaluate() routine. Then I made an evaluator for the RPN string. I didn't update your evaluate() function, so I removed it for now. It's a simple modification to bring it up-to-date with the variable assignments, but I gotta get back to work.

    Finally, I handle variables by simply looking at any scalar I pop from the stack to see if it has any alpha characters and if it's in the %vars hash. If so, I look up the appropriate value.

    There's still plenty of room for cleanup, simplification and such, but I thought I'd stop for now. I hope someone finds it amusing...

    A sample run gives:

    $ ./ a:=1*(2+(3/5+2)) RPN: a 1 2 3 5 / 2 + + * := (a set to <4.6>) ====> 4.6 Variables: a:4.6 b:=a+15 RPN: b a 15 + := (b set to <19.6>) ====> 19.6 Variables: a:4.6, b:19.6 c:=(b-a)*(5+3+(9-6)*3) RPN: c b a - 5 3 + 9 6 - 3 * + * := (c set to <255>) ====> 255 Variables: a:4.6, b:19.6, c:255


    When your only tool is a hammer, all problems look like your thumb.

      I was playing to add parenthesis too. But as I have a 'difficult' access to CPAN I didn't tried Text::Balanced. However I reach the regex monkey way to deal with it, just adding a two globals and a new loop on parse():
      my $par; my $par_count = 0; sub parse{ my ($regex,$input)=@_; $input =~ /^__PAREN(\d+)__$/ and return $par->{$1}; $input=~s/\s//g; while( $input =~ /\(([^()]+)\)/ ) { my $sub = $1; $par->{$par_count} = parse($regex, $sub); my $tag = '__PAREN' . $par_count++ . '__'; $input =~ s/\(([^()]+)\)/$tag/; } for(reverse @$regex){ if($input=~m/(.+)($_)(.+)/) { my ($before,$op,$after,$node) = ($1, $2, $3); $node->{$op}=[parse($regex,$before),parse($regex,$after)]; return $node; } } return $input; }
      Your variable handling has amazed me :)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1002653]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (7)
As of 2014-09-20 11:59 GMT
Find Nodes?
    Voting Booth?

    How do you remember the number of days in each month?

    Results (159 votes), past polls