Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Re^4: Perl Parsing Based on Supplied Precedence

by roboticus (Chancellor)
on Nov 07, 2012 at 16:23 UTC ( #1002689=note: print w/replies, xml ) Need Help??

in reply to Re^3: Perl Parsing Based on Supplied Precedence
in thread Perl Parsing Based on Supplied Precedence


That was an amusing diversion. I played around with it and added a couple of features for fun:

  • Parenthesis now work for grouping
  • Now we have variables
  • It emits the RPN version of the expression

I tried monkeying with regexes to add parenthesis, and then remembered Text::Balanced, so I used that to extract the subexpressions. The parsed subexpressions are then stored in the %temps hash. The expression is then rewritten, replacing the subexpression with the temp name. Finally, when we wind up with a single token (variable, value or temp name), we check the temps hash, and if we get a match, we return the parse tree we stowed away there.

In order to prevent collisions between temp names and variable names, I map the expression to lower case before parsing. So I split the parse function into two bits.

I built a to_RPN() function by whacking on your original evaluate() routine. Then I made an evaluator for the RPN string. I didn't update your evaluate() function, so I removed it for now. It's a simple modification to bring it up-to-date with the variable assignments, but I gotta get back to work.

Finally, I handle variables by simply looking at any scalar I pop from the stack to see if it has any alpha characters and if it's in the %vars hash. If so, I look up the appropriate value.

There's still plenty of room for cleanup, simplification and such, but I thought I'd stop for now. I hope someone finds it amusing...

$ cat #!/usr/bin/perl use warnings; use strict; use Data::Dumper; use Scalar::Util qw( looks_like_number ); use Text::Balanced qw( extract_bracketed ); $Data::Dumper::Terse=1; my $precedence_perlop=[ qr/(?:\/|\*|\%|x)/, qr/(?:\+|-|\.)/, qr/(?:<=|>=|<|>lt|gt|le|ge)/, qr/&/, qr/(?:\||\^)/, qr/&&/, qr/(?:\|\||\/\/)/, qr/(not)/, qr/(and)/, qr/(or|xor)/, qr/(?::=)/, ]; my %vars; my %temps; my $last_temp="TEMP000"; sub parse { # Map all variable names to lower case, so temp values in upper ca +se # won't collide my ($regex,$input)=@_; $input = lc($input); # Reset temporaries $last_temp="TEMP000"; return parse_helper($regex, $input); } sub parse_helper{ my ($regex,$input)=@_; $input=~s/\s//g; # parse subexpressions into temps if ($input=~m/(.*?)(\(.*)/) { my $cur_temp = $last_temp++; # get next temp # Split into "$before ($subexpr) $after" my ($before, $nested, $node) = ($1, $2); my ($subexpr, $after) = Text::Balanced::extract_bracketed($nested, +"()"); $subexpr = substr($subexpr,1,length($subexpr)-2); # clip parens # Parse & store temporary variable $temps{$cur_temp} = parse_helper($regex, $subexpr); # Rewrite expression with temporary variable name $input = $before . $cur_temp . $after; } # parse as usual for(reverse @$regex){ if($input=~m/(.+)($_)(.+)/){ my ($before,$op,$after,$node)=($1,$2,$3); $node->{$op}=[parse_helper($regex,$before),parse_helper($regex,$ +after)]; return $node; } } # Return temporary expression tree or scalar value if DNE return exists $temps{$input} ? $temps{$input} : $input; } sub to_RPN { my $tree = shift; return $tree unless ref $tree; foreach my $op ( keys %{$tree} ) { my @terms = map { to_RPN($_) } @{ $tree->{$op} }; return join(" ", @terms, $op); } } sub evaluate_RPN { my $RPN = shift; my @stack; my $pop = sub { die "EMPTY STACK! (malformed expression...)" unless @s +tack; my $t = pop @stack; if ($t =~ /^[a-z][a-z0-9]*$/) { $t = $vars{$t} // 0; } return $t; }; for (split /\s+/, $RPN) { if ($_ =~ /^[a-z0-9.]*$/) { # Value or variable name push @stack, $_; } elsif ($_ eq ':=') { # Variable assignment my $value = $pop->(); my $varname = pop @stack; $vars{$varname} //= $value; print " ($varname set to <$value>)"; return $value; } else { my $R = $pop->(); #pop @stack; my $L = $pop->(); #pop @stack; push @stack, eval "$L $_ $R"; } } die "EXTRA JUNK ON STACK! (malformed expression)" unless @stack == 1 +; return $stack[-1]; } while(<>){ s/^\s+//; s/\s+$//; my $tree = parse($precedence_perlop,$_); my $RPN = to_RPN($tree); print "RPN: $RPN"; print " ====> ", evaluate_RPN($RPN), "\n"; print "Variables: ", join(", ", map { "$_:$vars{$_}" } sort keys %va +rs), "\n"; }

A sample run gives:

$ ./ a:=1*(2+(3/5+2)) RPN: a 1 2 3 5 / 2 + + * := (a set to <4.6>) ====> 4.6 Variables: a:4.6 b:=a+15 RPN: b a 15 + := (b set to <19.6>) ====> 19.6 Variables: a:4.6, b:19.6 c:=(b-a)*(5+3+(9-6)*3) RPN: c b a - 5 3 + 9 6 - 3 * + * := (c set to <255>) ====> 255 Variables: a:4.6, b:19.6, c:255


When your only tool is a hammer, all problems look like your thumb.

Replies are listed 'Best First'.
Re^5: Perl Parsing Based on Supplied Precedence
by wirito (Acolyte) on Nov 07, 2012 at 16:42 UTC
    I was playing to add parenthesis too. But as I have a 'difficult' access to CPAN I didn't tried Text::Balanced. However I reach the regex monkey way to deal with it, just adding a two globals and a new loop on parse():
    my $par; my $par_count = 0; sub parse{ my ($regex,$input)=@_; $input =~ /^__PAREN(\d+)__$/ and return $par->{$1}; $input=~s/\s//g; while( $input =~ /\(([^()]+)\)/ ) { my $sub = $1; $par->{$par_count} = parse($regex, $sub); my $tag = '__PAREN' . $par_count++ . '__'; $input =~ s/\(([^()]+)\)/$tag/; } for(reverse @$regex){ if($input=~m/(.+)($_)(.+)/) { my ($before,$op,$after,$node) = ($1, $2, $3); $node->{$op}=[parse($regex,$before),parse($regex,$after)]; return $node; } } return $input; }
    Your variable handling has amazed me :)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1002689]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (11)
As of 2018-03-17 17:41 GMT
Find Nodes?
    Voting Booth?
    When I think of a mole I think of:

    Results (225 votes). Check out past polls.