http://www.perlmonks.org?node_id=1002349

I was playing with some recursive parsing, and made a simple program to parse apart text based on a given list of precedence. I was quite pleased with the result. :)

#!/usr/bin/perl use warnings; use strict; use Data::Dumper; $Data::Dumper::Terse=1; my $precedence=[qr/&&/,qr/\|\|/,qr/\+/,qr/-/,qr/\//,qr/\*/]; sub parse; while(<>){ my $tree = parse($precedence,$_); print Dumper($tree); } sub parse{ my ($regex,$input)=@_; my ($node,$before,$op,$after); $input=~s/\s//g; for(@$regex){ $input=~m/(.+)($_)(.+)/; if($3){ ($before,$op,$after)=($1,$2,$3); last; } } return $input if !defined($after); $node->{$op}=[parse($regex,$before),parse($regex,$after)]; return $node; }
input
5+4*3&&COND||COND2||COND3&&ANOTHER
output
{ '&&' => [ { '&&' => [ { '+' => [ '5', { '*' => [ '4', '3' ] } ] }, { '||' => [ { '||' => [ 'COND', 'COND2' ] }, 'COND3' ] } ] }, 'ANOTHER' ] }

Replies are listed 'Best First'.
Re: Perl Parsing Based on Supplied Precedence
by wirito (Acolyte) on Nov 06, 2012 at 16:01 UTC
    I think this is not treating 'equal preference' operators right. The expresion 16/4*2 should be considered as (16/4)*2, but your code produces 16/(4*2):
    $ perl parser.pl 16/4*2 { '/' => [ '16', { '*' => [ '4', '2' ] } ] }
    A simple tweak of the precedence regex:
    my $precedence=[qr/(?:&&|\|\|)/,qr/(?:\+|-)/,qr/(?:\/|\*)/];
    Will produce a better output:
    { '*' => [ { '/' => [ '16', '4' ] }, '2' ] }
    Edit: a further reading at perlop show me that '&&' precedes over '||', so regex should be:
    my $precedence=[qr/(?:\/|\*)/,qr/(?:\+|-)/,qr/&&/,qr/|\|\|/];

      Your change in the math operator precedence looks good. I did not test the logic of the math operators so much as that the precedence supplied was correctly represented in the tree. My "&&", as described by precedence, is closer to "and" (in Perl). Keep in mind, that the idea here was not to properly parse Perl, but to properly parse grammars (in general) by defined operator precedence.

      Thank you for your feedback, I'm happy to see that someone was interested enough to play with it. :)

        In fact, I find this pretty interesting:
        #!/usr/bin/perl use warnings; use strict; use Data::Dumper; $Data::Dumper::Terse=1; my $precedence_perlop=[ qr/(?:\/|\*|\%|x)/, qr/(?:\+|-|\.)/, qr/(?:<=|>=|<|>lt|gt|le|ge)/, qr/&/, qr/(?:\||\^)/, qr/&&/, qr/(?:\|\||\/\/)/, qr/(not)/, qr/(and)/, qr/(or|xor)/ ]; sub parse{ my ($regex,$input)=@_; $input=~s/\s//g; for(reverse @$regex){ if($input=~m/(.+)($_)(.+)/){ my ($before,$op,$after,$node)=($1,$2,$3); $node->{$op}=[parse($regex,$before),parse($regex,$after)]; return $node; } } return $input; } sub evaluate { my $tree = shift; return $tree unless ref $tree; foreach my $op ( keys %{$tree} ) { my @terms = map { evaluate($_) } @{ $tree->{$op} }; my $result; eval "\$result = $terms[0] $op $terms[1]"; return undef if $@; return $result; } } while(<>){ my $tree = parse($precedence_perlop,$_); print Dumper($tree); my $ev = evaluate( $tree ); print "Evaluate to: " . $ev . "\n" if defined $ev; }
        With boolean operators there is the same issue. '&&' and 'and' operator has higher precedence that '||' and 'or' (check perlop link on my first post). My approach gives accurate results.
        Then 'COND1&&COND2||COND3' is the same as '(COND1&&COND2)||COND3':
        $ perl parser.pl COND1&&COND2||COND3 ORIG: { '&&' => [ 'COND1', { '||' => [ 'COND2', 'COND3' ] } ] } NEW: { '||' => [ { '&&' => [ 'COND1', 'COND2' ] }, 'COND3' ] }
Re: Perl Parsing Based on Supplied Precedence
by grizzley (Chaplain) on Nov 07, 2012 at 10:01 UTC
    You can simplify code a little bit by moving $node declaration to the inner loop (and I replaced condition $3 with match result):
    sub parse { my ($regex,$input)=@_; for(@$regex) { if($input=~m/(.+)($_)(.+)/) { my ($before,$op,$after,$node)=($1,$2,$3); $node->{$op}=[parse($regex,$before), parse($regex,$after)] +; return $node; } } return $input; }
    and optimize the code by moving $input=~s/\s//g outside parse sub. It is enough to do this once in while loop after reading input data.

      That is very slick. I was worried about checking for a match then doing a match with groups being two regex operations, but for some reason didn't think to do both at once as you did.

      +1