Beefy Boxes and Bandwidth Generously Provided by pair Networks Frank
Perl: the Markov chain saw
 
PerlMonks  

Perl Parsing Based on Supplied Precedence

by protist (Monk)
on Nov 05, 2012 at 15:40 UTC ( #1002349=CUFP: print w/ replies, xml ) Need Help??

I was playing with some recursive parsing, and made a simple program to parse apart text based on a given list of precedence. I was quite pleased with the result. :)

#!/usr/bin/perl use warnings; use strict; use Data::Dumper; $Data::Dumper::Terse=1; my $precedence=[qr/&&/,qr/\|\|/,qr/\+/,qr/-/,qr/\//,qr/\*/]; sub parse; while(<>){ my $tree = parse($precedence,$_); print Dumper($tree); } sub parse{ my ($regex,$input)=@_; my ($node,$before,$op,$after); $input=~s/\s//g; for(@$regex){ $input=~m/(.+)($_)(.+)/; if($3){ ($before,$op,$after)=($1,$2,$3); last; } } return $input if !defined($after); $node->{$op}=[parse($regex,$before),parse($regex,$after)]; return $node; }
input
5+4*3&&COND||COND2||COND3&&ANOTHER
output
{ '&&' => [ { '&&' => [ { '+' => [ '5', { '*' => [ '4', '3' ] } ] }, { '||' => [ { '||' => [ 'COND', 'COND2' ] }, 'COND3' ] } ] }, 'ANOTHER' ] }

Comment on Perl Parsing Based on Supplied Precedence
Select or Download Code
Re: Perl Parsing Based on Supplied Precedence
by wirito (Acolyte) on Nov 06, 2012 at 16:01 UTC
    I think this is not treating 'equal preference' operators right. The expresion 16/4*2 should be considered as (16/4)*2, but your code produces 16/(4*2):
    $ perl parser.pl 16/4*2 { '/' => [ '16', { '*' => [ '4', '2' ] } ] }
    A simple tweak of the precedence regex:
    my $precedence=[qr/(?:&&|\|\|)/,qr/(?:\+|-)/,qr/(?:\/|\*)/];
    Will produce a better output:
    { '*' => [ { '/' => [ '16', '4' ] }, '2' ] }
    Edit: a further reading at perlop show me that '&&' precedes over '||', so regex should be:
    my $precedence=[qr/(?:\/|\*)/,qr/(?:\+|-)/,qr/&&/,qr/|\|\|/];

      Your change in the math operator precedence looks good. I did not test the logic of the math operators so much as that the precedence supplied was correctly represented in the tree. My "&&", as described by precedence, is closer to "and" (in Perl). Keep in mind, that the idea here was not to properly parse Perl, but to properly parse grammars (in general) by defined operator precedence.

      Thank you for your feedback, I'm happy to see that someone was interested enough to play with it. :)

        With boolean operators there is the same issue. '&&' and 'and' operator has higher precedence that '||' and 'or' (check perlop link on my first post). My approach gives accurate results.
        Then 'COND1&&COND2||COND3' is the same as '(COND1&&COND2)||COND3':
        $ perl parser.pl COND1&&COND2||COND3 ORIG: { '&&' => [ 'COND1', { '||' => [ 'COND2', 'COND3' ] } ] } NEW: { '||' => [ { '&&' => [ 'COND1', 'COND2' ] }, 'COND3' ] }
        In fact, I find this pretty interesting:
        #!/usr/bin/perl use warnings; use strict; use Data::Dumper; $Data::Dumper::Terse=1; my $precedence_perlop=[ qr/(?:\/|\*|\%|x)/, qr/(?:\+|-|\.)/, qr/(?:<=|>=|<|>lt|gt|le|ge)/, qr/&/, qr/(?:\||\^)/, qr/&&/, qr/(?:\|\||\/\/)/, qr/(not)/, qr/(and)/, qr/(or|xor)/ ]; sub parse{ my ($regex,$input)=@_; $input=~s/\s//g; for(reverse @$regex){ if($input=~m/(.+)($_)(.+)/){ my ($before,$op,$after,$node)=($1,$2,$3); $node->{$op}=[parse($regex,$before),parse($regex,$after)]; return $node; } } return $input; } sub evaluate { my $tree = shift; return $tree unless ref $tree; foreach my $op ( keys %{$tree} ) { my @terms = map { evaluate($_) } @{ $tree->{$op} }; my $result; eval "\$result = $terms[0] $op $terms[1]"; return undef if $@; return $result; } } while(<>){ my $tree = parse($precedence_perlop,$_); print Dumper($tree); my $ev = evaluate( $tree ); print "Evaluate to: " . $ev . "\n" if defined $ev; }
Re: Perl Parsing Based on Supplied Precedence
by grizzley (Chaplain) on Nov 07, 2012 at 10:01 UTC
    You can simplify code a little bit by moving $node declaration to the inner loop (and I replaced condition $3 with match result):
    sub parse { my ($regex,$input)=@_; for(@$regex) { if($input=~m/(.+)($_)(.+)/) { my ($before,$op,$after,$node)=($1,$2,$3); $node->{$op}=[parse($regex,$before), parse($regex,$after)] +; return $node; } } return $input; }
    and optimize the code by moving $input=~s/\s//g outside parse sub. It is enough to do this once in while loop after reading input data.

      That is very slick. I was worried about checking for a match then doing a match with groups being two regex operations, but for some reason didn't think to do both at once as you did.

      +1

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://1002349]
Approved by Corion
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (13)
As of 2014-04-20 15:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (485 votes), past polls