Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Parse::RecDescent and Dynamically Matched Subrule Repetition

by lhoward (Vicar)
on Jan 11, 2006 at 16:54 UTC ( #522495=perlquestion: print w/ replies, xml ) Need Help??
lhoward has asked for the wisdom of the Perl Monks concerning the following question:

I am dealing with a data format that has a Pascal string representation like element to it. i.e. an integer followed a number of elements indicated by the leading integer. I'd like to encapsulate this within a Parse::RecDescent grammar using some of the magic from dynamically matched rules together with subrule repetition, but can't get it to work. Any recommendations? Code is below.
#!/usr/bin/perl use strict; use warnings; use Test::More tests => 5; use Parse::RecDescent; my $p=Parse::RecDescent->new(q{ rec: int elem("$item[1]") int: /\d+/ elem: /\S+/ }); ok($p->rec('0')); ok($p->rec('1 foo')); ok($p->rec('2 foo bar')); ok(!$p->rec('1')); ok(!$p->rec('1 foo bar'));
For now I am using the equivalent of:
rec: int elem(s?)
but that doesn't provide the check I want (i.e. the 4th and 5th tests fail) and is slower than I think it would be if I could specify the exact number of items to match.

Thanks, Les

Comment on Parse::RecDescent and Dynamically Matched Subrule Repetition
Select or Download Code
Re: Parse::RecDescent and Dynamically Matched Subrule Repetition
by blokhead (Monsignor) on Jan 11, 2006 at 17:29 UTC
    This is the best I could come up with.. It's possible that this could be improved, as it's a little messy. Anyway, in P::RD you can pass arguments to a subrule using rule[args], and within the rule use the @arg or %arg variable to fetch those args.
    #!/usr/bin/perl use strict; use warnings; use Test::More tests => 6; use Parse::RecDescent; my $p = Parse::RecDescent->new(<<'END_GRAMMAR') or die; rec: int elem[ num => $item[1] ] /\Z/ { $item[2] } int: /\d+\b/ ## match either the elem_many or the elem_one rule, and pass ## the number along to it.. elem: { $arg{num} > 1 ? "many" : "one" } <matchrule:elem_$item[1]>[ num => $arg{num} ] { $item[2] } elem_many: elem_one elem[ num => $arg{num}-1 ] { [ $item[1], @{$item[2]} ] } elem_one: /\S+(?!\S)/ { [$item[1]] } END_GRAMMAR # ok( $p->rec('0')); ok( $p->rec('1 foo')); ok( $p->rec('2 foo bar')); ok(!$p->rec('1')); ok(!$p->rec('1 foo bar')); ok( $p->rec('3 foo bar baz')); ok(!$p->rec('3 foo bar baz flab'));
    It works for everything except the zero case, which shouldn't be too bad to add as a degenerate case.

    Also note that I added a /\Z/ token to the main rule, so we would be assured that the rule was matching the whole string. And I also changed /\S+/ to /\S+(?!\S)/ to make sure each elem was a maximum-length word.

    BTW, you could do this simple problem with extended regexes: something like /(\d+)\s+(??{ "(?:\\S+(?>\\S)\\s*){$1}" })/ off the top of my head. But if elem_one were matching anything significantly more complicated /\S+/, you'd have to do something like this in a grammar.

    blokhead

      And I also changed /\S+/ to /\S+(?!\S)/ to make sure each elem was a maximum-length word.

      Useless. P::RD will not backtrack through /.../. They always match as much as possible.

      For example,

      use Data::Dumper qw( Dumper ); use Parse::RecDescent (); my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); parse : /a*/ /a/ /\Z/ { [ $item[1], $item[2] ] } __END_OF_GRAMMAR__ print(Dumper($p->parse('aaaaa')));

      outputs

      $VAR1 = undef;

      I almost said "The \b in /\d+\b/ is also useless.", but it turns "2foo bar" into an error.

Re: Parse::RecDescent and Dynamically Matched Subrule Repetition
by ikegami (Pope) on Jan 11, 2006 at 17:37 UTC

    The following works:

    #!/usr/bin/perl use strict; use warnings; use Test::More tests => 5; use Parse::RecDescent (); my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : { $arg[0] == 0 ? [] : undef } | ELEM rec_list[ $arg[0]-1 ] { [ $item[1], @{$item[2]} ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__ ok($p->parse('0')); ok($p->parse('1 foo')); ok($p->parse('2 foo bar')); ok(!$p->parse('1')); ok(!$p->parse('1 foo bar'));

    Changes:

    • rec_list recursively builds a list from one element and a list. The terminating condition of the recursion is the count of items, which is passed as an argument to the rule.

    • I added a check for end of "file" (/\Z/). This catches "1 foo bar". It's always good to check if you have leftover text to parse (unless you want to allow leftover text).

    • I prefer <<'__END_OF_GRAMMAR__' over q{...} because it handles backslashes more intuitively.

    • Your use strict and use warnings are in a different scope than the grammar. For them to apply to the grammar, you need to include them in the grammar.

    • I renamed int to pos_int to make it clear that signs are not acceptable. A check on the numbers magnitude inside of rec wouldn't hurt.

    • I uppercased tokens. It's just a style I use.

    This is the same grammar as above, but with (yet-to-be-customized) error reporting:

    my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec eof { $item[1] } eof : /\Z/ | <error> rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : rec_list_[ $arg[0] ] | <error> rec_list_ : { $arg[0] == 0 ? [] : undef } | ELEM rec_list_[ $arg[0]-1 ] { [ $item[1], @{$item[2]} ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__
Re: Parse::RecDescent and Dynamically Matched Subrule Repetition
by lhoward (Vicar) on Jan 11, 2006 at 20:40 UTC
    Thanks for the great suggestions. I had a feeling that some lisp-style recursive CDRing down the list might do the trick, but wasn't able to push it to a working solution. Some of my lists have more than 100 items, so I also had to sprinkle in "no warnings 'recursion';" to prevent the "Deep recursion on subroutine" warning.

    Les

      You can avoid the recursion by implementing your own looping:

      my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT { $thisparser->_parserepeat( $text, \&ELEM, $item[1], $item[1], # ELEM(#) $_noactions, $expectation, undef ) } { [ $item[0] => $item[2] ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__

      _parserepeat is the method called to handle rule(s), rule(s?), rule(4..6), etc.

      If you don't want to break the box, you could break down the problem instead:

      my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : { $arg[0] < 1 ? [] : undef } | { $arg[0] < 10 ? 1 : undef } ELEM rec_list[ $arg[0]-1 ] { [ $item[2], @{$item[3]} ] } | { $arg[0] < 100 ? 1 : undef } ELEM(10) rec_list[ $arg[0]-10 ] { [ @{$item[2]}, @{$item[3]} ] } | { $arg[0] < 1000 ? 1 : undef } ELEM(100) rec_list[ $arg[0]-100 ] { [ @{$item[2]}, @{$item[3]} ] } | <error:Exceeded maximum list length of 999 elements> POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://522495]
Approved by blokhead
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (6)
As of 2015-07-02 23:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (47 votes), past polls