Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Parse::RecDescent and Dynamically Matched Subrule Repetition

by lhoward (Vicar)
on Jan 11, 2006 at 16:54 UTC ( [id://522495]=perlquestion: print w/replies, xml ) Need Help??

lhoward has asked for the wisdom of the Perl Monks concerning the following question:

I am dealing with a data format that has a Pascal string representation like element to it. i.e. an integer followed a number of elements indicated by the leading integer. I'd like to encapsulate this within a Parse::RecDescent grammar using some of the magic from dynamically matched rules together with subrule repetition, but can't get it to work. Any recommendations? Code is below.
#!/usr/bin/perl use strict; use warnings; use Test::More tests => 5; use Parse::RecDescent; my $p=Parse::RecDescent->new(q{ rec: int elem("$item[1]") int: /\d+/ elem: /\S+/ }); ok($p->rec('0')); ok($p->rec('1 foo')); ok($p->rec('2 foo bar')); ok(!$p->rec('1')); ok(!$p->rec('1 foo bar'));
For now I am using the equivalent of:
rec: int elem(s?)
but that doesn't provide the check I want (i.e. the 4th and 5th tests fail) and is slower than I think it would be if I could specify the exact number of items to match.

Thanks, Les

Replies are listed 'Best First'.
Re: Parse::RecDescent and Dynamically Matched Subrule Repetition
by blokhead (Monsignor) on Jan 11, 2006 at 17:29 UTC
    This is the best I could come up with.. It's possible that this could be improved, as it's a little messy. Anyway, in P::RD you can pass arguments to a subrule using rule[args], and within the rule use the @arg or %arg variable to fetch those args.
    #!/usr/bin/perl use strict; use warnings; use Test::More tests => 6; use Parse::RecDescent; my $p = Parse::RecDescent->new(<<'END_GRAMMAR') or die; rec: int elem[ num => $item[1] ] /\Z/ { $item[2] } int: /\d+\b/ ## match either the elem_many or the elem_one rule, and pass ## the number along to it.. elem: { $arg{num} > 1 ? "many" : "one" } <matchrule:elem_$item[1]>[ num => $arg{num} ] { $item[2] } elem_many: elem_one elem[ num => $arg{num}-1 ] { [ $item[1], @{$item[2]} ] } elem_one: /\S+(?!\S)/ { [$item[1]] } END_GRAMMAR # ok( $p->rec('0')); ok( $p->rec('1 foo')); ok( $p->rec('2 foo bar')); ok(!$p->rec('1')); ok(!$p->rec('1 foo bar')); ok( $p->rec('3 foo bar baz')); ok(!$p->rec('3 foo bar baz flab'));
    It works for everything except the zero case, which shouldn't be too bad to add as a degenerate case.

    Also note that I added a /\Z/ token to the main rule, so we would be assured that the rule was matching the whole string. And I also changed /\S+/ to /\S+(?!\S)/ to make sure each elem was a maximum-length word.

    BTW, you could do this simple problem with extended regexes: something like /(\d+)\s+(??{ "(?:\\S+(?>\\S)\\s*){$1}" })/ off the top of my head. But if elem_one were matching anything significantly more complicated /\S+/, you'd have to do something like this in a grammar.

    blokhead

      And I also changed /\S+/ to /\S+(?!\S)/ to make sure each elem was a maximum-length word.

      Useless. P::RD will not backtrack through /.../. They always match as much as possible.

      For example,

      use Data::Dumper qw( Dumper ); use Parse::RecDescent (); my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); parse : /a*/ /a/ /\Z/ { [ $item[1], $item[2] ] } __END_OF_GRAMMAR__ print(Dumper($p->parse('aaaaa')));

      outputs

      $VAR1 = undef;

      I almost said "The \b in /\d+\b/ is also useless.", but it turns "2foo bar" into an error.

Re: Parse::RecDescent and Dynamically Matched Subrule Repetition
by ikegami (Patriarch) on Jan 11, 2006 at 17:37 UTC

    The following works:

    #!/usr/bin/perl use strict; use warnings; use Test::More tests => 5; use Parse::RecDescent (); my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : { $arg[0] == 0 ? [] : undef } | ELEM rec_list[ $arg[0]-1 ] { [ $item[1], @{$item[2]} ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__ ok($p->parse('0')); ok($p->parse('1 foo')); ok($p->parse('2 foo bar')); ok(!$p->parse('1')); ok(!$p->parse('1 foo bar'));

    Changes:

    • rec_list recursively builds a list from one element and a list. The terminating condition of the recursion is the count of items, which is passed as an argument to the rule.

    • I added a check for end of "file" (/\Z/). This catches "1 foo bar". It's always good to check if you have leftover text to parse (unless you want to allow leftover text).

    • I prefer <<'__END_OF_GRAMMAR__' over q{...} because it handles backslashes more intuitively.

    • Your use strict and use warnings are in a different scope than the grammar. For them to apply to the grammar, you need to include them in the grammar.

    • I renamed int to pos_int to make it clear that signs are not acceptable. A check on the numbers magnitude inside of rec wouldn't hurt.

    • I uppercased tokens. It's just a style I use.

    This is the same grammar as above, but with (yet-to-be-customized) error reporting:

    my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec eof { $item[1] } eof : /\Z/ | <error> rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : rec_list_[ $arg[0] ] | <error> rec_list_ : { $arg[0] == 0 ? [] : undef } | ELEM rec_list_[ $arg[0]-1 ] { [ $item[1], @{$item[2]} ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__
Re: Parse::RecDescent and Dynamically Matched Subrule Repetition
by lhoward (Vicar) on Jan 11, 2006 at 20:40 UTC
    Thanks for the great suggestions. I had a feeling that some lisp-style recursive CDRing down the list might do the trick, but wasn't able to push it to a working solution. Some of my lists have more than 100 items, so I also had to sprinkle in "no warnings 'recursion';" to prevent the "Deep recursion on subroutine" warning.

    Les

      You can avoid the recursion by implementing your own looping:

      my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT { $thisparser->_parserepeat( $text, \&ELEM, $item[1], $item[1], # ELEM(#) $_noactions, $expectation, undef ) } { [ $item[0] => $item[2] ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__

      _parserepeat is the method called to handle rule(s), rule(s?), rule(4..6), etc.

      If you don't want to break the box, you could break down the problem instead:

      my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : { $arg[0] < 1 ? [] : undef } | { $arg[0] < 10 ? 1 : undef } ELEM rec_list[ $arg[0]-1 ] { [ $item[2], @{$item[3]} ] } | { $arg[0] < 100 ? 1 : undef } ELEM(10) rec_list[ $arg[0]-10 ] { [ @{$item[2]}, @{$item[3]} ] } | { $arg[0] < 1000 ? 1 : undef } ELEM(100) rec_list[ $arg[0]-100 ] { [ @{$item[2]}, @{$item[3]} ] } | <error:Exceeded maximum list length of 999 elements> POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://522495]
Approved by blokhead
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (5)
As of 2024-04-20 01:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found