Parse::RecDescent and Dynamically Matched Subrule Repetition

lhoward has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Parse::RecDescent and Dynamically Matched Subrule Repetition by blokhead (Monsignor) on Jan 11, 2006 at 17:29 UTC
This is the best I could come up with.. It's possible that this could be improved, as it's a little messy. Anyway, in P::RD you can pass arguments to a subrule using `rule[args]`, and within the rule use the @arg or %arg variable to fetch those args. #!/usr/bin/perl use strict; use warnings; use Test::More tests => 6; use Parse::RecDescent; my $p = Parse::RecDescent->new(<<'END_GRAMMAR') or die; rec: int elem[ num => $item[1] ] /\Z/ { $item[2] } int: /\d+\b/ ## match either the elem_many or the elem_one rule, and pass ## the number along to it.. elem: { $arg{num} > 1 ? "many" : "one" } <matchrule:elem_$item[1]>[ num => $arg{num} ] { $item[2] } elem_many: elem_one elem[ num => $arg{num}-1 ] { [ $item[1], @{$item[2]} ] } elem_one: /\S+(?!\S)/ { [$item[1]] } END_GRAMMAR # ok( $p->rec('0')); ok( $p->rec('1 foo')); ok( $p->rec('2 foo bar')); ok(!$p->rec('1')); ok(!$p->rec('1 foo bar')); ok( $p->rec('3 foo bar baz')); ok(!$p->rec('3 foo bar baz flab')); [download] It works for everything except the zero case, which shouldn't be too bad to add as a degenerate case. Also note that I added a /\Z/ token to the main rule, so we would be assured that the rule was matching the whole string. And I also changed /\S+/ to /\S+(?!\S)/ to make sure each `elem` was a maximum-length word. BTW, you could do this simple problem with extended regexes: something like `/(\d+)\s+(??{ "(?:\\S+(?>\\S)\\s*){$1}" })/` off the top of my head. But if `elem_one` were matching anything significantly more complicated /\S+/, you'd have to do something like this in a grammar. blokhead	[reply] [d/l] [select]
Re^2: Parse::RecDescent and Dynamically Matched Subrule Repetition by ikegami (Patriarch) on Jan 11, 2006 at 17:42 UTC
And I also changed `/\S+/` to `/\S+(?!\S)/` to make sure each elem was a maximum-length word. Useless. P::RD will not backtrack through `/.../`. They always match as much as possible. For example, `use Data::Dumper qw( Dumper ); use Parse::RecDescent (); my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); parse : /a*/ /a/ /\Z/ { [ $item[1], $item[2] ] } __END_OF_GRAMMAR__ print(Dumper($p->parse('aaaaa')));` [download] outputs `$VAR1 = undef;` [download] I almost said "The `\b` in `/\d+\b/` is also useless.", but it turns `"2foo bar"` into an error.	[reply] [d/l] [select]
Re: Parse::RecDescent and Dynamically Matched Subrule Repetition by ikegami (Patriarch) on Jan 11, 2006 at 17:37 UTC
The following works: #!/usr/bin/perl use strict; use warnings; use Test::More tests => 5; use Parse::RecDescent (); my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : { $arg[0] == 0 ? [] : undef } \| ELEM rec_list[ $arg[0]-1 ] { [ $item[1], @{$item[2]} ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__ ok($p->parse('0')); ok($p->parse('1 foo')); ok($p->parse('2 foo bar')); ok(!$p->parse('1')); ok(!$p->parse('1 foo bar')); [download] Changes: `rec_list` recursively builds a list from one element and a list. The terminating condition of the recursion is the count of items, which is passed as an argument to the rule. I added a check for end of "file" (`/\Z/`). This catches `"1 foo bar"`. It's always good to check if you have leftover text to parse (unless you want to allow leftover text). I prefer `<<'__END_OF_GRAMMAR__'` over `q{...}` because it handles backslashes more intuitively. Your `use strict` and `use warnings` are in a different scope than the grammar. For them to apply to the grammar, you need to include them in the grammar. I renamed `int` to `pos_int` to make it clear that signs are not acceptable. A check on the numbers magnitude inside of `rec` wouldn't hurt. I uppercased tokens. It's just a style I use. This is the same grammar as above, but with (yet-to-be-customized) error reporting: `my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec eof { $item[1] } eof : /\Z/ \| <error> rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : rec_list_[ $arg[0] ] \| <error> rec_list_ : { $arg[0] == 0 ? [] : undef } \| ELEM rec_list_[ $arg[0]-1 ] { [ $item[1], @{$item[2]} ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__` [download]	[reply] [d/l] [select]
Re: Parse::RecDescent and Dynamically Matched Subrule Repetition by lhoward (Vicar) on Jan 11, 2006 at 20:40 UTC
Thanks for the great suggestions. I had a feeling that some lisp-style recursive CDRing down the list might do the trick, but wasn't able to push it to a working solution. Some of my lists have more than 100 items, so I also had to sprinkle in "no warnings 'recursion';" to prevent the "Deep recursion on subroutine" warning. Les	[reply]
Re^2: Parse::RecDescent and Dynamically Matched Subrule Repetition by ikegami (Patriarch) on Jan 12, 2006 at 18:51 UTC
You can avoid the recursion by implementing your own looping: `my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT { $thisparser->_parserepeat( $text, \&ELEM, $item[1], $item[1], # ELEM(#) $_noactions, $expectation, undef ) } { [ $item[0] => $item[2] ] } POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__` [download] `_parserepeat` is the method called to handle `rule(s)`, `rule(s?)`, `rule(4..6)`, etc. If you don't want to break the box, you could break down the problem instead: my $p = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__'); { use strict; use warnings; } parse : rec /\Z/ { $item[1] } rec : POS_INT rec_list[ $item[1] ] { [ $item[0] => $item[2] ] } rec_list : { $arg[0] < 1 ? [] : undef } \| { $arg[0] < 10 ? 1 : undef } ELEM rec_list[ $arg[0]-1 ] { [ $item[2], @{$item[3]} ] } \| { $arg[0] < 100 ? 1 : undef } ELEM(10) rec_list[ $arg[0]-10 ] { [ @{$item[2]}, @{$item[3]} ] } \| { $arg[0] < 1000 ? 1 : undef } ELEM(100) rec_list[ $arg[0]-100 ] { [ @{$item[2]}, @{$item[3]} ] } \| <error:Exceeded maximum list length of 999 elements> POS_INT : /\d+/ ELEM : /\S+/ __END_OF_GRAMMAR__ [download]	[reply] [d/l] [select]


Perl: the Markov chain saw
	PerlMonks