Parse::RecDescent: how does <matchrule:> work?

7stud has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

Suppose I want to match text like this:

     { hello }
     {{ hello }}
     {{{ hello }}}

I started simply by writing a grammar to match the two-brace string:

use strict; 
use warnings; 
use 5.012;

use Parse::RecDescent;

$::RD_ERRORS = 1; #Parser dies when it encounters an error
$::RD_WARN   = 1; #Enable warnings - warn on unused rules &c.
$::RD_HINT   = 1; #Give out hints to help fix problems.
#$::RD_TRACE  = 1; #Trace parsers' behaviour

my $text = <<'END_OF_TEXT';
{{ hello }}
END_OF_TEXT

my $grammar = <<'END_OF_GRAMMAR';
    {
        use 5.012;
        use Data::Dumper;
    }

    startrule: lbrace(2) 
               'hello' 
               rbrace(2) 

    lbrace: / [{] /xms
    rbrace: / [}] /xms
                  
END_OF_GRAMMAR

my $parser = Parse::RecDescent->new($grammar) 
    or die "Bad grammar!\n";

defined $parser->startrule($text) 
    or die "Can't match text";

Then I tried to adjust the grammar to make the rbrace subrule dynamic. My idea was to use <rulevar:> to declare a local variable called $lbrace_count:

    startrule: <rulevar: $lbrace_count>

Then use an action to assign $lbrace_count the number of lbraces that matched:

    startrule: lbrace(2)  { $lbrace_count = length $item[1] }
               'hello' 
               rbrace(2)

The grammar above still matches.

Next I used <matchrule:> to create the string 'rbrace(2)'. According to the docs, you are supposed to put an unquoted series of characters after the colon in <matchrule:> and then P::RD will put the series of characters inside a qq{} to produce the name of a subrule:

    startrule: <rulevar: $lbrace_count>

    startrule: lbrace(2)  { $lbrace_count = length $item[1] }
               'hello' 
               <matchrule: rbrace($lbrace_count)>

    lbrace: / [{] /xms
    rbrace: / [}] /xms

But that produces a weird error that I can't sort out:

Unknown starting rule (Parse::RecDescent::namespace000001::rbrace(18)) called
 at prd1.pl line 39.
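
A note on the (18) in that message. A likely explanation, assuming $item[1] holds an array reference when a subrule carries a repetition count such as lbrace(2) (the Dumper output later in this thread shows an array for lbrace(1..)), is that length stringifies the reference to something like ARRAY(0x...) and counts its characters instead of counting the matched braces. The sketch below uses plain Perl with no Parse::RecDescent involved; $matched is a hypothetical stand-in for $item[1]:

```perl
use strict;
use warnings;

# Hypothetical stand-in for what $item[1] is assumed to hold
# after lbrace(2) has matched two braces.
my $matched = [ '{', '{' ];

my $as_string   = "$matched";          # "ARRAY(0x...)"; the address varies per run
my $wrong_count = length $matched;     # length of that string, not the brace count
my $right_count = scalar @{$matched};  # number of elements in the array

print "stringified: $as_string\n";
print "length:      $wrong_count\n";
print "elements:    $right_count\n";
```

If that assumption holds, counting the elements with scalar @{$item[1]} would give the number of braces that matched.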

In fact, I can't even get the following simplification to work:

    startrule: lbrace(2)  
               'hello' 
               <matchrule: rbrace(2)>

    lbrace: / [{] /xms
    rbrace: / [}] /xms


--output:--
Unknown starting rule (Parse::RecDescent::namespace000001::rbrace(2)) called
 at prd1.pl line 39.

Here's the complete program:

use strict; 
use warnings; 
use 5.012;

use Parse::RecDescent;

$::RD_ERRORS = 1; #Parser dies when it encounters an error
$::RD_WARN   = 1; #Enable warnings - warn on unused rules &c.
$::RD_HINT   = 1; #Give out hints to help fix problems.
$::RD_TRACE  = 1; #Trace parsers' behaviour

my $text = <<'END_OF_TEXT';
{{ hello }}
END_OF_TEXT

my $grammar = <<'END_OF_GRAMMAR';
    {
        use 5.012;
        use Data::Dumper;
    }

    startrule: <rulevar: $lbrace_count>
    startrule: lbrace(2)  { $lbrace_count = length $item[1] }
               'hello' 
               <matchrule: rbrace($lbrace_count)>

    lbrace: / [{] /xms
    rbrace: / [}] /xms

                  
END_OF_GRAMMAR



my $parser = Parse::RecDescent->new($grammar) 
    or die "Bad grammar!\n";

#ERROR ON NEXT LINE *****************************

defined $parser->startrule($text) 
    or die "Can't match text";

Re: Parse::RecDescent: how does <matchrule:> work? by 7stud (Deacon) on Feb 07, 2013 at 07:17 UTC
Okay, I understand now: there is no rule named 'rbrace(2)' anywhere in my grammar. The rule name is 'rbrace'. The correct syntax is:

    <matchrule: rbrace>(2)

But this doesn't work:

    <matchrule: rbrace>($item[1])

Any ideas how I can accomplish what I want?
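
To illustrate the point about rule names: the docs quoted in the question say the text after the colon is interpolated as if inside qq{} to produce a rule name. The toy model below shows why "rbrace(2)" is then looked up as a name and not found. It is plain Perl; the %rules table and the call_rule helper are hypothetical and are not how Parse::RecDescent is implemented:

```perl
use strict;
use warnings;

# Hypothetical dispatch table standing in for the rules of the grammar.
my %rules = (
    lbrace => sub { return '{' },
    rbrace => sub { return '}' },
);

# Hypothetical helper: look a rule up by its already-interpolated name.
sub call_rule {
    my ($name) = @_;
    my $rule = $rules{$name}
        or return "Unknown rule ($name)";
    return $rule->();
}

my $count = 2;
print call_rule(qq{rbrace($count)}), "\n";   # the name is the string "rbrace(2)"
print call_rule(qq{rbrace}), "\n";           # the name is "rbrace"
```

In this model the first call reports an unknown rule because the repetition count has become part of the name, while the second finds the rule.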
Re: Parse::RecDescent: how does <matchrule:> work? by sundialsvc4 (Abbot) on Feb 07, 2013 at 14:28 UTC
Question #1 is ... what do you want? (No, seriously.) What interpretation of these three strings do you want to be “correct?” Are they three distinct cases, or examples of the same one?

The interpretation that I guess you probably want is that there are only three tokens of interest: `'{', '}', <ident>`. The grammar, in quasi-BNF syntax, would then become:

    <statement> ::= <ident> | '{' <statement>+ '}' ;
    <ident> ::= /[A-Za-z0-9]+/ ;

This would be a simple left-tail recursive case. But I am sure that my syntax is not actually sufficient for your needs, because if it were, you would scarcely need a parser with which to solve it.

Please provide a full representative example of the type of data you need to parse, and we’ll help you write the proper grammar-mojo. It does take practice.
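
The recursive grammar above can be sketched as a small hand-written matcher in plain Perl. This is only an illustration of the recursion and not a Parse::RecDescent grammar; it assumes an ident is one or more alphanumerics, and the names parse_statement and matches are hypothetical:

```perl
use strict;
use warnings;

# statement ::= ident | '{' statement+ '}'
# All calls share one cursor, kept in pos() of the referenced string.
sub parse_statement {
    my ($text_ref) = @_;

    # ident: assumed to be one or more alphanumerics
    return 1 if $$text_ref =~ /\G \s* [A-Za-z0-9]+ /gcx;

    my $start = pos($$text_ref) // 0;
    if ($$text_ref =~ /\G \s* \{ /gcx) {
        my $count = 0;
        $count++ while parse_statement($text_ref);
        return 1 if $count >= 1 && $$text_ref =~ /\G \s* \} /gcx;
    }
    pos($$text_ref) = $start;    # restore the cursor on failure
    return 0;
}

# True only if the whole string is a single statement.
sub matches {
    my ($text) = @_;
    pos($text) = 0;
    return parse_statement(\$text) && $text =~ /\G \s* \z/gcx ? 1 : 0;
}

print matches('{{ hello }}') ? "match\n" : "no match\n";
print matches('{{ hello }')  ? "match\n" : "no match\n";
```

Because each opening brace is closed by the same recursive call that consumed it, the brace counts balance without any explicit counting. Unlike the original question, this also accepts several identifiers inside one pair of braces.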
Re^2: Parse::RecDescent: how does <matchrule:> work? by 7stud (Deacon) on Feb 08, 2013 at 03:41 UTC
"Please provide a full representative example of the type of data you need to parse, and we’ll help you write the proper grammar-mojo. It does take practice."

The sample text and my finished grammar are over here.
Re: Parse::RecDescent: how does <matchrule:> work? by 7stud (Deacon) on Feb 07, 2013 at 16:58 UTC
"Question #1 is ... what do you want? (No, seriously.) What interpretation of these three strings do you want to be “correct?” Are they three distinct cases, or examples of the same one?"

Sorry, I know the frustration. I see now that what I said has multiple interpretations, and that I didn't even follow what I tell others to do when posting questions like this. To clarify: I want to be able to apply my grammar to this string:

    { hello }

...and find a match. Then I want to be able to apply the same grammar to this string:

    {{ hello }}

...and also find a match. Then I want to be able to apply the same grammar to this string:

    {{{ hello }}}

...and also find a match.

By "match", I mean that my program should produce no errors, and these lines:

    my $parser = Parse::RecDescent->new($grammar)
        or die "Bad grammar!\n";

    defined $parser->startrule($text)
        or die "Can't match text";

...should not produce any output. So when I run my program once on each of those three strings, each time there should be no output at all. What I am trying to code is a backreference.

But now that I think about it, applying my desired grammar to this three-line block of text:

    { hello }
    {{ hello }}
    {{{ hello }}}

...would only require a trivial adjustment to the grammar:

    startrule: brace_block(s)   #adjustment

    brace_block: lbrace(1..) 'hello' <something>

    lbrace: / [{] /xms
    rbrace: / [}] /xms

In any case, I figured out a solution. The grammar below allows me to parse 'hello', preceded by a variable number of braces, followed by the same number of braces that preceded 'hello':

    startrule: <rulevar: $rbraces>

    startrule: lbrace(s)
               {
                   my $lbraces = join '', @{$item[1]};
                   $rbraces = "}" x length $lbraces;
               }
               'hello'
               "$rbraces"

    lbrace: / [{] /xms

Instead of using a rule for the right braces, I am using a literal. Is there a better way to produce a backreference?
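
For comparison only: outside Parse::RecDescent, the same "as many closing braces as opening braces" idea can be sketched in core Perl with a postponed regex subexpression, (??{ ... }), which builds part of the pattern at match time. This is not a replacement for the grammar above; closers_for and is_balanced_hello are hypothetical helper names:

```perl
use strict;
use warnings;

# Build a pattern for the closing braces from the opening braces captured so far.
sub closers_for {
    my ($openers) = @_;
    return quotemeta( '}' x length $openers );
}

# One or more opening braces, the word hello, then the same number of closers.
my $brace_block = qr/
    \A \s*
    ( \{+ )
    \s* hello \s*
    (??{ closers_for($1) })
    \s* \z
/x;

sub is_balanced_hello {
    my ($text) = @_;
    return $text =~ $brace_block ? 1 : 0;
}

for my $text ('{ hello }', '{{ hello }}', '{{ hello }') {
    printf "%-14s %s\n", $text, is_balanced_hello($text) ? 'match' : 'no match';
}
```

The helper sub keeps literal braces out of the embedded code block, which avoids brace-counting problems in older Perls.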

"Please provide a full representative example of the type of data you need to parse, and we’ll help you write the proper grammar-mojo. It does take practice."

Yes, this is part of a larger grammar I'm working on, but I don't think posting the whole thing would be helpful. I would love to have you take a look at the complete grammar when I'm done.

My current problem: when I incorporate the solution above into one of my rules, I can't get the blasted rule to return the entire match to other rules without causing the error:

    Can't use string ("{{") as an ARRAY ref while "strict refs" in use at (eval 14) line 377

I find it frustrating that Damian didn't use Carp, so that I could pin down exactly which line in my code is causing that error--I could care less which line in his code is causing the error.

Another thing I find frustrating is that a rule doesn't return its whole match to another rule. Why in the heck is the default to return only what matched the last term of a rule? I have no idea how the default behavior would ever be useful.
Re: Parse::RecDescent: how does <matchrule:> work? by 7stud (Deacon) on Feb 07, 2013 at 19:07 UTC
"Yes, this is part of a larger grammar I'm working on, but I don't think posting the whole thing would be helpful. I would love to have you take a look at the complete grammar when I'm done. My current problem: when I incorporate the solution above into one of my rules, I can't get the blasted rule to return the entire match to other rules without causing the error:"

Jiminy Christmas. It looks like an action appearing in the middle of a rule inserts the return value of the action as an additional entry in @item! The inserted value immediately follows the current subrule's match:

use strict;
use warnings;
use 5.012;

use Parse::RecDescent;

$::RD_ERRORS = 1; #Parser dies when it encounters an error
$::RD_WARN   = 1; #Enable warnings - warn on unused rules &c.
$::RD_HINT   = 1; #Give out hints to help fix problems.
#$::RD_TRACE  = 1; #Trace parsers' behaviour

my $text = <<'END_OF_TEXT';
{ hello }
END_OF_TEXT

my $grammar = <<'END_OF_GRAMMAR';
    {
        use 5.012;
        use Data::Dumper;
    }

    startrule: brace_block(s)

    brace_block: <rulevar: ($lbraces, $rbraces)>

    brace_block: lbrace(1..)
                 {
                     $lbraces = join '', @{$item[1]};
                     $rbraces = '}' x length $lbraces;
                 }
                 'hello'
                 "$rbraces"
                 {
                     say Dumper(\@item);
                     #$return = "$lbraces $item[2] $rbraces";
                 }

    lbrace: / [{] /xms

--output:--
$VAR1 = [
          'brace_block',
          [
            '{'
          ],
          '}',
          'hello',
          '}'
        ];
...
...
...

The output shows that in the brace_block rule, the match for the subrule lbrace(1..), i.e. $item[1], is an array containing the matching braces. That is as expected. However, immediately following that match is the closing brace returned by the action that follows the lbrace(1..) subrule. As a result, the matches for the remaining items sit at indexes one higher than they otherwise would. So the last line in the last action, which attempts to return all the matching text for the brace_block rule, i.e.

    $return = "$lbraces $item[2] $rbraces";

needs to be changed to:

    $return = "$lbraces $item[3] $rbraces";

The same grammar can be used for parsing the three-line block of text:

    { hello }
    {{ hello }}
    {{{ hello }}}

use strict;
use warnings;
use 5.012;

use Parse::RecDescent;

$::RD_ERRORS = 1; #Parser dies when it encounters an error
$::RD_WARN   = 1; #Enable warnings - warn on unused rules &c.
$::RD_HINT   = 1; #Give out hints to help fix problems.
#$::RD_TRACE  = 1; #Trace parsers' behaviour

my $text = <<'END_OF_TEXT';
{ hello }
{{ hello }}
{{{ hello }}}
END_OF_TEXT

my $grammar = <<'END_OF_GRAMMAR';
    {
        use 5.012;
        use Data::Dumper;
    }

    startrule: brace_block(s)
               {
                   say Dumper(\@item);
               }

    brace_block: <rulevar: ($lbraces, $rbraces)>

    brace_block: lbrace(1..)
                 {
                     $lbraces = join '', @{$item[1]};
                     $rbraces = '}' x length $lbraces;
                 }
                 'hello'
                 "$rbraces"
                 {
                     $return = "$lbraces $item[3] $rbraces";
                 }

    lbrace: / [{] /xms

END_OF_GRAMMAR

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar!\n";

defined $parser->startrule($text)
    or die "Can't match text";

--output:--
$VAR1 = [
          'startrule',
          [
            '{ hello }',
            '{{ hello }}',
            '{{{ hello }}}'
          ]
        ];
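
The index shift described above can be pictured with a toy model in plain Perl. It only mimics the observed layout of @item (rule name in slot 0, then one slot per item in the production, actions included) and is not Parse::RecDescent code; build_items is a hypothetical helper:

```perl
use strict;
use warnings;

# Toy model: every item in a production takes the next slot,
# whether it is a subrule, a literal, or an action.
sub build_items {
    my ($rule_name, @matched) = @_;
    my @item = ($rule_name);             # slot 0 holds the rule name
    push @item, $_->[1] for @matched;    # one slot per item, actions included
    return @item;
}

my @item = build_items(
    'brace_block',
    [ subrule => [ '{', '{' ] ],   # lbrace(1..)
    [ action  => '}}'         ],   # value of the mid-rule action
    [ literal => 'hello'      ],   # 'hello'
    [ literal => '}}'         ],   # "$rbraces"
);

print "index 2 holds the action's value: $item[2]\n";
print "index 3 holds the literal: $item[3]\n";
```

Removing the action entry from the list moves 'hello' back to index 2, which is the position the commented-out $return line in the first listing assumed.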
