kudra has asked for the wisdom of the Perl Monks concerning the following question:

It's been a few years since I used Parse::RecDescent, so it might be that my expectations are wrong, but I was under the impression that once some text had matched a rule, that same text would not be used to match a subsequent rule.

Excluding the actions that I take (printing the result) and the subrules (which are all regexes), the relationship between the rules is this:

    start : (index pre risk) | (pre risk)
    risk  : (risk1 | risk2)
    index : (pre1 | pre2 | pre3)
    pre   : (pre1 | pre2 | pre3)

Most of my test cases match the 'index pre risk' pattern, often using the same pre# rule for both the index and pre matches. However, in 8 out of my 100 cases, the index and pre match are matching exactly the same line (as seen in both the output and $thisline).

Perhaps someone more familiar with P::RD could tell me if this is something they've seen before, and how I might prevent it.

Update: I do have a minimal test case, but because the sample data file is so large I don't want to attach it to this post.

Replies are listed 'Best First'.
Re: Parse::RecDescent matching same line twice
by ikegami (Patriarch) on Mar 01, 2009 at 18:12 UTC

    Once parsing has finished, a given character of the text can be matched by two rules only if one of those rules is a subrule of the other.

    text:  struct {     int   foo   ;     int   bar   ;     }
           ------ -     ---   ---   -     ---   ---   -     -
           IDENT  "{"   IDENT IDENT ";"   IDENT IDENT ";"   "}"
                        ---   ---               ---   ---
                        type  var               type  var
                        -------------     -------------
                        decl              decl
                        -------------------------------
                        decl_list
           --------------------------------------------------
           struct
           --------------------------------------------------
           parse

    But in reaching that state, a rule can match, then be unmatched by a backtrack. For example, given the grammar

    parse : foo1 foo2 | bar1 bar2

    foo1 could match, but PRD will backtrack if it can't follow that with a foo2 match. It will then try bar1.

    I'm guessing one of your productions has side effects, so you mistakenly believed it had matched even though a backtrack later unmatched it. I could very well be wrong because I have very little data to go on.

    Update: In the following example, you'll see foo1 on the screen even though its match was discarded by a backtrack and is not part of the final parse.

    use strict;
    use warnings;

    use Parse::RecDescent qw( );

    my $grammar = <<'__EOI__';

       {
          use strict;
          use warnings;
       }

       parse : foo1 foo2 /\Z/ { [ @item[0,1,2] ] }
             | bar1 bar2 /\Z/ { [ @item[0,1,2] ] }

       foo1 : "X" { print("$item[0]\n"); [ @item[0,1] ] }
       foo2 : "Y" { print("$item[0]\n"); [ @item[0,1] ] }

       bar1 : "X" { print("$item[0]\n"); [ @item[0,1] ] }
       bar2 : "Z" { print("$item[0]\n"); [ @item[0,1] ] }

    __EOI__

    Parse::RecDescent->Precompile($grammar, 'Grammar')
       or die("Bad grammar\n");

    use strict;
    use warnings;

    use Data::Dumper qw( Dumper );
    use Grammar      qw( );

    my $parser = Grammar->new();
    my $matches = $parser->parse('XZ')
       or die("Bad input\n");

    print("\n");
    print(Dumper($matches));

    foo1
    bar1
    bar2

    $VAR1 = [
              'parse',
              [
                'bar1',
                'X'
              ],
              [
                'bar2',
                'Z'
              ]
            ];
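
One way to avoid the misleading output is to have actions return data and to print only after the whole parse has succeeded. The sketch below illustrates that idea with a hand-rolled backtracking matcher in core Perl; it is not Parse::RecDescent, and `@productions` and `parse_text` are hypothetical names standing in for the grammar `parse : foo1 foo2 | bar1 bar2` above. Parse::RecDescent's documentation also describes a `<defer: ...>` directive for actions that should run only if the production ends up in the final parse; that is not shown here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the grammar: parse : foo1 foo2 | bar1 bar2
my @productions = (
    [ [ foo1 => 'X' ], [ foo2 => 'Y' ] ],
    [ [ bar1 => 'X' ], [ bar2 => 'Z' ] ],
);

# Try each production in order. Return the [rule, text] pairs of the first
# production that consumes the whole input, or nothing on failure.
sub parse_text {
    my ($text) = @_;
    PRODUCTION:
    for my $production (@productions) {
        my $pos = 0;
        my @matched;
        for my $subrule (@$production) {
            my ($name, $literal) = @$subrule;
            # A failed subrule abandons this production (the "backtrack"),
            # and everything recorded in @matched is thrown away with it.
            next PRODUCTION
                if substr($text, $pos, length $literal) ne $literal;
            push @matched, [ $name, $literal ];   # record, do not print
            $pos += length $literal;
        }
        return \@matched if $pos == length $text;
    }
    return;
}

my $matches = parse_text('XZ') or die "Bad input\n";

# Side effects happen only here, after the parse is known to be final.
print "$_->[0]\n" for @$matches;
```

Because foo1's match is only recorded in a per-production list, abandoning the first production leaves no trace, and only bar1 and bar2 are printed.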
      Thank you for your answer; that appears to be what is happening. I figured it was due to my lack of knowledge about P::RD. I was puzzled by the way it appeared some of my test cases were rejecting the first rule without consequence.
        It's just like in regexps
        'XZ' =~ /
           ^
           (?: X (?{ print "X" })
               Y (?{ print "Y" })
           |   X (?{ print "X" })
               Z (?{ print "Z" })
           )
           \z
        /x;
        print("\n");
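
In the regex case, perlre describes a way to make such bookkeeping backtrack-safe: assign to a `local`ized package variable inside the code block, so the assignment is undone when the engine backtracks, then copy the value out at the very end of the pattern. The following is a minimal sketch of that technique applied to the pattern above; the variable names `$trail`, `$naive` and `$result` are made up for illustration.

```perl
use strict;
use warnings;

our $trail  = '';   # package variable, so local() can be used on it
our $naive  = '';   # plain appends; these are NOT undone on backtracking
our $result = '';

'XZ' =~ /
    ^
    (?: X (?{ $naive .= 'X'; local $trail = $trail . 'X' })
        Y (?{ $naive .= 'Y'; local $trail = $trail . 'Y' })
    |   X (?{ $naive .= 'X'; local $trail = $trail . 'X' })
        Z (?{ $naive .= 'Z'; local $trail = $trail . 'Z' })
    )
    \z
    (?{ $result = $trail })   # copy out before the localizations unwind
/x;

print "final match path: $result\n";
print "naive record:     $naive\n";
```

Here `$result` holds only the path that ended up in the final match, while `$naive` also keeps whatever was appended along alternatives the engine later abandoned.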