Is the skip: directive broken in Parse::RecDescent? [Solved - PEBKAC]
by Hercynium (Hermit) on Aug 11, 2008 at 21:21 UTC
Hercynium has asked for the
wisdom of the Perl Monks concerning the following question:
So, I've been happily learning how to use grammars for parsing with Parse::RecDescent, and I've been very pleased with its power and flexibility so far... but I've run into a problem that, for the life of me, I can't understand!
I highly doubt that this could be a bug in PRD - it's used by too many people... but even the barest code demonstrates this frustrating problem:
Basically, it's this: Changing the prefix pattern has NO effect!
If I print out $skip it shows that it is set as expected, but the behavior of PRD does not change from the default.
This happens whether I am using a skip: directive, setting $skip from within an Action, or setting $Parse::RecDescent::skip from outside the grammar code.
Here's a little demonstration of what I'm getting...
Code like this:
I'm pretty certain it's not a problem with the regexes I'm using because when I do something like this instead:
I get this output:
Update: As I suspected, the "skip" or "terminal prefix" functionality is *not* broken... but it is not quite as DWIMmy as I was expecting with regard to how the specified regular expression is used.
I still don't think I understand the subtle details, but as far as I can tell, one should keep in mind that the skip regex (aka terminal prefix) is matched ONLY ONCE before each terminal. Therefore, one should probably wrap the whole pattern in a group followed by an asterisk, to ensure *everything* one wants to skip will be consumed in *one pass*.
To further show what I mean, here is one of the many non-working regexes that brought me here:
/(?: \# .*? \n? | \s* )?/msx
It will match only ONE INSTANCE of a comment or repeated whitespace. My example text has several adjoining instances of comments and whitespace, and only the first match was being consumed!
Here is the regex that does what I want:
/(?: \# .*? \n | \s )*/msx
As you can see, it consumes ALL Comments AND whitespace until nothing matches. SMALL change, BIG difference!
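To make the difference concrete, here is a minimal sketch in plain Perl, without Parse::RecDescent, that applies each pattern exactly once at the start of the input, which is roughly what happens to a terminal prefix. The sample text and the strip_prefix helper are made up for illustration; they are not from my real grammar.

```perl
use strict;
use warnings;

# Hypothetical input: two comments and some whitespace ahead of a token.
my $text = "# first comment\n\n  # second comment\n  token";

# The two patterns quoted above.
my $once = qr/(?: \# .*? \n? | \s* )?/msx;
my $many = qr/(?: \# .*? \n | \s )*/msx;

# Remove whatever the pattern matches at the very start of the input,
# one time only, and return what is left.
sub strip_prefix {
    my ($pattern, $input) = @_;
    $input =~ s/\A(?:$pattern)//;
    return $input;
}

for my $case ([once => $once], [many => $many]) {
    my ($name, $pattern) = @$case;
    my $rest = strip_prefix($pattern, $text);
    $rest =~ s/\n/\\n/g;    # make newlines visible in the output
    print "$name leaves: '$rest'\n";
}
```

With this input the second pattern leaves only the token behind. The first one does even worse than "one instance": because nothing after the lazy .*? is required to match, it stops right after the # character.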
I now have this working the way I want, by assigning it to $skip in the "start-up actions":
$skip = '(?msx: \# .*? \n | \s )*'
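For anyone who wants to try the pattern in a complete script, here is a small sketch. Note that it uses a different route from the start-up action above: it sets the global $Parse::RecDescent::skip before the parser is built, which is one of the ways mentioned earlier. It assumes Parse::RecDescent is installed from CPAN; the words and word rules and the sample input are hypothetical and only there to exercise the prefix.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Set the global terminal prefix before building the parser.
# The pattern skips any run of "#" comments and whitespace.
$Parse::RecDescent::skip = '(?msx: \# .*? \n | \s )*';

# Hypothetical toy grammar: a list of words up to the end of the input.
my $grammar = q{
    words : word(s) /\z/ { $return = $item[1] }
    word  : /\w+/
};

my $parser = Parse::RecDescent->new($grammar)
    or die "Bad grammar\n";

# Hypothetical input with adjoining comments and blank lines.
my $input = "# leading comment\n\n# another\nfoo bar # trailing\nbaz\n";

my $words = $parser->words($input);
die "Parse failed\n" unless defined $words;

print join(' ', @$words), "\n";
```

If the prefix behaves as described, the comments never reach the word rule and only the three words are collected.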
This has been another fun and edifying expedition, and if anyone reading this has any additional questions, I am happy to share whatever meager knowledge I have gained :)