Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Parse::RecDescent grammar that winds its way all the way back to start

by sundialsvc4 (Monsignor)
on Sep 23, 2010 at 15:11 UTC ( #861571=perlquestion: print w/ replies, xml ) Need Help??
sundialsvc4 has asked for the wisdom of the Perl Monks concerning the following question:

I am dealing with trying to write a Parse::RecDescent grammar for a file syntax that basically looks like this:

$JOBS <job_definition> ... (EOF)

Which I handle with:

startule: /\$JOBS\b/ix <commit> jobfile_joblist EOFILE jobfile_joblist: job_statement(s)

This start-rule has an alternative: <error: No \$JOBS statement

The definition for job_definition then looks like:  job_name <commit> job_options(?) , the trouble being that in this syntax there is no reserved-word that separates one job-definition from the next.

The problem that I seem to be having is this:   when there is any sort of syntax-error in any of the job-definitions, I see the parser merrily unwinding its way all the way back to start.   The failed parse always produces a “NO $JOBS STATEMENT” error, as the starting-rule fails.

So, for example, let’s say that I’ve got a job-file with 156 job definitions in it... and there’s a syntax-error on number 132.   Boing-g-g-g-g-g!!   The grammar winds its way all the way back to start, having backtracked all the way to the point where it imagines that perhaps this could be an altogether different kind of file.

What I want, I think, is for the parser, having recognized one occurrence of job_statement, to at that point <commit> to it.

Here's some relevant code ... EDIT: Greatly Reduced!

startrule__schedule_file: startrule_1_schedule_file startrule_1_schedule_file: /SCHEDULE\b/ix <commit> schedule_name <com +mit> schedule_optionlist ':' <commit> schedule_joblist EOFILE | <error: no SCHEDULE statement> schedule_optionlist: schedule_option(s?) schedule_joblist: job_statement(s) | <error: Schedule contains no jobs>

Comment on Parse::RecDescent grammar that winds its way all the way back to start
Select or Download Code
Re: Parse::RecDescent grammar that winds its way all the way back to start
by moritz (Cardinal) on Sep 23, 2010 at 15:36 UTC
    What I want, I think, is for the parser, having recognized one occurrence of job_statement, to at that point <commit> to it.

    So, <commit> after each successfully parsed job_statement. Then offer an alternative that slurps up the rest of the file, as a dummy, to make the grammar match in the end.

    I don't know Parse::RecDescent enough to express what I mean in code, so I'll try in Perl 6 rules, maybe that's sufficiently readable for you to translate it to your needs:

    grammar Jobs { token TOP { ^ '$JOBS' <job_def> $ } role job_def { [ <single_job_definiton> <commit> || .* # if there is no way to parse # a job definition, dummy-parse # the rest of the string to prevent # outright failure ]+ } }
    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Parse::RecDescent grammar that winds its way all the way back to start
by ikegami (Pope) on Sep 23, 2010 at 15:51 UTC
    • That's a crazy amount of code, way way more than required to demonstrate the problem. At most 20 lines are needed here, which is an exceptionally high amount.
    • None of the rules being discussed are even mentioned in the code! How is the code in the least bit relevant?
    • The code doesn't run as no data was provided.
    • Desired output is not provided.

    I know a lot about P::RD, but I'm not touching this mess.

Re: Parse::RecDescent grammar that winds its way all the way back to start
by JavaFan (Canon) on Sep 23, 2010 at 15:58 UTC
    The <commit> does not prevent the parser to backtrack through it - it's different from the Perl6 <commit> or Perl5 (*COMMIT). With Parse::RecDescent, a <commit> only means not to try any other alternative for the current rule. Which means for your job_statement production to not thrown an error if, after matching job_name, it's force to backtrack.

    You could make <commit> behave the way it does in Perl6 by adding, for each rule that has a <commit> in one of its alternatives, a final <uncommit><reject> alternative.

Re: Parse::RecDescent grammar that winds its way all the way back to start
by sundialsvc4 (Monsignor) on Sep 23, 2010 at 16:02 UTC

    I am looking at the “Eliminate Backtracking When Possible” section in Parse::RecDescent::FAQ::Original to see if it might have some bearing on my problem.

    It would be a purely-recursive rule something like this:

    job_statements : job_statement <commit> job_statements(?) | #NOTHING
    or:
    job_statements : job_statement <commit> job_statements | job_statement | #NOTHING

    But I really hate to recurse like that.   If I’ve got a list of two or three hundred job_statements, as I well might, then I’ve obliged the computer to recurse two or three hundred levels deep!   (I think ...)

    The trouble is, this file layout really is not well-thought-out.   (And it isn’t correctly documented, either.)   In other words, “it sucks large, yes ... but we are stuck with it.”

      But I really hate to recurse like that. If Iíve got a list of two or three hundred job_statements, as I well might, then Iíve obliged the computer to recurse two or three hundred levels deep!
      Well, you made the choice of using a recursive descent parser.

      They aren't known for making the fastests parses. Considering the number of <commit>s you have in your grammer, can't you use a limited look-ahead parser? (LL(N), LR(N), ...) No recursion nor backtracking is than needed; syntax errors are more quickly discovered, and you may have an easier time resyncing after a parsing error.

      But I really hate to recurse like that.

      job_statements : job_statement <commit> job_statements(?) | #NOTHING

      and the less efficient

      job_statements : job_statement <commit> job_statements | job_statement | #NOTHING

      are both just recursive versions of

      job_statements : job_statement(s?)

      If you don't want the recursion, use the last.

Re: Parse::RecDescent grammar that winds its way all the way back to start
by gnosti (Friar) on Sep 23, 2010 at 16:33 UTC
    You might design your parser to consume only one entry at a time. Pass your data to the parser as scalar reference. This way the scalar will be left with the remaining (unconsumed) text.
    $parser->start_rule(\$data) or warn("not a job file"), return; while( $data ){ $parser->job_statement(\$data) }
    Since you're calling the parser anew each time, you won't have a problem with unwinding entries.

    This doesn't address the question of how to resync if you encounter a bad entry.

      That is a very clever strategy.   I will ponder it carefully.

      In this case, “resyncing after a syntax error” is a non-issue.   I am interested in the error messages because I am debugging the grammar.

      Thanks for all your help, everyone.

Re: Parse::RecDescent grammar that winds its way all the way back to start
by sundialsvc4 (Monsignor) on Sep 23, 2010 at 19:57 UTC

    Okay, I think that I have come up with a workable solution.   (Given that the only parser-tool that I have to work with, in this case, is this one ...)

    Well, the first thing that I did, as you can see, is to shrink the size of the grammar-example in the original post to “just an example.”   (Sorry about that.)

    Then, I came up with a production like this:

    job_list : job_statement job_list | #NOTHING

    The rule does recurse.   When the “nothing” case is encountered (which simply means that “whatever we are looking at now, whatever it is, is not a job_statement”) the result of the production is a string consisting of the name of the nonterminal (“job_list”).   The action code (not shown) ignores that.

      Well, the first thing that I did, as you can see, is to shrink the size of the grammar-example in the original post to ďjust an example.Ē

      You still not clear what you want out of it.

      If you simply don't want it to fail, use a production that always matches instead of using <error>.

      Then, I came up with a production like this:

      job_list : job_statement job_list | #NOTHING

      will match the same as

      job_list : job_statement(s?)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://861571]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (17)
As of 2014-07-29 17:46 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (225 votes), past polls