http://www.perlmonks.org?node_id=228857

premchai21 has asked for the wisdom of the Perl Monks concerning the following question:

I am attempting to parse some input using Parse::RecDescent version 1.80; however, it does not appear to be parsing my grammar correctly, thinking an implicit subrule spans much more than it does. Following the trace after that confuses me; I am not sure what is going on. Full grammar follows, with comments added to indicate how PRD appears to be parsing my grammar:

# buggo, where's the case-sensitivity go, here or elsewhere?

# the first subrule here...
command: ( oops /\s+/ word
             { "(oops '$item[2])" }
         | ap
             { "(order me $item[1])" }
         | np ', ' ap
             { "(order $item[1] $item[3])" } ) ...!/\S/
         { $item[1] }

ap: <leftop: vp then vp>
    { "(actions @{$item[1]})" }
    # buggo, omit thens in return

vp: again /\s+/
    { "'again" }
  | verb /\s+/ <matchrule:v_$item[1]> /\s+/
    { $item[2] }

np: <leftop: basic_np np_con basic_np>
    { "(phrases @{$item[1]})" }

basic_np: ( <leftop: descriptor /\s+/ descriptor> )(?)
          <leftop: noun /\s+/ noun> /\s+/
          { "(phrases @{$item[1]} @{$item[2]})" }

descriptor: article | all | other | number | possessive

noun: me | pronoun | word

verb: word
      { $::Verbs{$item[1]} }

np_con: <reject> # todo

# need to add to this
prep: /o(:?n|ver|ff|ut of)|with(?:out)?|under|in(?:to)?|
       at|to|sans|from|toward/xi

# buggo, doesn't parse word numbers as well as it should
# should use module for that
number_part: /([0-9]+)/
             { $item[1] }
           | /one|t(?:w(?:o|e(?:lve|enty))|h(?:ree|ousand)|en)|f(?:our|ive)|
              s(?:ix|even)|e(?:ight|leven)|nine|zero|
              (?:thir|f(?:ou?r|if)|s(?:ix|even)|eigh|nine)t(?:een|y)|
              hundred|[mb]illion/ix
             { $::Numbers{$item[1]} }

number: number_part /\s+/
      | number_part /\s+/ number
        { [$item[1], $item[1] + $item[2], $item[1] * $item[2]]
              ->[$item[1] <=> $item[2]] }

me: 'me' | 'myself' | 'i' | 'self'
pronoun: 'it' | 'him' | 'you' | 'her' | 'them'
possessive: 'my' | 'his' | 'her' | 'your'
article: 'the' | 'a' | 'an'
all: 'all' | 'every'
other: 'other'
oops: 'oops' | 'o'
then: ',' 'then' | 'then' | '. '
again: 'again' | 'g'

# would appear to be thought to end at the right-paren here
word: /([a-zA-Z\-]+)/

v_go: direction
      { "(go '$item[1])" }

direction: /((?:n(?:orth)?|s(?:outh)?)?(?:e(?:ast)?|w(?:est)?)?|(?:up|down))/i

RD_TRACE trace follows:

Parse::RecDescent: Treating "# buggo, where's the case-sensitivity go, here or elsewhere?" as a comment
Parse::RecDescent: Treating "command:" as a rule declaration
Parse::RecDescent: Treating "( oops /\s+/ word { "(oops '$item[2])" } | ap { "(order me $item[1])" } | np ', ' ap { "(order $item[1] $item[3])" } ) ...!/\S/ { $item[1] } ap: <leftop: vp then vp> { "(actions @{$item[1]})" } # buggo, omit thens in return vp: again /\s+/ { "'again" } | verb /\s+/ <matchrule:v_$item[1]> /\s+/ { $item[2] } np: <leftop: basic_np np_con basic_np> { "(phrases @{$item[1]})" } basic_np: ( <leftop: descriptor /\s+/ descriptor> )(?) <leftop: noun /\s+/ noun> /\s+/ { "(phrases @{$item[1]} @{$item[2]})" } descriptor: article | all | other | number | possessive noun: me | pronoun | word verb: word { $::Verbs{$item[1]} } np_con: <reject> # todo # need to add to this prep: /o(:?n|ver|ff|ut of)|with(?:out)?|under|in(?:to)?| at|to|sans|from|toward/xi # buggo, doesn't parse word numbers as well as it should # should use module for that number_part: /([0-9]+)/ { $item[1] } | /one|t(?:w(?:o|e(?:lve|enty))|h(?:ree|ousand)|en)|f(?:our|ive)| s(?:ix|even)|e(?:ight|leven)|nine|zero| (?:thir|f(?:ou?r|if)|s(?:ix|even)|eigh|nine)t(?:een|y)| hundred|[mb]illion/ix { $::Numbers{$item[1]} } number: number_part /\s+/ | number_part /\s+/ number { [$item[1], $item[1] + $item[2], $item[1] * $item[2]] ->[$item[1] <=> $item[2]] } me: 'me' | 'myself' | 'i' | 'self' pronoun: 'it' | 'him' | 'you' | 'her' | 'them' possessive: 'my' | 'his' | 'her' | 'your' article: 'the' | 'a' | 'an' all: 'all' | 'every' other: 'other' oops: 'oops' | 'o' then: ',' 'then' | 'then' | '. ' again: 'again' | 'g' word: /([a-zA-Z\-]+ )" as an implicit subrule
Parse::RecDescent: Treating "_alternation_1_of_production_1_of_rule_command :" as a rule declaration
Parse::RecDescent: Treating "oops" as a subrule match
Parse::RecDescent: Treating "/\s+/" as a /../ pattern terminal
Parse::RecDescent: Treating "word" as a subrule match
Parse::RecDescent: Treating "{ "(oops '$item[2])" }" as an action
Parse::RecDescent: Treating "|" as a new production
Parse::RecDescent: Treating "ap" as a subrule match
Parse::RecDescent: Treating "{ "(order me $item[1])" }" as an action
Parse::RecDescent: Treating "|" as a new production
Parse::RecDescent: Treating "np" as a subrule match
Parse::RecDescent: Treating ", " as a literal terminal
Parse::RecDescent: Treating "ap" as a subrule match
Parse::RecDescent: Treating "{ "(order $item[1] $item[3])" }" as an action
ERROR (line 8): Untranslatable item encountered: ")"
Parse::RecDescent: Treating "...!" as a negative lookahead
Parse::RecDescent: Treating "/\S/" as a /../ pattern terminal
Parse::RecDescent: Treating "{ $item[1] }" as an action
Parse::RecDescent: Treating "ap:" as a rule declaration
Parse::RecDescent: Treating "<leftop:...>" as a left-associative operator directive
Parse::RecDescent: Treating "vp" as a subrule match
Parse::RecDescent: Treating "then" as a subrule match
Parse::RecDescent: Treating "vp" as a subrule match
Parse::RecDescent: Treating "{ "(actions @{$item[1]})" }" as an action
Parse::RecDescent: Treating "# buggo, omit thens in return" as a comment
Parse::RecDescent: Treating "vp:" as a rule declaration
Parse::RecDescent: Treating "again" as a subrule match
Parse::RecDescent: Treating "/\s+/" as a /../ pattern terminal
Parse::RecDescent: Treating "{ "'again" }" as an action
Parse::RecDescent: Treating "|" as a new production
Parse::RecDescent: Treating "verb" as a subrule match
Parse::RecDescent: Treating "/\s+/" as a /../ pattern terminal
Parse::RecDescent: Treating "<matchrule:v_$item[1]>" as a subrule match
Parse::RecDescent: Treating "/\s+/" as a /../ pattern terminal
Parse::RecDescent: Treating "{ $item[2] }" as an action
Parse::RecDescent: Treating "np:" as a rule declaration
Parse::RecDescent: Treating "<leftop:...>" as a left-associative operator directive
Parse::RecDescent: Treating "basic_np" as a subrule match
Parse::RecDescent: Treating "np_con" as a subrule match
Parse::RecDescent: Treating "basic_np" as a subrule match
Parse::RecDescent: Treating "{ "(phrases @{$item[1]})" }" as an action
Parse::RecDescent: Treating "basic_np:" as a rule declaration
Parse::RecDescent: Treating "_alternation_1_of_production_1_of_rule_command" as a subrule match
Parse::RecDescent: Treating "/ v_go: direction { "(go '$item[1])" } direction: /" as a /../ pattern terminal
Parse::RecDescent: Treating "( (?:n(?:orth)?|s(?:outh )" as an implicit subrule
Parse::RecDescent: Treating "_alternation_2_of_production_1_of_rule_command :" as a rule declaration
Parse::RecDescent: Treating "( ?:n(?:orth )" as an implicit subrule
Parse::RecDescent: Treating "_alternation_1_of_production_1_of_rule__alternation_2_of_production_1_of_rule_command :" as a rule declaration
ERROR (line 49): Untranslatable item encountered: "?:n(?:orth"
Parse::RecDescent: Treating "_alternation_1_of_production_1_of_rule__alternation_2_of_production_1_of_rule_command" as a subrule match
ERROR (line 49): Untranslatable item encountered: "?|s(?:outh"
Parse::RecDescent: Treating "_alternation_2_of_production_1_of_rule_command" as a subrule match
ERROR (line 48): Untranslatable item encountered: "?)?(?:e(?:ast)?|w(?:est)?)?|(?:up|down))/i"

Thank you for your patience and attention.

Re: Parse::RecDescent eats large part of grammar, thinking it to be implicit subrule
by castaway (Parson) on Jan 22, 2003 at 10:00 UTC
    I think that P::RD is getting confused by your very first line:
    command: ( oops /\s+/ word { "(oops '$item[2])" }
    I also had some problems doing "( rule | rule2 | rule3 ) rule4". I'd suggest trying to split it up:
    command: subcommand ...!/\S/
    subcommand: oops {action} | np {action} | ..
    BTW, you don't need to explicitly say that commands/words have any number of spaces between them; the parser assumes that anyway. (At least, mine did :)
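
    Applied to the posted `command` rule, the split might look roughly like the sketch below. It is a grammar fragment only, untested against version 1.80, and it also drops the explicit /\s+/ terminals on the assumption that the default whitespace skipping is enough; the actions are otherwise copied from the original post.

    ```perl
    # Sketch: the parenthesized alternation becomes a named rule, so the
    # grammar no longer relies on implicit-subrule extraction.
    command: subcommand ...!/\S/
        { $item[1] }

    # `subcommand` is the name used in the suggestion above.
    subcommand: oops word
            { "(oops '$item[2])" }
        | ap
            { "(order me $item[1])" }
        | np ', ' ap
            { "(order $item[1] $item[3])" }
    ```

    With the /\s+/ terminals gone, $item[2] in the first alternative refers to the matched word rather than to a whitespace match.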

    C.

      In response to making the subrule explicit: I'll try that, thank you.

      In response to your other suggestion: yes, whitespace will be skipped, but I want to force it to only match when there is at least one space in between. However, me being stupid, I did not notice that the default is not + but *; using + instead would probably do what I want.
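
      One way to express "at least one space" is the <skip:...> directive described in the Parse::RecDescent documentation. The fragment below is only a sketch: `oops_cmd` is a hypothetical rule name, and the word pattern is written inline because the sketch assumes the changed skip pattern applies to terminals later in the same production.

      ```perl
      # Hypothetical rule, not part of the posted grammar.
      # <skip: '\s+'> replaces the default prefix pattern (\s*) for the
      # terminals that follow it in this production, so the word pattern
      # only matches if whitespace separates it from the oops match.
      oops_cmd: oops <skip: '\s+'> /[a-zA-Z\-]+/
          { "(oops '$item[-1])" }   # $item[-1] is the last item, the word
      ```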

Re: Parse::RecDescent eats large part of grammar, thinking it to be implicit subrule
by graff (Chancellor) on Jan 22, 2003 at 08:01 UTC
    Perhaps it would help if you tried to give a brief description of what your grammar is supposed to handle. Apart from that, it looks like the initial part of your grammar goes wrong in the trace, and there are some likely suspects in the first rule:

    • there is a standard single-quote character just before $item[2]
    • you're using curly braces, which I think are supposed to be used for bracketing snippets of perl code, but there is no perl code inside them
    • there are parens around a set of "|"-conjoined elements, then some other stuff outside the parens, which may just be uninterpretable.

    I'm already way over my head here -- to date, I've only looked at the PRD man page (I've never written code to use it), and have used lex/yacc only rarely, in a previous life, so one or more of the above items may be a false lead.

    Have you arrived at this grammar via a series of preliminary and intermediate steps, building it up from pieces that you have tried successfully? Or have you just created the whole thing from scratch, without testing any single component by itself, and you're now trying to debug the whole thing at once?

    Naturally, I'd recommend the former approach if you haven't tried it. Start with something small and constrained (but relevant) -- feed it with equally constrained input if that helps -- then build up incrementally; when you hit a snag, show us what you have, indicating which parts are known to be working, and what incremental piece introduced the snag.
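
    A minimal sketch of that incremental approach, assuming Parse::RecDescent is installed: compile a two-rule slice of the posted grammar and call each rule directly as a method on the parser. The sample inputs are made up for illustration.

    ```perl
    use strict;
    use warnings;
    use Parse::RecDescent;

    # Two rules copied from the posted grammar; everything else is left out.
    my $grammar = q{
        oops: 'oops' | 'o'
        word: /([a-zA-Z\-]+)/
    };

    my $parser = Parse::RecDescent->new($grammar)
        or die "grammar did not compile\n";

    # Each rule becomes a method on the parser object, so a single rule can
    # be exercised on its own. A failed match returns undef.
    for my $input ('oops', 'north-west', '42') {
        for my $rule (qw(oops word)) {
            my $result = $parser->$rule($input);
            printf "%-4s on %-12s => %s\n", $rule, "'$input'",
                defined $result ? "'$result'" : 'no match';
        }
    }
    ```

    Once a slice like this behaves, further rules can be added one at a time, which narrows down the rule that first upsets the grammar compiler.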

      The single-quote character is part of a double-quoted string, the curly braces do have Perl code inside them, and the parens define an implicit subrule, which should be interpretable but apparently isn't. Thanks for taking a look anyway.

      As far as the intermediate steps, I created many subrules, then tested them individually first, which does approximately the same thing IINM.