PerlMonks  

Parse::RecDescent trouble

by ribasushi (Pilgrim)
on Jan 12, 2007 at 12:06 UTC ( [id://594364]=perlquestion )

ribasushi has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I am trying to dive into Parse::RecDescent madness, and here is where I am stuck. Some sample code:
    use strict;
    use warnings;

    use Parse::RecDescent;
    use Data::Dumper;

    $::RD_AUTOACTION = q { [@item] };

    my $parse = Parse::RecDescent->new (q {
        config: section(s) /\z/

        section: server | key

        server: 'bind-server' ip { [@item[0,2]] }

        ip: ascii_byte '.' ascii_byte '.' ascii_byte '.' ascii_byte { join ('', @item[1..7]) }

        ascii_byte: /\d+/ <reject: do{$item[1] > 255}> { $item[1] }

        key: 'tsig-key' filename { [@item[0,2]] }

        filename: '"' /[^"]+/ '"' { $item[2] }
                | /[^"\s]+/ { $item[1] }
    });

    print Dumper $parse->section ('
    bind-server 127.0.0.1
    tsig-key "/etc/bind/rndc.key"
    ');
It gives me
    $VAR1 = [
              'section',
              [
                'server',
                '127.0.0.1'
              ]
            ];
like the second parsed line does not exist.
Thank you for your help. Peter

Replies are listed 'Best First'.
Re: Parse::RecDescent trouble
by davorg (Chancellor) on Jan 12, 2007 at 12:33 UTC
    print Dumper $parse->section (' bind-server 127.0.0.1 tsig-key "/etc/bind/rndc.key" ');

    You're asking it to parse a section. You probably want to parse a config.

    print Dumper $parse->config (' bind-server 127.0.0.1 tsig-key "/etc/bind/rndc.key" ');
    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      I will now go bang my head repeatedly against a wall.
Re: Parse::RecDescent trouble
by ikegami (Patriarch) on Jan 12, 2007 at 18:01 UTC
    I hope you don't mind if I go beyond what you asked about. (In fact, I won't even touch that since it's already been addressed.)
    • Your parser accepts 127.   0.    0    .1 as a valid ip.
    • Your parser accepts 127.newline0.newline0newline.1 as a valid ip.
    • Your parser accepts 127.00000000000000.000000000000000.000000001 as a valid ip.
    • Your parser accepts "newlineabcnewline" as a valid filename.
    • Your parser returns abc for filename "   abc   ".
    • Your parser accepts ip127.0.0.1 and tsig-keyserver.
    • Your parser accepts bind-server 127.0.0.1 tsig-key "/etc/bind/rndc.key" all on one line. I'm pretty sure you don't want that.
    • Using q{} around your grammar can easily lead to weird problems with backslashes. You should double your backslashes (yuck, and easy to miss one) or use here-docs.
    • <reject: do { ... }> can be simplified to <reject: ...>.
    • strict and warnings aren't on for your actions and the parser in general.
    • The return value for config shouldn't include $item[2].
    • It's hard to see alternate rules. I line up the |s with the :s. (I also line up the :s, but that's personal preference.)
    • What if the file name contains a "?
    • I removed $::RD_AUTOACTION = q { [@item] };. It was causing extra code, not less.
    • For the quoteless filename, it would be better if you specified which characters *are* allowed.

    use strict;
    use warnings;

    use Parse::RecDescent;
    use Data::Dumper;

    my $config_parser = Parse::RecDescent->new(<<'__END_OF_GRAMMAR__');

       {
          # These pragmas affect the whole parser.
          use strict;
          use warnings;

          sub check_ip_nums {
             my ($ip) = @_;
             # The dot is escaped so that we split on a literal '.'.
             return !(grep $_ > 255, split /\./, $ip);
          }

          sub dequote {
             my ($s) = @_;
             for ($s) {
                s/^"//;
                s/"\z//;
                s/\\(.)/$1/sg;
                return $_;
             }
          }
       }

       parse     : line(s) /\Z/ { $item[1] }

       line      : ''                # Skip blank lines.
                   <skip:'[ \\t]*'>  # Don't treat newlines as whitespace.
                   key_value
                   /\n/
                   <skip: $item[2]>
                   { $item[3] }

       key_value : server
                 | key

       server    : IDENT { $item[1] eq 'bind-server' } IP       { [@item[0,3]] }
       key       : IDENT { $item[1] eq 'tsig-key'    } filename { [@item[0,3]] }

       filename  : QSTRING
                 | BAREWORD

       # Tokens

       IDENT     : /[-\w]+/

       QSTRING   : /"(?:[^"\\]|\\.)*"/ { dequote($item[1]) }

       BAREWORD  : /[^"\\\s]+/

       IP        : # This could be done more readably, but
                   # the more is done by the regexp, the
                   # faster it's going to be. A lot faster.
                   /(?:[1-9][0-9]{0,2}|0)\.(?:[1-9][0-9]{0,2}|0)\.(?:[1-9][0-9]{0,2}|0)\.(?:[1-9][0-9]{0,2}|0)/
                   { check_ip_nums($item[1]) ? $item[1] : undef }

    __END_OF_GRAMMAR__

    print Dumper $config_parser->parse(<<'__END_OF_CONFIG__');
    bind-server 127.0.0.1
    tsig-key "/etc/bind/rndc.key"
    __END_OF_CONFIG__

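
The point about q{} versus here-docs can be seen without Parse::RecDescent at all. The following is a minimal sketch in core Perl; the variable names are made up for illustration, and the regex text is the QSTRING pattern from the grammar above. It stores the same text once with q{} and once with a single-quoted here-doc, then prints what each one actually holds.

```perl
use strict;
use warnings;

# q{} turns every doubled backslash into a single one before the
# text ever reaches the grammar compiler.
my $from_q = q{/"(?:[^"\\]|\\.)*"/};

# A single-quoted here-doc leaves the text exactly as typed.
my $from_heredoc = <<'__END__';
/"(?:[^"\\]|\\.)*"/
__END__
chomp $from_heredoc;

print "q{}:      $from_q\n";
print "here-doc: $from_heredoc\n";
```

The q{} copy comes out two characters shorter, because each `\\` pair has collapsed to a single backslash, which leaves the character class in the pattern unterminated. The here-doc copy is unchanged.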
      I absolutely don't mind. Actually, I am extremely happy I got an answer like this. Thanks a ton, it is full of very helpful advice. In particular, I had no idea I could add a closure to the grammar and use it as part of a virtual main package (I did not find it anywhere in the docs).
      I have an additional question if you do not mind. Can you decipher this:

      line : ''                # Skip blank lines.
             <skip:'[ \t]*'>   # Don't treat newlines as whitespace.
             key_value
             /\n/
             <skip: $item[2]>
             { $item[3] }

      for me please? In particular, I do not understand the '' construct (it will always match, right?), nor do I understand how you can have several tokens in one rule without the | mark (you have '', then a skip pragma, then key_value, then /\n/, and then another skip pragma).
      Once again thanks a lot for the insights!

        Particularly I had no idea I can add a closure to the grammar and use it as part of a virtual main package

        It's not really a closure. The block is inlined at the start of the generated parser code. You should check out Grammar.pm after executing:

        use Parse::RecDescent;
        my $grammar = ...;
        Parse::RecDescent->Precompile($grammar, "Grammar");

        That block is documented as Start-up Actions.

        I particularly do not understand the '' construct. it will always match right?

        Yes, but remember that P::RD removes /$skip/ from the input before every terminal in the grammar. The current value of skip is '\\s*', so '' removes all leading whitespace.

        That whole thing allows blank lines between key_value, but not within key_value.
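
To make the skip behaviour concrete, here is a rough emulation in core Perl. It is only a sketch of the idea described above (strip the skip pattern, then try the terminal); `match_terminal` is a hypothetical helper written for this illustration and is not part of Parse::RecDescent or a copy of its internals.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::RecDescent): strip the skip
# pattern from the front of the text, then try to match the terminal.
# Returns 1 on a match and 0 otherwise; the text is consumed in place.
sub match_terminal {
    my ($text_ref, $skip, $terminal) = @_;
    $$text_ref =~ s/\A(?:$skip)//;
    return $$text_ref =~ s/\A(?:$terminal)// ? 1 : 0;
}

# The '' terminal matches nothing itself, but the default skip still
# runs first and eats the blank lines and the indentation.
my $text = "\n\n   bind-server 127.0.0.1\n";
my $blank_ok = match_terminal(\$text, '\s*', '');

# With the default skip, the newline is gone before /\n/ can see it.
my $eol_default = "\n";
my $default_ok = match_terminal(\$eol_default, '\s*', '\n');

# With the narrowed skip, the newline survives and /\n/ matches.
my $eol_narrow = "  \n";
my $narrow_ok = match_terminal(\$eol_narrow, '[ \t]*', '\n');

print "remaining after '': $text";
print "default skip, /\\n/ matched: $default_ok\n";
print "narrow skip,  /\\n/ matched: $narrow_ok\n";
```

This is why the rule matches '' first (to swallow blank lines under the default skip) and only then narrows the skip to spaces and tabs, so that the trailing /\n/ still has a newline left to match.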

Re: Parse::RecDescent trouble
by ferreira (Chaplain) on Jan 12, 2007 at 12:26 UTC
    Well, it is doing what you told it to. This rule
    section: server | key
    says that a section has either a server or a key, and no more. So it stops as soon as it parses one of these. If you exchange the order of the lines, as in:
    print Dumper $parse->section (' tsig-key "/etc/bind/rndc.key" bind-server 127.0.0.1 ');
    you're gonna see it parses the tsig-key line correctly but ignores the next one. You need something like this:
    section: (server | key)(s)
    with some extras to get the data structure you want.

    Update: the real problem was spotted by davorg in Re: Parse::RecDescent trouble: $parse->section was being used instead of $parse->config. I got confused by the rule name "section" being used to hold a single part ("key" or "server"). I expected a config to contain sections, and sections to contain (possibly) multiple parts ("key" or "server").

      Erm, isn't this what section(s) above does? Saying that it should match a number of SECTION, while any SECTION can be either SERVER or KEY?
