Modifying Parse::RecDescent Grammar to deal with multiline property file entries

by chahn (Beadle)
on Aug 15, 2007 at 19:47 UTC ( [id://632838]=perlquestion )

chahn has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I need to parse property files that might have multiline entries (otherwise a simple regexp would do), and I need to process comments and maintain order (otherwise Config::Properties would do).

My idea was to use a positive lookahead, so that a var=val pair starts at ^var= and ends just before the next ^var=.

I probably need a nudge toward either the skip directive or the proper pattern modifier, but here is sample code:

use strict;
use Parse::RecDescent;

$::TestGrammar = <<'TG';

Output: PropLine(s) /\Z/

PropLine: CommentLine | SimpleProp | LastProp

CommentLine: /\#.*\n/
{
    print "RULE: $item{__RULE__}\n";
    print "MATCH: $item{__PATTERN1__}\n";
}

SimpleProp: VAR EQ VAL ...LastProp
{
    print "RULE: $item{__RULE__}\n";
    print "VAR: $item{VAR}\n";
    print "EQ: $item{EQ}\n";
    print "VAL: $item{VAL}\n\n";
}

LastProp: VAR EQ VAL
{
    print "RULE: $item{__RULE__}\n";
    print "VAR: $item{VAR}\n";
    print "EQ: $item{EQ}\n";
    print "VAL: $item{VAL}\n\n";
}

VAR: /[^=]+/
EQ: '='
VAL: /.*/

TG

undef $/;
my $foo = <>;
my $parser = Parse::RecDescent->new($::TestGrammar);
defined $parser->Output($foo) or die "FAILURE";
__END__

When I run this using this file:

===================
# Comment Line
# Comment #2
foo=this is property one but
bar=does it grab this one too?
baz=snark
===================

I see this, as expected:

===================
RULE: CommentLine
MATCH: # Comment Line

RULE: CommentLine
MATCH: # Comment #2

RULE: SimpleProp
VAR: foo
EQ: =
VAL: this is property one but

RULE: SimpleProp
VAR: bar
EQ: =
VAL: does it grab this one too?

RULE: LastProp
VAR: baz
EQ: =
VAL: snark
===================

However, if I make one of the props multiline:

===================
# Comment Line
# Comment #2
foo=this is property
one but
bar=does it grab this one too?
baz=snark
===================

I see this:

===================
RULE: CommentLine
MATCH: # Comment Line

RULE: CommentLine
MATCH: # Comment #2

RULE: SimpleProp
VAR: foo
EQ: =
VAL: this is property     <----the rest of this VAL becomes part of the next VAR. no joy.

RULE: SimpleProp
VAR: one but
bar
EQ: =
VAL: does it grab this one too?

RULE: LastProp
VAR: baz
EQ: =
VAL: snark
===================

I am trying things like changing VAR: to /^[^=]+/m and such but have yet to find the right combination.
Thank you in advance for any comments you can make.

Replies are listed 'Best First'.
Re: Modifying Parse::RecDescent Grammar to deal with multiline property file entries
by suaveant (Parson) on Aug 15, 2007 at 20:33 UTC
    This seems to work... (your LastProp, in this context, was superfluous)

    Update: It kind of works, but I realized it allows continuation lines before a prop.

    use strict;
    use Parse::RecDescent;

    $::TestGrammar = <<'TG';

    Output: PropLine(s) /\Z/

    PropLine: CommentLine | SimpleProp | ContinuationLine

    CommentLine: /\#.*\n/
    {
        print "RULE: $item{__RULE__}\n";
        print "MATCH: $item{__PATTERN1__}\n";
    }

    SimpleProp: VAR EQ VAL
    {
        print "RULE: $item{__RULE__}\n";
        print "VAR: $item{VAR}\n";
        print "EQ: $item{EQ}\n";
        print "VAL: $item{VAL}\n\n";
    }

    VAR: /^[^=\n]+/
    EQ: '='
    VAL: /.+/

    ContinuationLine: VAL
    {
        print "RULE: $item{__RULE__}\n";
        print "VAL: $item{VAL}\n\n";
    }

    TG

    undef $/;
    my $foo = <DATA>;
    my $parser = Parse::RecDescent->new($::TestGrammar);
    defined $parser->Output($foo) or die "FAILURE";

    __DATA__
    # Comment Line
    # Comment #2
    foo=this is property
    one but
    bar=does it grab this one too?
    baz=snark

                    - Ant
                    - Some of my best work - (1 2 3)

      Thank you for the suggestion.

      I need to work on each property as a whole, but could still piece together your ContinuationLine with its corresponding SimpleProp....

      In any case, what I was hoping to express was that a property is everything from a line beginning with Var=Val and continuing up to, but not including, the next line beginning with Var=Val. This seems a likely spot for a look-ahead.
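The look-ahead idea described above can be sketched outside Parse::RecDescent with a plain Perl regex. This is only a minimal sketch under stated assumptions: `split_properties` is a hypothetical helper name, a comment line is assumed to end the preceding value, and any line containing `=` is assumed to start a new property (so a continuation line that itself contains `=` would be misread).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split property-file text into
# [name, value] pairs. A value runs from just after "name=" up to, but not
# including, the next line that looks like "name=", a comment line, or the
# end of the text.
sub split_properties {
    my ($text) = @_;
    my @props;

    # /m lets ^ match at the start of every line,
    # /s lets . match newlines so a value can span lines,
    # /g walks through the text one property at a time.
    while ($text =~ /^(?!\#)([^=\n]+)=(.*?)(?=^[^=\n]+=|^\#|\z)/msg) {
        my ($name, $value) = ($1, $2);
        $value =~ s/\n\z//;    # drop the newline that ends the entry
        push @props, [ $name, $value ];
    }
    return @props;
}

# Usage: print each property as a whole, in file order.
my $sample = "# Comment Line\nfoo=this is property\none but\nbar=does it grab this one too?\n";
for my $prop (split_properties($sample)) {
    my ($name, $value) = @$prop;
    print "VAR: $name\nVAL: $value\n\n";
}
```

Comments are skipped rather than kept here; a variant that needs them would have to capture the comment lines as well.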

      I am going to brute force the situation, using regexps and flags to indicate when in a Property, but do hope to poke on this when I can. I will certainly post back any useful conclusions.
        Yeah, you could do something like
        /.*(?=(^[^=]+=|\Z))/ # not tested, but should be about right
        Or you could do it in code: set a flag when you find a propname, and accept continuation lines only while it is set. If you need to look at the prop as one piece, just build a parse tree. That is usually the better way to go unless you are working on a huge dataset or a real-time stream. And, as you say, you could pop it back together in SimpleProp.
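The flag approach can be sketched in plain Perl as a line-by-line loop. This is a rough sketch rather than a tested solution for real property files: `parse_lines` and the record layout are hypothetical, and it assumes that comment lines start with `#` in column one and that any other line containing `=` begins a new property.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns records in file order,
# each either { type => 'comment', text => ... }
# or         { type => 'prop', name => ..., value => ... }.
sub parse_lines {
    my ($text) = @_;
    my @records;
    my $in_prop = 0;    # the flag: true while the last record is a property

    for my $line (split /\n/, $text) {
        if ($line =~ /^\#/) {
            push @records, { type => 'comment', text => $line };
            $in_prop = 0;
        }
        elsif ($line =~ /^([^=]+)=(.*)$/) {
            push @records, { type => 'prop', name => $1, value => $2 };
            $in_prop = 1;
        }
        elsif ($in_prop) {
            # Continuation line: glue it onto the property being built.
            $records[-1]{value} .= "\n$line";
        }
        elsif ($line =~ /^\s*$/) {
            next;       # blank line outside a property: ignore
        }
        else {
            die "Continuation line with no property before it: '$line'\n";
        }
    }
    return @records;
}
```

Because the flag is cleared by a comment and starts out unset, a continuation line that appears before any property is rejected instead of being silently accepted, and the ordered record list keeps both comments and whole properties available for later processing.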

        Of course, unless you need them, it is often easier to pre-process out the comments.

                        - Ant
                        - Some of my best work - (1 2 3)

Re: Modifying Parse::RecDescent Grammar to deal with multiline property file entries
by princepawn (Parson) on Aug 17, 2007 at 07:51 UTC