
Parsing using m//g

by pbeckingham (Parson)
on Sep 25, 2006 at 15:48 UTC (#574763, perlquestion)
pbeckingham has asked for the wisdom of the Perl Monks concerning the following question:

Can someone help? I have given myself the challenge of doing some simple parsing, but in a complex way. Without focusing on why I choose to do this, can someone guide me towards a viable solution? Given the following input:

name1=value1
name2 = value2
This code parses it:
while (<$input>) {
    chomp;
    next if /^\s*#/;    # skip comment lines
    next if /^\s*$/;    # skip blank lines
    if (/^ \s* ([^=\s]+) \s* = \s* (.+) $/x) {
        # name is in $1, value is in $2
    }
}
That's not the question though. The question is, how would I parse the following:
name1=value1
name2 = value2
name3 = value3
but wait, there is more
name4= value4
With Perl that has the form:
my $contents = do {local $/; <$input>};
while ($contents =~ / ANSWER_HERE /msg) {
    # name is in $1, value is in $2
}
Specifically, I want to use the //g form, to iterate over the string, and not perform a line-by-line parse, as in the first example. My attempts have thus far failed. The closest I got (without success) was:
my $contents = do {local $/; <$input>};
my $name = qr/\s* [^=\s]+ \s*/x;
while ($contents =~ /^ ($name) = \s* (.+) (?= ^ $name = | $ ) /msgx) {
    # name is in $1, value is in $2
}

pbeckingham - typist, perishable vertebrate.
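
An editorial observation, not something stated in the thread: one likely reason the attempt above captures too much is that a greedy `(.+)` under `/s` first runs to the end of the string and only backs off as far as the lookahead demands. The sketch below contrasts a greedy and a lazy value on a two-line sample. The helper `first_value`, the sample text, and the simplified patterns are all assumptions made for illustration.

```perl
use strict;
use warnings;

# first_value is a hypothetical helper: it returns the value ($2)
# captured by the first match of $re against $text, or undef.
sub first_value {
    my ($text, $re) = @_;
    return $text =~ $re ? $2 : undef;
}

my $text = "name1=value1\nname2 = value2\n";
my $name = qr/[^=\s]+/;

# Same pattern twice; only the quantifier on the value differs.
my $greedy = qr/^ \s* ($name) \s* = \s* (.+)  (?= \s* (?: $name \s* = | \z ) ) /xms;
my $lazy   = qr/^ \s* ($name) \s* = \s* (.+?) (?= \s* (?: $name \s* = | \z ) ) /xms;

for my $case (['greedy', $greedy], ['lazy', $lazy]) {
    # Show embedded newlines as \n so the extent of the capture is visible.
    (my $shown = first_value($text, $case->[1])) =~ s/\n/\\n/g;
    print "$case->[0]: [$shown]\n";
}
```

With the greedy quantifier the first value runs through the rest of the input, so a `while (... /g)` loop would iterate only once; with the lazy quantifier it stops at `value1`.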

Replies are listed 'Best First'.
Re: Parsing using m//g
by ikegami (Pope) on Sep 25, 2006 at 16:00 UTC
    my $contents = do { local $/; <DATA> };

    while ($contents =~ /
        \s* ([^=\s]+) \s* = \s*
        (
            (?:
                (?! \s* (?: [^=\s]+ \s* = | $ ) )
                .
            )*
        )
    /xmsg ) {
        print("[$1 => $2]\n");
    }

    __DATA__
    name1=value1
    name2 = value2
    name3 = value3
    but wait, there is more
    name4= value4


    [name1 => value1]
    [name2 => value2]
    [name3 => value3]
    [name4 => value4]

    Update: The above works by never letting the value run into the start of the next "name =" pair. The following is an alternative solution that works by starting with an empty value and extending it one character at a time until the lookahead succeeds.

    my $contents = do { local $/; <DATA> };

    while ($contents =~ /
        \s* ([^=\s]+) \s* = \s*
        (.*?)    # Extend the value.
        (?= \s* (?: [^=\s]+ \s* = | $ ) )
    /xmsg ) {
        print("[$1 => $2]\n");
    }

      To be correct, your output would have to be:

      [name1 => value1]
      [name2 => value2]
      [name3 => value3 but wait, there is more]
      [name4 => value4]

      pbeckingham - typist, perishable vertebrate.

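        Option 1: simply change /.../xmsg to /.../xsg, so that $ no longer matches at the end of every line.
        Option 2: simply change $ to \z, which matches only at the very end of the string.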

        Update: Added the second (and better) option.
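
For reference, a minimal sketch of the second (lazy) pattern with `$` changed to `\z`. Because the pattern then contains neither `^` nor `$`, the `/m` flag has no effect and is dropped here. The helper name `parse_pairs` and the sample string are assumptions for illustration, and the continuation line is assumed to be unindented.

```perl
use strict;
use warnings;

# parse_pairs is a hypothetical helper, not part of the thread.
# It returns a list of [name, value] array references.
sub parse_pairs {
    my ($contents) = @_;
    my @pairs;
    while ($contents =~ /
        \s* ([^=\s]+) \s* = \s*              # name, then '='
        (.*?)                                # value, extended lazily
        (?= \s* (?: [^=\s]+ \s* = | \z ) )   # stop before the next "name =" or end of string
    /xsg) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

my $sample = "name1=value1\nname2 = value2\nname3 = value3\n"
           . "but wait, there is more\nname4= value4\n";

for my $pair (parse_pairs($sample)) {
    # Join continuation lines with a single space, for display only.
    (my $value = $pair->[1]) =~ s/\s*\n\s*/ /g;
    print "[$pair->[0] => $value]\n";
}
```

Note that the captured value for name3 keeps its embedded newline; the substitution in the loop only flattens it for printing, which yields the four lines pbeckingham asked for.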
