PerlMonks  

Parsing using m//g

by pbeckingham (Parson)
on Sep 25, 2006 at 11:48 UTC ( [id://574763]=perlquestion )


pbeckingham has asked for the wisdom of the Perl Monks concerning the following question:

Can someone help? I have given myself the challenge of doing some simple parsing, but in a complex way. Without focusing on why I choose to do this, can someone guide me towards a viable solution? Given the following input:

name1=value1
name2 = value2
This code parses it:
while (<$input>) {
    chomp;
    next if /^ \s* \#/x;   # skip comment lines
    next if /^ \s* $/x;    # skip blank lines
    if (/^ \s* ([^=\s]+) \s* = \s* (.+) $/x) {
        # name is in $1, value is in $2
    }
}
That's not the question though. The question is, how would I parse the following:
name1=value1
name2 = value2
name3 = value3
        but wait, there is more
name4= value4
With Perl that has the form:
my $contents = do {local $/; <$input>};
while ($contents =~ / ANSWER_HERE /msg) {
    # name is in $1, value is in $2
}
Specifically, I want to use the //g form, to iterate over the string, and not perform a line-by-line parse, as in the first example. My attempts have thus far failed. The closest I got (without success) was:
my $contents = do {local $/; <$input>};
my $name = qr/\s* [^=\s]+ \s*/x;
while ($contents =~ /^ ($name) = \s* (.+) (?= ^ $name = | $ ) /msgx) {
    # name is in $1, value is in $2
}



pbeckingham - typist, perishable vertebrate.

Replies are listed 'Best First'.
Re: Parsing using m//g
by ikegami (Patriarch) on Sep 25, 2006 at 12:00 UTC
    my $contents = do { local $/; <DATA> };

    while ($contents =~ /
        \s* ([^=\s]+) \s*
        =
        \s* ( (?: (?! \s* (?: [^=\s]+ \s* = | $ ) ) . )* )
    /xmsg
    ) {
        print("[$1 => $2]\n");
    }

    __DATA__
    name1=value1
    name2 = value2
    name3 = value3
            but wait, there is more
    name4= value4

    Outputs:

    [name1 => value1]
    [name2 => value2]
    [name3 => value3]
    [name4 => value4]

    Update: The above works by never allowing bad data in the value. The following is an alternate solution that works by starting with an empty value, and extending it as much as possible.

    my $contents = do { local $/; <DATA> };

    while ($contents =~ /
        \s* ([^=\s]+) \s*
        =
        \s* (.*?)   # Extend the value.
        (?= \s* (?: [^=\s]+ \s* = | $ ) )
    /xmsg
    ) {
        print("[$1 => $2]\n");
    }

      To be correct, your output would have to be:

      [name1 => value1]
      [name2 => value2]
      [name3 => value3 but wait, there is more]
      [name4 => value4]



      pbeckingham - typist, perishable vertebrate.

        Simply change /.../xmsg to /.../xsg.
        and/or
        Simply change $ to \z.

        Update: Added the second (and better) option.
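
For readers who want to see the suggested fix applied, here is a minimal sketch of the second (lazy) pattern with `$` changed to `\z` and the `m` modifier dropped. The `parse_pairs` helper and the sample input layout are assumptions made for illustration; they do not appear in the thread. Without `/m`, and with `\z` anchoring only at the very end of the string, a value is allowed to run across line breaks until the next `name =` is seen.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of
# [name, value] pairs parsed from one multi-line string.
sub parse_pairs {
    my ($contents) = @_;
    my @pairs;
    while ($contents =~ /
        \s* ([^=\s]+) \s*    # name: a run of characters that are neither "=" nor whitespace
        =
        \s* (.*?)            # value: extended as little as possible
        (?= \s* (?: [^=\s]+ \s* = | \z ) )   # stop before the next name or at end of string
    /xsg) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

# Assumed sample input, modelled on the question.
my $text = "name1=value1\nname2 = value2\nname3 = value3\n  but wait, there is more\nname4= value4\n";

for my $pair (parse_pairs($text)) {
    my ($name, $value) = @$pair;
    $value =~ s/\s+/ /g;     # collapse the embedded line break for display only
    print "[$name => $value]\n";
}
```

The captured value for `name3` still contains the original line break and indentation; the `s/\s+/ /g` step is only there to print it on one line. Note that a value containing its own `=` would be split by this pattern, so this is a sketch for the input shown in the question rather than a general config parser.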

Node Type: perlquestion [id://574763]
Approved by herveus