http://www.perlmonks.org?node_id=883994

MeatLips has asked for the wisdom of the Perl Monks concerning the following question:

Here's a question for the wise monks... I have this single line string:

key1=val1 key2=val2 key3=val3 key4="val4a val4b" key5="val5key=(0 1 2 3)" key6=(val6a val6b)

I want to parse that into an array so I can have something simple like this:

foreach my $x (@array) { print "$x\n"; }

and have it print the following:

key1=val1
key2=val2
key3=val3
key4="val4a val4b"
key5="val5key=(0 1 2 3)"
key6=(val6a val6b)

I've been tearing my hair out, poring over various regex texts, trying various ways to use 'split', anything to figure out how to split this pig up the way I want it. What would the wise monks here suggest?

Replies are listed 'Best First'.
Re: string parsing with split
by BrowserUk (Patriarch) on Jan 24, 2011 at 20:42 UTC

    Use a lookahead:

    $s = q[key1=val1 key2=val2 key3=val3 key4="val4a val4b" key5="val5key=(0 1 2 3)" key6=(val6a val6b)];;
    @a = split " (?=key)", $s;;
    print for @a;;
    key1=val1
    key2=val2
    key3=val3
    key4="val4a val4b"
    key5="val5key=(0 1 2 3)"
    key6=(val6a val6b)

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      This would work if the string literally contained the word "key", but I was only using that as an example. In the actual string, the keys are all different words.

      Here's something that looks a bit closer to the actual string:

      platform=linux hpfFamily=hwseries npsFamily=SWseries rackcnt=1 SPAs=1 SPUpSPA=6 CPUs=6 shrParts="/part1 /home/part2" maintIface="eth2" DEncpSPA=4 nDEncs=4 npsMName="SystemName" portnums="ports=(0 1 2 3)" ints=(eth0 eth1)
        Here's something that looks a bit closer to the actual string:

        Then here's something that may be a bit closer to a working solution for you?

        $s = q[platform=linux hpfFamily=hwseries npsFamily=SWseries rackcnt=1 SPAs=1 SPUpSPA=6 CPUs=6 shrParts="/part1 /home/part2" maintIface="eth2" DEncpSPA=4 nDEncs=4 npsMName="SystemName" portnums="ports=(0 1 2 3)" ints=(eth0 eth1)];;
        print for split ' (?=\w+=)', $s;;
        platform=linux
        hpfFamily=hwseries
        npsFamily=SWseries
        rackcnt=1
        SPAs=1
        SPUpSPA=6
        CPUs=6
        shrParts="/part1 /home/part2"
        maintIface="eth2"
        DEncpSPA=4
        nDEncs=4
        npsMName="SystemName"
        portnums="ports=(0 1 2 3)"
        ints=(eth0 eth1)

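As a standalone script, the lookahead split above might look like the sketch below. It uses a shortened version of the sample string (an assumption made for brevity), and the variable names are illustrative only. The lookahead `(?=\w+=)` is zero-width, so the split consumes only the space and leaves the following key intact.

```perl
use strict;
use warnings;

# Shortened sample string, for illustration only.
my $s = 'platform=linux rackcnt=1 shrParts="/part1 /home/part2" '
      . 'portnums="ports=(0 1 2 3)" ints=(eth0 eth1)';

# Split on a single space only when it is followed by "word=".
# The lookahead consumes nothing, so each key stays with its value.
my @fields = split / (?=\w+=)/, $s;

print "$_\n" for @fields;
```

One caveat with this approach: a quoted value that contains a space followed by `word=` (for example `note="a b=c"`) would also be split at that space, because the pattern knows nothing about quotes or parentheses.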
Re: string parsing with split
by ikegami (Patriarch) on Jan 24, 2011 at 20:58 UTC
    Another way:
    my @pairs = / \G \s* ( [^=]+ = (?: " [^"]* " | \( [^)]* \) | \S* ) ) /xg;
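The one-liner above matches against `$_`. A minimal sketch of the same pattern bound to an explicit variable, with comments added under `/x`, might look like this (the shortened sample string and the variable name `$s` are assumptions for illustration):

```perl
use strict;
use warnings;

# Shortened sample string, for illustration only.
my $s = 'key1=val1 key4="val4a val4b" key5="val5key=(0 1 2 3)" key6=(val6a val6b)';

my @pairs = $s =~ /
    \G \s*                # resume where the previous match ended, skip spaces
    (
        [^=]+ =           # key: everything up to the first equals sign
        (?: " [^"]* "     # double-quoted value (no embedded quotes)
          | \( [^)]* \)   # parenthesised value (no nested parens)
          | \S*           # bare value: run of non-space characters
        )
    )
/xg;

print "$_\n" for @pairs;
```

Because the quoted and parenthesised alternatives are tried before the bare-value one, spaces inside quotes or parentheses do not end a field.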
Re: string parsing with split
by AnomalousMonk (Archbishop) on Jan 24, 2011 at 22:27 UTC

    Here's my take on the problem. It has the advantage, IMHO, of being more easily adaptable to changing requirements because it is more modular.

    Notes:

    • The regex uses  \x22 in place of  " (double-quote) to avoid Windoze command-line escape-ology.
    • A quoted string cannot contain any sort of double-quote, escaped or otherwise.
    • A parenthetic group cannot contain a  ')' (right-paren).
    (Sorry for the line-wrap.)

    >perl -wMstrict -le
    "my $s = 'key1=val1 key2=val2 key3=val3 key4=\"val4a val4b\" '
           . 'key5=\"val5key=(0 1 2 3)\" key6=(val6a val6b)'
           ;
     ;;
     my $key   = qr{ [[:alpha:]] [[:alnum:]]+ }xms;
     my $val   = qr{ [[:alpha:]] [[:alnum:]]+ }xms;
     my $d_quo = qr{ \x22 [^\x22]* \x22 }xms;
     my $paren = qr{ [(] [^)]* [)] }xms;
     ;;
     my $vals = qr{ $val | $d_quo | $paren }xms;
     ;;
     my @opts = $s =~ m{ $key \s* = \s* $vals }xmsg;
     ;;
     print qq{'$s'};
     print qq{'$_'} for @opts;
    "
    'key1=val1 key2=val2 key3=val3 key4="val4a val4b" key5="val5key=(0 1 2 3)" key6=(val6a val6b)'
    'key1=val1'
    'key2=val2'
    'key3=val3'
    'key4="val4a val4b"'
    'key5="val5key=(0 1 2 3)"'
    'key6=(val6a val6b)'
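To illustrate the adaptability claim, here is a hedged sketch of how one building block could be swapped out to lift the "no double-quote inside a quoted string" restriction noted above. Only `$d_quo` differs from the posted code. The backslash-escape convention and the sample string are assumptions, not something the original question requires.

```perl
use strict;
use warnings;

# Hypothetical input: a quoted value containing backslash-escaped quotes.
my $s = 'key1=val1 key4="say \"hi\" now" key6=(val6a val6b)';

my $key   = qr{ [[:alpha:]] [[:alnum:]]+ }xms;
my $val   = qr{ [[:alpha:]] [[:alnum:]]+ }xms;
# Changed piece (assumption): inside the quotes, accept either an
# ordinary character or a backslash followed by any one character.
my $d_quo = qr{ " (?: [^"\\] | \\. )* " }xms;
my $paren = qr{ [(] [^)]* [)] }xms;

my $vals  = qr{ $val | $d_quo | $paren }xms;

# No capture groups, so list-context matching returns each whole match.
my @opts  = $s =~ m{ $key \s* = \s* $vals }xmsg;

print qq{'$_'\n} for @opts;
```

The rest of the pattern is untouched, which is the point of keeping each piece in its own `qr{}` object.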
Re: string parsing with split
by jethro (Monsignor) on Jan 24, 2011 at 20:57 UTC
    I think Text::CSV is a well-known solution to such problems.
      It's not up to the task. It can be configured to allow quoting to start in the middle of a field (loose quotes), but it doesn't support multiple quoting characters or balanced quotes (start quote character (e.g. "(") different than end quote character (")")).