Parsing "=" separated output

by blindluke (Hermit)
on Aug 03, 2011 at 06:46 UTC (#918178=perlquestion)
blindluke has asked for the wisdom of the Perl Monks concerning the following question:

Hello, enlightened Monks!

I have a string that looks a bit like this:

"sometrash key1=value0 value1, value2 key2=value3 key3=value4"

I want to extract keys and values to a hash, like this one:

{ key1 => 'value0 value1, value2', key2 => 'value3', key3 => 'value4', }

I used something like the code below, but for key1 it only gives me 'value0' - I thought there wouldn't be any whitespace in the values, and I was wrong :)

my %data;
while ($string =~ m/\s+([^\s]+?)=([^\s]*)/g) {
    $data{$1} = $2;
}

How can I modify my regexp, so that it only treats a single word right before the '=' sign as the key, and all the characters up until next key (or end of string) are treated as the value?

Regards, Luke Jefferson

Replies are listed 'Best First'.
Re: Parsing "=" separated output
by Ratazong (Monsignor) on Aug 03, 2011 at 07:14 UTC

    Hi Luke!

    I assume your difficulty is to determine when the data ends and when the next key is coming. You could solve it with a look-ahead:

    (?=\s[^\s]+=)
    This looks for a blank and then some word followed by an = - according to your example that is your next key.

    Unfortunately, with the look-ahead you would miss the last entry in your example, as there is no next key. But that can be solved easily by adding some dummy extension to the original string. So the following code works for me:

    my $string = "sometrash key1=value0 value1, value2 key2=value3 key3=value4";
    $string .= " dummy=";    # add a dummy value for the regex to work
    while ($string =~ m/\s+([^\s]+?)=(.*?)(?=\s[^\s]+=)/g) {
        $data{$1} = $2;
    }

    HTH, Rata

    PS.: please also read the node Death to Dot Star! to understand the risk in my altered solution
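
To see the effect of the dummy key described above, here is a minimal sketch that runs the same regex with and without it. The helper name parse_pairs is made up for this illustration and is not part of the original reply.

```perl
use strict;
use warnings;

my $string = "sometrash key1=value0 value1, value2 key2=value3 key3=value4";

# parse_pairs is a hypothetical helper: it collects key/value pairs using
# a look-ahead that insists on another "key=" following each value.
sub parse_pairs {
    my ($text) = @_;
    my %data;
    while ($text =~ m/\s+([^\s]+?)=(.*?)(?=\s[^\s]+=)/g) {
        $data{$1} = $2;
    }
    return %data;
}

my %without = parse_pairs($string);               # no key follows key3
my %with    = parse_pairs($string . " dummy=");   # dummy key terminates key3

print "without dummy: ", join(",", sort keys %without), "\n";
print "with dummy:    ", join(",", sort keys %with),    "\n";
```

Without the dummy, nothing satisfies the look-ahead after the last value, so the final pair never matches; the dummy itself is not stored because no key follows it.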


      I've modified the regexp a bit, so that the concatenation is no longer necessary:

      while ( $string =~ m/\s+([^\s]+?)=(.*?)(?=\s[^\s]+=|$)/g )

      Works quite well.


        Same idea, but taking advantage of what a match returns in list context, and commenting the regex:
        ~/$ perl -e '
        $string="sometrash key1=value0 value1, value2 key2=value3 key3=value4";
        %hash = $string =~ /
            (\w+)             # capture the key name
            =                 # separated from the value by an equals
            (.+?)             # and then the value, non-greediness prevents running into the subsequent values
            (?=(?:\s\w+=|$))  # finally we look ahead to ensure that what follows is a space and a "key=" pattern, or else the end of the string
        /xg;                  # Match globally and allow comments in the regex for maintainability
        # And check
        use Data::Dumper; print Dumper(\%hash);
        '
        $VAR1 = {
                  'key2' => 'value3',
                  'key1' => 'value0 value1, value2',
                  'key3' => 'value4'
                };

        print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
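
The one-liner above assigns the match straight to a hash. As a rough sketch of why that works, the same match can be split into two steps; the intermediate array @flat is an addition for illustration only.

```perl
use strict;
use warnings;

my $string = "sometrash key1=value0 value1, value2 key2=value3 key3=value4";

# In list context, m//g returns the captured groups of every match as one
# flat list: (key, value, key, value, ...).
my @flat = $string =~ /(\w+)=(.+?)(?=\s\w+=|$)/g;

# Assigning a flat list to a hash pairs the elements up as key => value.
my %hash = @flat;

print scalar(@flat), " captured strings\n";
print "$_ => '$hash{$_}'\n" for sort keys %hash;
```

The look-ahead contains no capturing group, so it contributes nothing to the returned list; only the key and value groups end up in the hash.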
      "Unfortunately, with the look-ahead you would miss the last entry in your example"

      Just put an alternation in the look-ahead to cope with end of string.

      knoppix@Microknoppix:~$ perl -MData::Dumper -Mstrict -wE '
      > my $str =
      >     q{sometrash key1=value0 value1, value2 key2=value3 key3=value4};
      > my %data = $str =~ m{([^\s=]+)=(.*?)(?=(?:\s+[^\s=]+=|\z))}g;
      > print Data::Dumper->Dumpxs( [ \ %data ], [ qw{ *data } ] );'
      %data = (
                'key2' => 'value3',
                'key1' => 'value0 value1, value2',
                'key3' => 'value4'
              );
      knoppix@Microknoppix:~$

      I hope this is of interest.

      Update: I just realised this is in essence exactly the same as Utilitarian's reply. Please ignore.
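
The replies in this subthread end the look-ahead with either $ or \z. The two behave the same on the example string, but they can differ once the input carries a trailing newline. The sketch below assumes such a newline (for instance a line read from a file that has not been chomped); the original example has none.

```perl
use strict;
use warnings;

# Assumption: the line still ends in "\n", as it would when read unchomped.
my $line = "sometrash key1=value0 value1, value2 key2=value3 key3=value4\n";

# "$" can match just before a trailing newline, so the last pair is found.
my %dollar = $line =~ m{([^\s=]+)=(.*?)(?=\s+[^\s=]+=|$)}g;

# "\z" matches only at the very end, and "." does not match "\n",
# so the last value can never reach it and the pair is dropped.
my %zed = $line =~ m{([^\s=]+)=(.*?)(?=\s+[^\s=]+=|\z)}g;

# Removing the newline first makes the "\z" version see all pairs again.
chomp( my $chomped = $line );
my %zed_chomped = $chomped =~ m{([^\s=]+)=(.*?)(?=\s+[^\s=]+=|\z)}g;

printf "%-22s %d keys\n", 'with $:',             scalar keys %dollar;
printf "%-22s %d keys\n", 'with \z:',            scalar keys %zed;
printf "%-22s %d keys\n", 'with \z after chomp:', scalar keys %zed_chomped;
```

Either anchor is fine for the string in the question; the choice only matters if the real input may end in a newline.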



Re: Parsing "=" separated output
by egga (Monk) on Aug 03, 2011 at 07:16 UTC

    Hi! I don't like putting too much brain into regexes, so I wrote a bit more code. But I think it's gonna be easier to maintain afterwards.

    my $str = 'sometrash key1=value0 value1, value2 key2=value3 key3=value4';
    my @tokens;
    for my $token (split(" ", $str)) {
        if ($token =~ m/=/) {
            push @tokens, $token;
        }
        else {
            next unless @tokens;
            $tokens[-1] .= ' ' . $token;
        }
    }
    my %tokens = map { split("=", $_) } @tokens;

      #!/usr/bin/perl --
      use strict;
      use warnings;
      use Data::Dumper;

      Main( @ARGV );
      exit( 0 );

      sub Main {
          my $str = q[sometrash key1=value0 value1, value2 key2=value3 key3=value4];
          my @tokens = split /([^\s=]+=)/, $str;
          shift @tokens until $tokens[0] =~ /=$/;
          print Dumper( { @tokens } );
      }
      __END__
      $VAR1 = {
                'key3=' => 'value4',
                'key2=' => 'value3 ',
                'key1=' => 'value0 value1, value2 '
              };

        Probably better to stop looping when run out of tokens :)
        shift @tokens until not(@tokens) or $tokens[0] =~ /=$/;
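
The split-based output above keeps the "=" on each key and a trailing space on some values. One possible variation, offered only as a sketch and not as the poster's code, captures just the key name and lets the separator swallow the "=" and the whitespace in front of the key. The variable names $leading and @pairs are made up here.

```perl
use strict;
use warnings;

my $str = q[sometrash key1=value0 value1, value2 key2=value3 key3=value4];

# Only the key name is captured, so split returns it without the "=".
# The whitespace before each key is part of the separator and is dropped.
# The first field is whatever precedes the first key ("sometrash" here).
# A limit of -1 keeps a trailing empty value, e.g. for a final "key4=".
my ( $leading, @pairs ) = split /\s*([^\s=]+)=/, $str, -1;

my %data = @pairs;
print "$_ => '$data{$_}'\n" for sort keys %data;
```

Because split always returns the text before the first separator as its first field, a single list assignment discards it, and no shift loop is needed.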
