http://www.perlmonks.org?node_id=649721

jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

I need to split a string which looks more or less like
AAA * BBB CCC * * "2000 01 00 00 00" "2004 01 00 00 00"
or
AAA * BBB "CCC DDD" * * "2000 01 00 00 00" *
The first example should split into
*
BBB
CCC
*
*
2000 01 00 00 00
2004 01 00 00 00
Note that the first value of the string is omitted (it is not important).
Here is my test code
#! /usr/bin/perl
use strict ;
use warnings ;

my $inp = 'AAA * BBB CCC * * "2000 01 00 00 00" "2004 01 00 00 00"' ;
my @r = $inp =~ /"([^"]*)"|\s+([^\s]*)(?:\s|$)/g ;
local $" = "\n";
print "@r";
Output:
Use of uninitialized value in join or string at ./t9.pl line 13.
...... 6 more lines like this .....
*
CCC
*
2000 01 00 00 00
"2004
00
00"
Not exactly what I had in mind. Any suggestions about what I'm doing wrong here ?

Thnx a lot
LuCa

UPDATE: thnx!! Result: @r = map { s/"//g; $_ } $inp =~ /("[^"]*"|\S+)/g ;

Re: split string using regex
by duff (Parson) on Nov 08, 2007 at 14:51 UTC

    You have two sets of parentheses in your regular expression that are mutually exclusive. Perl is putting the contents of both $1 and $2 into your array. Since one of those will be undef for every match, Perl complains. Rewrite your RE so that there is only one set of parentheses, or be sure to discard the undefs.
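The "discard the undefs" option can be sketched as follows. This is only an illustration: the simplified pattern and the variable name @raw are assumptions, not code from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $inp = 'AAA * BBB CCC * * "2000 01 00 00 00" "2004 01 00 00 00"';

# Two capture groups: in list context with /g, every match contributes
# both $1 and $2, and the group that did not take part is undef.
my @raw = $inp =~ /"([^"]*)"|(\S+)/g;

# Keep only the captures that are defined.
my @r = grep { defined } @raw;

print scalar(@raw), " raw values, ", scalar(@r), " fields\n";
print "$_\n" for @r;
```

Note that this keeps AAA as the first field; shift @r would drop it.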

    Update: here's one possibility:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $inp = 'AAA * BBB CCC * * "2000 01 00 00 00" "2004 01 00 00 00"';
    my @r = $inp =~ /("[^"]*"|\S+)/g;
    s/\A"(.*?)"\z/$1/ for @r;
    $, = "\n";
    print @r, "\n";
Re: split string using regex
by grep (Monsignor) on Nov 08, 2007 at 15:01 UTC
    It appears that your data is space-separated. I would recommend saving yourself a lot of trouble: grab Text::CSV_XS and set sep_char to a single space (0x20). It treats a quoted string as one field.
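A minimal sketch of that suggestion, assuming Text::CSV_XS is installed from CPAN (it is not a core module) and that fields are separated by exactly one space. A run of several spaces would produce empty fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

my $inp = 'AAA * BBB CCC * * "2000 01 00 00 00" "2004 01 00 00 00"';

# A single space as separator; the double quote is the default quote_char.
my $csv = Text::CSV_XS->new({ sep_char => ' ', binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag;

$csv->parse($inp) or die "Parse failed: " . $csv->error_diag;

my @r = $csv->fields;
shift @r;    # drop the unwanted first value (AAA)

print "$_\n" for @r;
```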

    grep
    One dead unjugged rabbit fish later...

Re: split string using regex
by gamache (Friar) on Nov 08, 2007 at 15:13 UTC
    my $inp = 'AAA * BBB CCC * * "2000 01 00 00 00" "2004 01 00 00 00"' ;
    my @r;
    while ($inp =~ /( [^"\s]+ | "[^"]+?" ) /gx) {
        my $field = $1;
        $field =~ s/(?:^"|"$)//g;    # strip the surrounding quotes
        push @r, $field;
    }
    print join "\n", @r, '';
    Do you really not want to match AAA? I assumed you want AAA in there; if not, shift @r or something. This solution isn't the most efficient, but I think it's clear enough to expand upon (or abandon in favor of Text::CSV_XS).
Re: split string using regex
by tuxz0r (Pilgrim) on Nov 08, 2007 at 17:24 UTC
    I think an easy approach would just be to replace those spaces in the datetime stamps with another character, then parse the fields, then you can go back in and replace the spaces if needed. For example,
    use Data::Dumper;

    my $inp = 'AAA * BBB CCC * * "2000 01 00 00 00" "2004 01 00 00 00"';
    $inp =~ s/(\d) (\d)/$1-$2/g;
    my @fields = split /\s+/, $inp;
    print Dumper(\@fields);
    Which outputs
    $VAR1 = [
              'AAA',
              '*',
              'BBB',
              'CCC',
              '*',
              '*',
              '"2000-01-00-00-00"',
              '"2004-01-00-00-00"'
            ];
    However, this assumes that the other fields don't start/end with a number. I'm only going off your example above.
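One way around that caveat is to protect spaces only inside the double quotes, split, and then restore them. The sketch below is a variation on the idea rather than tuxz0r's code. The placeholder character "\x01" is an assumption (any byte that never occurs in the data would do), and an empty quoted field ("") would be lost by the split.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $inp = 'AAA * BBB "CCC DDD" * * "2000 01 00 00 00" *';

# Placeholder assumed never to occur in the input.
my $mark = "\x01";

# Replace spaces inside double quotes with the placeholder,
# dropping the quotes themselves.
(my $tmp = $inp) =~ s{"([^"]*)"}{ (my $q = $1) =~ s/ /$mark/g; $q }ge;

# Split on whitespace, then put the protected spaces back.
my @fields = split ' ', $tmp;
s/$mark/ /g for @fields;

print "$_\n" for @fields;
```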

    ---
    echo S 1 [ Y V U | perl -ane 'print reverse map { $_ = chr(ord($_)-1) } @F;'
    Warning: Any code posted by tuxz0r is untested, unless otherwise stated, and is used at your own risk.

      I have the log line below:

      <190>date=2005-05-25 time=07:17:21 device_id=FGT1002104201869 log_id=0316096002 type=webfilter subtype=urlexempt pri=information vd=root user=bpdcad\schiavok src=192.168.3.70 sport=3557 dst=207.68.177.125 dport=80 service=http hostname=h.msn.com url=/c.gif?RF=http%3a%2f%2fby101fd%2ebay101%2ehotmail%2emsn%2ecom%2fcgi%2dbin%2fHoTMaiL&PI=44364&DI=7474&PS=74565 status=allow msg="URL is allowed because it is in URL exempt-list"

      I want to split the above line on spaces and store the pieces in an array, but without splitting anything between double quotes. For example, msg="URL is allowed because it is in URL exempt-list" should stay a single element rather than being divided at the spaces inside the quotes. Any help would be appreciated. Thanx, jai

        Try this:

        while( $line =~ /(\w+)=("[^"]+"|\S+)/g ) { print "$1: $2\n"; }

        Warning: no guarantees if your string is not well-formed.
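If the goal is an array of whole key=value tokens rather than printed pairs, one possible sketch follows. The shortened $line and the name @parts are illustrative assumptions, and the same warning about malformed input applies: an unbalanced quote is treated as an ordinary character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened, illustrative version of the log line.
my $line = 'status=allow msg="URL is allowed because it is in URL exempt-list" src=192.168.3.70';

# A token is a run of either complete "quoted" chunks or single
# non-space characters, so spaces inside quotes do not end it.
my @parts = $line =~ /((?:"[^"]*"|\S)+)/g;

print "$_\n" for @parts;
```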