PerlMonks  

Splitting a line on just commas

by gibsonca (Beadle)
on Jun 13, 2010 at 18:31 UTC (#844465=perlquestion)
gibsonca has asked for the wisdom of the Perl Monks concerning the following question:

I have a fixed set of comma-delimited text lines in which some fields contain commas inside quoted comments. How do I ignore those commas? For example:

a,b,"hey, you","str1, str2, str3",end

So this should be split into 5 fields. But

@fields = split(/\,/, $line);

does not take into account the quoted strings. What I want fields to be set to:

a
b
"hey, you"
"str1, str2, str3"
end

Maybe I am missing the obvious, but I am stuck on this. Any help appreciated.

Re: Splitting a line on just commas
by ikegami (Pope) on Jun 13, 2010 at 18:38 UTC
      @arr = split /(".+?")|,/, $s;
      print @arr;

      a b "hey,you" "str1, str2, str3" end
        That doesn't work. It actually returns
        ( 'a', undef, 'b', undef, '', '"hey, you"', '', undef, '', '"str1, str2, str3"', '', undef, 'end', )

        Keep in mind the first arg of split is a separator.
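
To make the point about the separator concrete, here is a minimal sketch; the sample strings and variable names are illustrative and not from the thread. It shows that split throws away whatever the pattern matches, except that capturing parentheses in the pattern are returned as extra elements, with undef for a group that did not take part in the match.

```perl
use strict;
use warnings;

# The pattern is the separator: what it matches is thrown away.
my @plain = split /,/, 'a,b,c';

# Capturing parentheses in the separator add one extra element per group.
my @captured = split /(,)/, 'a,b,c';

# A group that did not take part in the match contributes undef.
my @mixed = split /(;)|,/, 'a,b';

print scalar(@plain), " ", scalar(@captured), " ", scalar(@mixed), "\n";
```

This is why putting the quoted string inside the split pattern produces the stray undef and empty elements listed above: the quoted field is being treated as a separator that happens to be captured.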

Re: Splitting a line on just commas
by CountZero (Bishop) on Jun 13, 2010 at 18:39 UTC
    One answer : Text::CSV

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
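
For readers who have not used the module, the following is a rough sketch of what the Text::CSV route might look like. It assumes Text::CSV is installed from CPAN (it is not a core module); the helper name csv_fields and the choice of the binary option are mine, not the poster's. Note that the module returns the field contents without the surrounding quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the fields of one CSV line, or an empty
# list if Text::CSV is not installed or the line cannot be parsed.
sub csv_fields {
    my ($line) = @_;
    eval { require Text::CSV; 1 } or return;
    my $csv = Text::CSV->new({ binary => 1 }) or return;
    $csv->parse($line) or return;
    return $csv->fields;
}

my @fields = csv_fields('a,b,"hey, you","str1, str2, str3",end');
print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

If the quotes themselves must be kept, as in the original question, they would have to be added back afterwards.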

Re: Splitting a line on just commas
by BrowserUk (Pope) on Jun 13, 2010 at 19:37 UTC

    Text::CSV_XS but ... if your sample is indicative:

    $s = 'a,b,"hey, you","str1, str2, str3",end';
    print "$_\n" for split /,(?=\S)/, $s;

    a
    b
    "hey, you"
    "str1, str2, str3"
    end

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      This of course fails in the following cases, because it only splits on commas that are followed by a non-space character:

      a,b,"hey,you",etc
      a, b, "hey, you", etc
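
The two failing inputs above can be checked directly. This is only a sketch; the helper count_fields and the variable names are made up for illustration. Both extra inputs should yield four fields, but the lookahead split does not see it that way.

```perl
use strict;
use warnings;

# Count the pieces produced by splitting on commas followed by a non-space.
sub count_fields {
    my ($line) = @_;
    my @parts = split /,(?=\S)/, $line;
    return scalar @parts;
}

my $sample = 'a,b,"hey, you","str1, str2, str3",end';
my $tight  = 'a,b,"hey,you",etc';        # no space after the embedded comma
my $spaced = 'a, b, "hey, you", etc';    # space after every comma

printf "%d %d %d\n",
    count_fields($sample), count_fields($tight), count_fields($spaced);
```

The original sample splits into 5 pieces as intended, the second input is over-split into 5 pieces because the embedded comma is followed by a letter, and the third is not split at all.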

      Which is why it was qualified with "if your sample is indicative". A more general solution can be found by focusing on the fields themselves, rather than the commas:

      @fields = $s =~ /("[^"]*"|[^,]*),/gc;
      ($lastfield) = $s =~ /\G(.*)/;   # list context, so $lastfield gets the captured text
      push @fields, $lastfield;

      But even that has no provision for placing a quotation mark inside a quoted string, and I'm sure there are other things I missed. The problem is hairier than it looks, hence, Text::CSV or Text::CSV_XS is best.
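
To illustrate how quickly the cases pile up, here is one possible hand-rolled tokenizer. It is a sketch under stated assumptions, not a replacement for the modules: the helper name parse_csv_line is hypothetical, an embedded quote is assumed to be written as two quotes (""), the surrounding quotes are stripped rather than kept, and malformed input is silently truncated instead of reported.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: split one line into fields,
# keeping empty fields and treating "" inside quotes as a literal quote.
sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    pos($line) = 0;
    while (1) {
        if ($line =~ /\G"((?:[^"]|"")*)"/gc) {
            my $field = $1;
            $field =~ s/""/"/g;          # undouble embedded quotes
            push @fields, $field;
        }
        else {
            $line =~ /\G([^,]*)/gc;      # unquoted field, possibly empty
            push @fields, $1;
        }
        last unless $line =~ /\G,/gc;    # stop when no separator follows
    }
    return @fields;
}

my @fields = parse_csv_line('a,b,"hey, you","say ""hi""",,end');
print "[$_]\n" for @fields;
```

Even this ignores escaped quotes written with a backslash, fields spanning several lines, and whitespace around separators, which is the argument for using a tested module.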

        A more general solution can be found

        The phrase "more general" is similar to "a bit pregnant".

        There is little point in catering for one possibility not in evidence and not another. You should either cater for what is; or for every possibility.

        As noted on Wikipedia, there is no single consistent standard for what constitutes CSV or TSV etc. The module we both mentioned therefore jumps through extraordinary hoops to try and cater for every possible variation--and inevitably fails.

        But, individual sources of CSV output are usually self-consistent.

        Just as we don't take a universal phrase book covering every known language--were such available--with us when travelling to a particular country, it rarely makes sense to try and cater for non-evident possibilities unless you are going to cater for them all. You're either doing more work than necessary; or not enough.


Re: Splitting a line on just commas
by Anonymous Monk on Jun 14, 2010 at 15:39 UTC
    use Text::ParseWords;
    my $line = q(a,b,"hey, you","str1, str2, str3",end);
    my @words = quotewords(',', 0, $line);
      ... and to preserve the quotes using Text::ParseWords, if desired:
      use strict;
      use warnings;
      use Text::ParseWords;

      my $line = q(a,b,"hey, you","str1, str2, str3",end);
      my @words = quotewords(',', 1, $line);
      print "$_\n" for @words;

      __END__
      a
      b
      "hey, you"
      "str1, str2, str3"
      end
Re: Splitting a line on just commas
by reddydn (Initiate) on Jun 14, 2010 at 17:20 UTC
    Try this:

    @arr = split /(".+?")|,/, $s;
    print @arr;
Re: Splitting a line on just commas
by deMize (Monk) on Jun 14, 2010 at 17:34 UTC
    Response: I'd go the Text::CSV route, but this might help get you started
    use strict;

    sub main{
        my $text = qq{a,b,"hey, you","str1, str2, str3",end};
        print "Input: $text\n\n";

        # Split the delimiters
        my @values = split( /(?:\,|(\".*?\"))/ , $text);

        # Remove the created blanks
        @values = grep{$_ ne ''} @values;

        # Output
        foreach (0..$#values){
            print "$_: $values[$_] \n";
        }
    }

    main();

    Output:

    Input: a,b,"hey, you","str1, str2, str3",end

    0: a
    1: b
    2: "hey, you"
    3: "str1, str2, str3"
    4: end


    Thoughts: I haven't really thought about why the blanks are being created - if you take away the grep, you'll see what I'm talking about. I still advise using Text::CSV because using this grep method will remove wanted blanks. Therefore, the above code has structural integrity problems.

    Example: a,b,,d,e
    You probably really want that space holder there if you're going to be inserting this into a database. The grep would remove it because it has a blank string value ("").
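
The lost placeholder is easy to reproduce. The sketch below is illustrative only: it uses a simplified form of the split pattern above, and contrasts it with quotewords from the core Text::ParseWords module, which was suggested elsewhere in this thread and keeps the empty field.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = 'a,b,,d,e';

# Split-then-filter: undef captures and blanks are both discarded,
# so the empty third field disappears.
my @filtered = grep { defined($_) && $_ ne '' }
               split /,|(".*?")/, $line;

# quotewords returns the empty field as an empty string
# (the second argument, 1, keeps any quotes in the output).
my @kept = quotewords(',', 1, $line);

print scalar(@filtered), " vs ", scalar(@kept), "\n";
```

One caveat with quotewords is that it also treats single quotes as quoting characters, so a field containing an apostrophe may not parse the way a CSV reader would expect.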


    Demize

      Ouch! Misusing split, which you attempt to fix by filtering out empty strings, which leads to warnings and the removal of empty fields.

      Thing being separated
              vvvvvvv
      /(?:\,|(\".*?\"))/
          ^^
          Separator

      How are those two things on equal footing?

        Response: I was about to say, you might want to remove all the undefined values created by the unmatched parens, before removing the blank fields.

        @values = grep{defined} @values;
        @values = grep{$_ ne ''} @values;

        or

        @values = grep{defined && $_ ne ''} @values;

        Again, I would not use this method. It's not good to remove blank string values. As for the equal footing, would this be any less equal: /(?:\,)|(\".*?\")/

        Update: I did forget to include the trailing comma after the quotes, but I still wouldn't use it: /\,|(?:(\".*?\")\,)/


        Demize
Reaped:
by NodeReaper (Curate) on Jun 16, 2010 at 17:41 UTC
Re: Splitting a line on just commas
by furry_marmot (Pilgrim) on Jun 16, 2010 at 22:24 UTC

    Easy peasy.

    $s = 'a,b,"hey, you","str1, str2, str3",end';
    push @fields, $1 while $s =~ /("[^"]+"|[^",]+)(?:,|$)/g;
    print "$_\n" for @fields;
    or
    $s = 'a,b,"hey, you","str1, str2, str3",end';
    @fields = $s =~ /("[^"]+"|[^,]+)(?:,|$)/g;   # Use +, not *, or you get a blank element
    print "$_\n" for @fields;

    Both print:

    a
    b
    "hey, you"
    "str1, str2, str3"
    end
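
The choice between + and * mentioned in the code comment has a cost worth seeing on a line with an empty field. This is a small sketch using the same example line as the earlier reply (a,b,,d,e); the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = 'a,b,,d,e';

# With +, a field must contain at least one character, so the empty
# third field is skipped.
my @plus = $line =~ /("[^"]+"|[^,]+)(?:,|$)/g;

# With *, the empty field is kept, but one more zero-length match
# succeeds at the end of the string and adds a trailing blank element.
my @star = $line =~ /("[^"]*"|[^,]*)(?:,|$)/g;

print scalar(@plus), " vs ", scalar(@star), "\n";
```

Neither variant returns exactly the five fields, so this approach fits best when the data is known to contain no empty fields.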

Node Type: perlquestion [id://844465]
Approved by planetscape
Front-paged by Corion