Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: Parsing CSV

by davido (Archbishop)
on Apr 03, 2014 at 17:12 UTC ( #1080975=note: print w/ replies, xml ) Need Help??


in reply to Parsing CSV

Wow, I wonder if anybody has ever had to deal with CSV that has been made non-trivial by quotes and embedded commas? I bet they have... Perl's been around a long time, after all. Maybe I should search CPAN to see if one of the 15,000 distributions there can be useful to me: Text::CSV, Text::CSV_XS.

Sample code...

use Text::CSV; my $csv = Text::CSV->new or die Text::CSV->error_diag; while( my $row = $csv->getline(\*DATA) ) { print scalar @$row, " columns: (", join( ') (', @$row ), ")\n"; } __DATA__ column1,column2,column3,column4,column5,column6,column7 data,data,"lname, fname mi","lname, fname mi","lname fname mi",data,da +ta data,data,"lname, fname mi","lname, fname mi",,data,data data,data,"lname, fname mi",,,data,data data,data,,,,data,data

Sample output:

$ ./mytest.pl 7 columns: (column1) (column2) (column3) (column4) (column5) (column6) + (column7) 7 columns: (data) (data) (lname, fname mi) (lname, fname mi) (lname fn +ame mi) (data) (data) 7 columns: (data) (data) (lname, fname mi) (lname, fname mi) () (data) + (data) 7 columns: (data) (data) (lname, fname mi) () () (data) (data) 7 columns: (data) (data) () () () (data) (data)

(Adapted from the SYNOPSIS section in Text::CSV's POD.)


Dave


Comment on Re: Parsing CSV
Select or Download Code
Re^2: Parsing CSV
by kumoparris (Initiate) on Apr 04, 2014 at 14:56 UTC

    Dave

    Thanks for the response. I am a bit new here and did look at Text::CSV, but couldn't figure out how to solve the entire issue. I was hoping to also remove the comma in the names in the same pass.

      Text::CSV won't do it directly, but if you're ok with just blindly removing commas, you can do this:

      use Text::CSV; my $csv = Text::CSV->new or die Text::CSV->error_diag; while( my $row = $csv->getline(\*DATA) ) { tr/,//d for @$row; print scalar @$row, " columns: (", join( ') (', @$row ), ")\n"; } __DATA__ column1,column2,column3,column4,column5,column6,column7 data,data,"lname, fname mi","lname, fname mi","lname fname mi",data,da +ta data,data,"lname, fname mi","lname, fname mi",,data,data data,data,"lname, fname mi",,,data,data data,data,,,,data,data

      ...which produces...

      s$ ./mytest.pl 7 columns: (column1) (column2) (column3) (column4) (column5) (column6) + (column7) 7 columns: (data) (data) (lname fname mi) (lname fname mi) (lname fnam +e mi) (data) (data) 7 columns: (data) (data) (lname fname mi) (lname fname mi) () (data) ( +data) 7 columns: (data) (data) (lname fname mi) () () (data) (data) 7 columns: (data) (data) () () () (data) (data)

      That seems like it's almost there. But if you need to reverse the order of "words" within the fields that have embedded commas, you could do this:

      use Text::CSV; my $csv = Text::CSV->new or die Text::CSV->error_diag; while( my $row = $csv->getline(\*DATA) ) { s/([^,]+),\s*([^,]+)/$2 $1/ for @$row; print scalar @$row, " columns: (", join( ') (', @$row ), ")\n"; } __DATA__ column1,column2,column3,column4,column5,column6,column7 data,data,"lname, fname mi","lname, fname mi","lname fname mi",data,da +ta data,data,"lname, fname mi","lname, fname mi",,data,data data,data,"lname, fname mi",,,data,data data,data,,,,data,data

      ...which puts "lname, fname mi" into "fname mi lname" order, providing the following results:

      $ ./mytest.pl 7 columns: (column1) (column2) (column3) (column4) (column5) (column6) + (column7) 7 columns: (data) (data) (fname mi lname) (fname mi lname) (lname fnam +e mi) (data) (data) 7 columns: (data) (data) (fname mi lname) (fname mi lname) () (data) ( +data) 7 columns: (data) (data) (fname mi lname) () () (data) (data) 7 columns: (data) (data) () () () (data) (data)

      FWIW, I am happy to deal with questions where a commonly used CPAN solution is falling short of what you need, but the original question made no mention of what solutions had been explored, or how they failed to meet the need. You tend to get warmer and far more useful responses if you provide us with details on what you tried, how it failed to fill the requirement, a small self-contained code example, sample input (as you did), and sample output.

      Often, in composing an effective question, the research that goes into its composition leads you to the answer without even asking. But in those cases where the answer is still elusive, you'll be able to demonstrate exactly what small part of the bigger picture is holding you up so that our answers hit the mark on the first try, rather than after a round or two of follow-ups (as has been the case here).

      I like giving good answers. But sometimes I simply can't devote the time because it's obvious that it's going to take several rounds of asking for additional information before we get to the meat of the problem. That's an unfortunate situation for the person asking, because his question doesn't get the quality of answers he needs, when it could have gotten a quick and accurate answer had it only provided more context.

      Anyway, welcome to the Monastery, and I hope you find it both helpful and encouraging to your efforts to learn Perl.


      Dave

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1080975]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (7)
As of 2014-11-26 03:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (160 votes), past polls