PerlMonks  

parsing csv with Text::ParseWords

by GertMT (Friar)
on Oct 08, 2009 at 19:57 UTC (#800083=perlquestion)
GertMT has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

While reading a CSV file I've had some trouble using Text::CSV_XS, for a reason that might have to do with EOL (see my post from yesterday, where Tux and ELISHEVA gave me valuable info). That's why I'm trying to use Text::ParseWords, but I just can't get to the final stage and was wondering if someone could help me.

I know that there are a lot of fields, as I counted them. While trying to print them in an orderly way, I get either all of them (not orderly; that print is commented out) or just the values of the elements from the first line.

The last line of code obviously doesn't make sense. The fields should be coming from every single line (@lines? and not @fields?).

Needless to say, I'm not able to use the variables I defined.
What am I doing wrong?

I did check some docs on references and such, but it's all a bit confusing now.

Thanks, Gert
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
use Text::ParseWords;

my $csvfilename = "data.csv";
open( FILE, "<$csvfilename" ) or die("Couldn't open CSV file $csvfilename:$!\n");

# Read lines from the CSV file.
my $line = ();
my @fields = ();
my $field = ();
while ( $line = <FILE> ) {
    @fields = &quotewords( ',', 0, $line ) or ( warn "problem on line $.:$line" );

    # Set variable values based on the array values.
    my $id = $fields[0];
    my $brand = $fields[1];
    my $dbt = $fields[2];
    my $cdt = $fields[3];
    my $color = $fields[4];
    my $number = $fields[5];

    # print "@fields\n";
    my $arraySize = $#fields + 1;
    print "array size = $arraySize\n";
    print "array size = ", @fields . "\n";
    print __LINE__ + 1 . ": I'm in trouble here...\n";
    foreach $line (@fields) {
        print "$fields[0]\t$fields[1]\n";
    }
}

Re: parsing csv with Text::ParseWords
by Marshall (Prior) on Oct 08, 2009 at 22:06 UTC
    I don't have Text::ParseWords installed on my system right now, but I tried a reformulation of your code (untested). Yes, that last line is the trouble!
    #!/usr/bin/perl
    use warnings;
    use strict;
    use diagnostics;
    use Text::ParseWords;

    my $csvfilename = "data.csv";
    open( FILE, "<", $csvfilename ) or die("Couldn't open CSV file $csvfilename:$!\n");

    while ( my $line = <FILE> ) {
        my @fields = quotewords( ',', 0, $line );    # no & needed!

        # print will bomb with undef value later if problem
        # or ( warn "problem on line $.:$_" );
        # but maybe you want something like this...
        warn ("less than 6 things on $line") if @fields < 6;

        my ($id,$brand,$dbt,$cdt,$color,$number) = (@fields)[0..5];
        # my ($id,$brand) = (@fields)[0,1];   # if all you need is id and brand

        # print "@fields\n";              # prints all fields space separated
        print join("\t",@fields),"\n";    # now tab separated
        print "array size = ", scalar(@fields) , "\n";
        print "$id\t$brand\n";
    }
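A self-contained sketch of the same loop, for trying it without a data.csv. It assumes a hypothetical two-line sample held in a string (opened as an in-memory filehandle) and a hypothetical helper named rows_to_text. It chomps each line first, because quotewords() otherwise leaves the newline on the last field, and it walks the field indices so that each value is printed exactly once per input line, labelled with its position. The "N: value" layout is just one possible choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical sample data standing in for data.csv.
my $sample = qq{1,Acme,10,20,"dark red",5\n2,"Brand, Inc.",30,40,blue,7\n};

# Hypothetical helper: parse each line and return one block of output per line.
sub rows_to_text {
    my ($fh) = @_;
    my $out = '';
    while ( my $line = <$fh> ) {
        chomp $line;    # otherwise the last field keeps the "\n"
        my @fields = quotewords( ',', 0, $line );
        unless (@fields) {
            warn "problem on line $.: $line\n";    # e.g. unbalanced quotes
            next;
        }
        # Each field of THIS line, each printed exactly once.
        for my $i ( 0 .. $#fields ) {
            $out .= "$i: $fields[$i]\n";
        }
        $out .= "--\n";    # separator between input lines
    }
    return $out;
}

open( my $fh, '<', \$sample ) or die "Couldn't open in-memory data: $!\n";
print rows_to_text($fh);
close $fh;
```

The outer while already delivers one line at a time, so no inner loop over @fields is needed just to print $fields[0] and $fields[1]; looping over @fields only makes sense when you want each field individually.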
    Update:
    So the short version would look like this:
    #!/usr/bin/perl -w
    use strict;
    use diagnostics;
    use Text::ParseWords;

    my $csvfilename = "data.csv";
    open( FILE, "<", $csvfilename ) or die("Couldn't open CSV file $csvfilename:$!\n");

    while ( my $line = <FILE> ) {
        my @fields = quotewords( ',', 0, $line );    # no & before quotewords() needed!
        my ($id,$brand) = (@fields)[0,1];
        print "$id\t$brand\n";
    }
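Why quotewords() rather than a plain split: it honours quoted fields that contain the delimiter. The sketch below (the sample strings are hypothetical) compares split with quotewords() for both values of the second ($keep) argument, and shows that an unbalanced quote makes quotewords() return an empty list. That empty list is what the `@fields = &quotewords(...) or warn ...` test in the original post was catching, since a list assignment in scalar context yields the number of elements assigned.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

my $line = 'a,"b,c",d';    # hypothetical sample line

my @naive    = split /,/, $line;               # also splits inside the quotes
my @stripped = quotewords( ',', 0, $line );    # $keep = 0: quotes removed
my @kept     = quotewords( ',', 1, $line );    # $keep = 1: quotes kept

print scalar(@naive),    " fields via split:    @naive\n";
print scalar(@stripped), " fields via keep=0: @stripped\n";
print scalar(@kept),     " fields via keep=1: @kept\n";

# An unbalanced quote makes quotewords() return an empty list.
my @bad = quotewords( ',', 0, 'a,"b' );
print "unbalanced quote: ", ( @bad ? "parsed" : "empty list" ), "\n";
```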

      Thank you for the two scripts. The first one especially helps me better understand how to print the various elements of the array.

      The EOL problem remains: I only get structured output after opening and re-saving data.csv. I'll probably look for another solution (ask for a different data file).

      Gert
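One possible cause of output that only comes out right after the file has been opened and re-saved (an assumption, since the thread never shows the raw bytes) is that data.csv uses bare \r line endings, as older Mac software wrote them. With Perl's default $/ = "\n", such a file arrives as a single "line". A minimal sketch that sidesteps this by slurping the file and splitting on any of \r\n, \r or \n; split_lines is a hypothetical helper, and the built-in sample string stands in for the real file when no filename is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical helper: split raw text on \r\n, a bare \r, or a bare \n.
# split drops trailing empty strings, so a final line ending adds no empty line.
sub split_lines {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

my $raw;
if (@ARGV) {
    # Real file given on the command line: slurp it byte-for-byte.
    open( my $fh, '<', $ARGV[0] ) or die "Couldn't open $ARGV[0]: $!\n";
    binmode $fh;                      # no :crlf translation, even on Windows
    $raw = do { local $/; <$fh> };    # undef $/ reads the whole file at once
    close $fh;
}
else {
    # Hypothetical sample with classic-Mac (\r only) line endings.
    $raw = "1,Acme,red\r2,\"Brand, Inc.\",blue\r";
}

my @lines = split_lines($raw);
for my $line (@lines) {
    my @fields = quotewords( ',', 0, $line );
    print join( "\t", @fields ), "\n";
}
```

This simple split does not cope with quoted fields that themselves contain line breaks; that is one place where a real CSV module earns its keep.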
Re: parsing csv with Text::ParseWords
by Bloodnok (Vicar) on Oct 09, 2009 at 01:05 UTC
    As I didn't partake in the original discussion to which you allude, the question that immediately raises its head for me is: why not use one of the CSV parsers, in particular (and the regulars can guess what's coming now :-) Text::xSV?

    A user level that continues to overstate my experience :-))
      The CSV parsers gave an EOL problem. Thanks, I'll try and see if Text::xSV can help me.
        The EOL in Perl source is just "\n". On Unix that literally is all there is; on Windows, the file contains a \r\n sequence. I would assume that you don't need to specify EOL for your CSV module at all: just let Perl do its default thing, e.g. leave the EOL => 'x' option off and let the default do its work.

        I have moved files between Unix and Windows, and Perl can read files created in either place. When I save a file under Unix, the EOL is just \n; when saved under Windows, it is \r\n. The Windows Perl can read the Unix Perl's file and vice versa. My normal text editor, TextPad, can do the same thing.

        If, on Windows, you process a file that came from a Unix system, Perl will put in the \r\n sequence for "\n" when it writes the file out. When Unix Perl writes out a file that came from Windows, it puts in just \n for "\n" instead of \r\n.

        So in Perl, for print "qwerty\n"; the \n may be written as 2 characters, depending on which OS you are running Perl under.
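One caveat worth checking: Perl only turns \r\n into \n on input when the :crlf layer is active, which is the default on Windows but not on Unix. A Windows-made file read on Unix therefore still has a \r at the end of each line after chomp, and that \r ends up glued to the last CSV field. A small sketch of the symptom and one portable fix, stripping an optional \r before the final \n (the sample line is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A line as it arrives on Unix from a file saved on Windows.
my $line = "7,blue\r\n";

my $chomped = $line;
chomp $chomped;                # removes only "\n"; the "\r" stays

my $stripped = $line;
$stripped =~ s/\r?\n\z//;      # removes "\r\n" or a lone "\n"

printf "chomp: %d chars, ends in \\r: %s\n",
    length($chomped), ( $chomped =~ /\r\z/ ? 'yes' : 'no' );
printf "s///:  %d chars, ends in \\r: %s\n",
    length($stripped), ( $stripped =~ /\r\z/ ? 'yes' : 'no' );
```

Alternatively, opening the file with the :crlf layer, as in open( my $fh, '<:crlf', 'data.csv' ), asks Perl to do the \r\n translation on any platform.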

        If you could explain the problem further with an example, that would be helpful. This is a well-known, common problem.

        I don't know the full history of why Windows did it this way. But in the ancient days of mechanical paper tape, each line ended with carriage return (\r), line feed (\n), rubout (DEL). The teletype machine was dumb and needed the \r to return the print head to the start of the line and the \n to advance the paper. The rubout (all 8 positions punched) was to keep the mechanical fingers lubricated via the oil on the tape. The ASR 33 teletype was already a "dodo bird" even by the time of DOS. Anyway, this EOL problem is well known and there are solutions.

Re: parsing csv with biterscripting
by JenniC on Dec 09, 2009 at 16:21 UTC
    Here is a sample script you may possibly find useful for parsing csv files - http://www.biterscripting.com/SS_CSV.html. The scripting language is NOT perl - I am posting it since some users may find it simpler for parsing CSV files.
      I heard that biterscripting makes you stoopid. Is that true, Jenni?

Node Type: perlquestion [id://800083]
Approved by AnomalousMonk