Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic


by TStanley (Canon)
on Jan 11, 2001 at 02:27 UTC ( #51001=modulereview: print w/ replies, xml ) Need Help??

Item Description: Manipulation routines for comma-separated values

Review Synopsis:

Author: Alan Citterman

I had a project where I needed to extract data from a file and send
it to a customer. The file in question was from a database, and it
had been exported to a CSV text file.

I would have tried to write my own regular expression to handle this,
but my overall knowledge of Perl isn't that good. However, after some
research, I found a reference to this module.

#!/usr/bin/perl use strict; use Text::CSV;
I knew that the text file had lines of data that I didn't need, and
that there was an easily recognizable pattern in those lines, so I could
use a regular expression to put those lines into a trash file.

my $input="input.csv"; my $output="output.txt"; my $trash="trashfile"; my $csv=Text::CSV->new(); #Creates a new Text::CSV object open(INFILE,$input) || die "Can't open file $input"; open(OUTFILE,">$output") || die "Can't open file $output"; open(TRASH,">$trash") || die "Can't open file $trash";

Now to start reading the data from the file, store it in the $_ variable
and print it to the trash file if its not good, or parse the variable, and
print it to the output file if it is.

while (<INFILE>) { if (/"X"/) { #The trash data has these 3 characters in it print TRASH "$_\n"; } else { #Now to deal with the data I want to keep if($csv->parse($_)) { #checks to see if data exists in $_ and +parses it if it does my @fields=$csv->fields; # puts the values from each field in an +array my $elements=@fields; #gets the number of elements in the arra +y for ($x=0;$x<$elements;$x++) { print OUTFILE "$fields[$x]\t"; } } } }
Now that the files have been written to, I can close them up, and remove
the trash file

close INFILE; close OUTFILE; close TRASH; unlink $trash;
All in all, a very useful module.

Comment on Text::CSV
Select or Download Code
Re: Text::CSV
by swiftone (Curate) on Jan 11, 2001 at 02:34 UTC
    I to use Text::CSV quite happily. I spent forever and a day (actually, about 6 hours) trying to debug a problem with it, that was finally answered deep within the documentation. When it says:

    Allowable characters within a CSV field include 0x09 (tab) and the inclusive range of 0x20 (space) through 0x7E (tilde).

    It means it. In particular, if you get anything outside of this range, including any MS curly-quotes or (in my case) a single oddly out-of-range byte, it will quietly fail.

    Note that it does not accept newlines inside any field either.

      If you use Text::CSV_XS, you can handle those characters if you turn on the 'binary' option, like this:
      use Text::CSV_XS; $csv = Text::CSV->new({binary => 1}); ... $csv->parse($_); @fields = $csv->fields();
      Really nifty if you have to parse a CSV file with French text with accented characters and newlines in it...
        Just to be nit-picky, I think that is:
        $csv = Text::CSV_XS->new({binary => 1});
        Those that pointed out that this would resolve working with text that has accents in it - I love you. I was going absolutely batty trying to deal with this text that I needed to convert to XML.

        There are some odd things afoot now, in the Villa Straylight.
Re: Text::CSV
by TStanley (Canon) on Jan 13, 2001 at 03:25 UTC

    One problem that occured to me later on in this script, is that I didn't
    check to see if the $Outfile and $Junkfile existed. When I wrote this script,
    I had already touched those filenames, so I knew they existed, and didn't
    have to check for them. So I now add the following code:

    if(!-e $Outfile){ system("touch $Outfile"); } system("touch $Junkfile");

    In the end, there can be only one!
Re: Text::CSV
by dkubb (Deacon) on Jan 18, 2001 at 06:33 UTC

    This is an excellent module, but I prefer to use Text::CSV_XS.

    It's an XS based module, and is quite fast, definately the fastest perl module for manipulating quote-comma files on CPAN, that I have heard of.

    It has an identical interface to Text::CSV, meaning all would have to do a s/Text::CSV/Text::CSV_XS/g on the source file, and it should work the same.

      I had problems with both those modules when I tried them. The rules for escaping quotes and spaces within a CSV record are very clear, but the modules don't handle all the cases correctly. Every data file would die on some record.. that was legal, but which the modules wouldn't handle correctly. I ended up writing my own state-machine parser for CSV, first splitting the record into characters etc, just as I would do it in C.. it took me a couple of hours to code and test, but as I had already wasted many days messing with broken modules I considered it time well spent. -Ben M
Re (tilly) 1: Text::CSV
by tilly (Archbishop) on Apr 03, 2001 at 07:53 UTC
    I should get a CPAN ID, etc and put this there. But in the meantime Text::xSV is a plausible solution to this problem in pure Perl.

Back to Reviews

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: modulereview [id://51001]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (13)
As of 2014-07-10 16:52 GMT
Find Nodes?
    Voting Booth?

    When choosing user names for websites, I prefer to use:

    Results (213 votes), past polls