Item Description: Manipulation routines for comma-separated values
Review Synopsis:
Author: Alan Citterman
I had a project where I needed to extract data from a file and send
it to a customer. The file in question was from a database, and it
had been exported to a CSV text file.
I would have tried to write my own regular expression to handle this,
but my overall knowledge of Perl isn't that good. However, after some
research, I found a reference to this module.
#!/usr/bin/perl
use strict;
use Text::CSV;
I knew that the text file had lines of data that I didn't need, and
that there was an easily recognizable pattern in those lines, so I could
use a regular expression to put those lines into a trash file.
my $input="input.csv";
my $output="output.txt";
my $trash="trashfile";
my $csv=Text::CSV->new(); #Creates a new Text::CSV object
open(INFILE, '<', $input) || die "Can't open file $input: $!";
open(OUTFILE, '>', $output) || die "Can't open file $output: $!";
open(TRASH, '>', $trash) || die "Can't open file $trash: $!";
Now to start reading the data from the file. Each line is stored in the $_
variable and printed to the trash file if it's not good; otherwise it is
parsed and its fields are printed to the output file.
while (<INFILE>) {
    if (/"X"/) {    # The trash data has these 3 characters in it
        print TRASH "$_\n";
    }
    else {    # Now to deal with the data I want to keep
        if ($csv->parse($_)) {    # true if the line in $_ parsed successfully
            my @fields = $csv->fields;    # puts the values from each field in an array
            my $elements = @fields;    # gets the number of elements in the array
            for (my $x = 0; $x < $elements; $x++) {
                print OUTFILE "$fields[$x]\t";
            }
            print OUTFILE "\n";    # end each record on its own line
        }
    }
}
Now that the files have been written to, I can close them up and remove
the trash file.
close INFILE;
close OUTFILE;
close TRASH;
unlink $trash;
All in all, a very useful module.
Re: Text::CSV by swiftone (Curate) on Jan 11, 2001 at 02:34 UTC |
I too use Text::CSV quite happily. I spent forever and a day (actually, about 6 hours) trying to debug a problem with it that was finally answered deep within the documentation. When it says:
Allowable characters within a CSV field include 0x09
(tab) and the inclusive range of 0x20 (space) through
0x7E (tilde).
It means it. In particular, if you get anything outside of this range, including any MS curly-quotes or (in my case) a single oddly out-of-range byte, it will quietly fail.
Note that it does not accept newlines inside any field either.
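One way to track down such a byte before the parser sees it is to test each line against that documented range. What follows is only a sketch: the helper name find_bad_bytes is made up for illustration (it is not part of Text::CSV), and it assumes the trailing line ending has already been removed from the line.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::CSV: returns one
# [offset, byte value] pair for every byte outside the documented
# range of tab (0x09) and 0x20 (space) through 0x7E (tilde).
sub find_bad_bytes {
    my ($line) = @_;
    my @bad;
    while ($line =~ /([^\x09\x20-\x7E])/g) {
        # pos() points just past the match, so step back one byte
        push @bad, [ pos($line) - 1, ord($1) ];
    }
    return @bad;
}

# Example: a Latin-1 e-acute (0xE9) hiding inside a quoted field
my $line = qq{"caf\xE9",42};
for my $hit (find_bad_bytes($line)) {
    printf "offset %d: byte 0x%02X\n", @$hit;
}
```

Running a check like this over the input and logging the line number of each hit should make a quiet parse failure much easier to locate.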
If you use Text::CSV_XS, you can handle those characters if you turn on the 'binary' option, like this:
use Text::CSV_XS;
$csv = Text::CSV->new({binary => 1});
...
$csv->parse($_);
@fields = $csv->fields();
Really nifty if you have to parse a CSV file with French text with accented characters and newlines in it...
Just to be nit-picky, I think that is:
$csv = Text::CSV_XS->new({binary => 1});
Those that pointed out that this would resolve working with text that has accents in it - I love you. I was going absolutely batty trying to deal with this text that I needed to convert to XML.
-------------------------------------------------------------------
There are some odd things afoot now, in the Villa Straylight.
Re: Text::CSV by TStanley (Canon) on Jan 13, 2001 at 03:25 UTC |
UPDATE:
One problem that occurred to me later on in this script is that I didn't
check to see if the $Outfile and $Junkfile existed. When I wrote this script,
I had already touched those filenames, so I knew they existed and didn't
have to check for them. So I now add the following code:
if(!-e $Outfile){
system("touch $Outfile");
}
system("touch $Junkfile");
TStanley
In the end, there can be only one!
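The same check can also be done without shelling out to touch. This is a sketch under assumptions, not the code from the post above: the helper name ensure_file is hypothetical, and it relies on Perl's append mode ('>>') creating a missing file while leaving an existing one's contents alone.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure $path exists, creating it as an
# empty file if necessary. Append mode never truncates, so any
# existing contents are left untouched.
sub ensure_file {
    my ($path) = @_;
    open(my $fh, '>>', $path) or die "Can't create $path: $!";
    close($fh) or die "Can't close $path: $!";
    return 1;
}

# File names borrowed from the review's script
ensure_file('output.txt');
ensure_file('trashfile');
```

This avoids a dependency on an external touch command, which may not be available on every platform.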
Re: Text::CSV by dkubb (Deacon) on Jan 18, 2001 at 06:33 UTC |
This is an excellent module, but I prefer to use Text::CSV_XS.
It's an XS-based module and is quite fast, definitely the fastest Perl module for manipulating quote-comma files on CPAN that I have heard of.
It has an identical interface to Text::CSV, meaning all you would have to do is run s/Text::CSV/Text::CSV_XS/g on the source file, and it should work the same.
I had problems with both those modules when I tried them. The rules for escaping quotes and spaces within a CSV record are very clear, but the modules don't handle all the cases correctly. Every data file would die on some record that was legal, but which the modules wouldn't handle correctly. I ended up writing my own state-machine parser for CSV, first splitting the record into characters etc., just as I would do it in C. It took me a couple of hours to code and test, but as I had already wasted many days messing with broken modules, I considered it time well spent.
-Ben M
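For readers wondering what such a state-machine parser might look like, here is a minimal sketch. It is not Ben M's code, which was never posted. The function name parse_csv_record is made up, and the rules it assumes are the common ones: fields separated by commas, optional double quotes around a field, a doubled quote inside a quoted field standing for one literal quote, and no line ending left on the record.

```perl
use strict;
use warnings;

# Hypothetical sketch: split one CSV record into fields by walking it
# one character at a time and tracking whether we are inside quotes.
sub parse_csv_record {
    my ($record) = @_;
    my @fields;
    my $field     = '';
    my $in_quotes = 0;
    my @chars     = split //, $record;

    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($in_quotes) {
            if ($c eq '"') {
                if ($i + 1 < @chars && $chars[$i + 1] eq '"') {
                    $field .= '"';    # doubled quote: one literal quote
                    $i++;             # skip the second quote
                }
                else {
                    $in_quotes = 0;   # closing quote
                }
            }
            else {
                $field .= $c;         # anything else, commas included
            }
        }
        elsif ($c eq '"') {
            $in_quotes = 1;           # opening quote
        }
        elsif ($c eq ',') {
            push @fields, $field;     # separator ends the current field
            $field = '';
        }
        else {
            $field .= $c;
        }
    }
    die "Unterminated quoted field\n" if $in_quotes;
    push @fields, $field;             # the last field has no trailing comma
    return @fields;
}

my @fields = parse_csv_record('1,"Smith, John","said ""hi""",');
print join('|', @fields), "\n";
```

The sketch is deliberately lenient (for example, it accepts a quote that opens in the middle of a field), so a stricter parser would need extra states to reject such input.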
Re (tilly) 1: Text::CSV by tilly (Archbishop) on Apr 03, 2001 at 07:53 UTC |
I should get a CPAN ID, etc and put this there. But in the
meantime Text::xSV is a plausible solution to this
problem in pure Perl.