Re: Converting CSV to tab-delimited
by Tux (Canon) on Apr 14, 2008 at 13:59 UTC
|
And you're sure that M$ doesn't export embedded new-lines, carriage-returns or other binary or special characters?
There is a very good reason for Text::CSV (and the undelying Text::CSV_XS and Text::CSV_PP) modules to be around, and installing isn't that hard.
cpan Text::CSV
use strict;
use warnings;
my $if = shift;
my ($of = $if) =~ s/\.csv$/.tab/ or die "usage: csv2tab file.csv";
open my $fh_i, "<", $if or die "$if: $!";
open my $fh_o, ">", $of or die "$of: $!";
my $csv = Text::CSV->new ({ binary => 1 });
my $tsv = Text::CSV->new ({ binary => 1, sep_char => "\t" });
while (my $row = $csv->getline ($fh_i)) {
$tsv->print ($fh_o, $row);
}
close $fh_i or die "$if: $!";
close $fh_o or die "$of: $!"
Enjoy, Have FUN! H.Merijn
| [reply] [d/l] [select] |
|
Strawberry perl comes with a shipload of useful bundled modules and a working cpan.bat. I've been playing with it over the weekend, and the only problems I had with installing new modules from CPAN is the SSL related modules and OS specific modules like BSDresource.
ActivePerl comes with ppm, which has most used modules available in a few keystrokes. Nothing is holding you from increasing your possibilities here. Try to imagine the time you will have to waste explaning the end-user why this oh so simple script suddenly stops working. I can assure you it is more than the time you need to convince him/her to install something good.
We've entered an era where updating or installing basic modules that have a proven value, is made very very easy, and will pay off over writing code, as simple as it may seem, that will provide you with headaches in the future.
Enjoy, Have FUN! H.Merijn
| [reply] [d/l] |
|
And you're sure that M$ doesn't export embedded new-lines, carriage-returns or other binary or special characters?
Fyi, line breaks in cells, which are the most common case, are exported as an LF character whereas rows are separated by CRLF. (This of course might not apply to all versions of excel.)
| [reply] |
|
Sure, I know that installing Perl modules isn't that hard (once you've tracked down a make program for Windows - and once you've learned to avoid anything that needs a C compiler) but some people just don't want to know. All you have to do is accidentally say "non-standard" and a seemingly-sane business analyst starts to have nightmares about viruses or hackers. I have learned to avoid anything that doesn't come bundled with ActiveState Perl (and thank the g0ds, they've started including Term::ReadKey!!!)
Most people here are Unix hackers, believe me working with Windows - and habitual Windows users - is really, really frustrating.
And there are no newlines in the files - I know, it's our software that's creating them.
| [reply] |
|
Text::CVS can be installed without a C compiler, and even without resorting to the command line, via ActiveState's ppm application.
And there are no newlines in the files - I know, it's our software that's creating them.
I suggest amending the introduction to the code and mentioning this, in case someone would want to use the code you posted and isn't sure whether there are newlines in the code.
| [reply] |
|
|
All well and fine. I have a situation again where ROOTman (as we will call him) does not like anyone adding Perl modules to the production server. Though he's a Perlist himself, requests for additional modules are rebuffed as he cannot risk this machine having a hiccup. We also may be tied to an older release for quite some time, until the new hardware arrives and then we get whatever Perl comes on the RH Enterprise install.
So, a hand-wired CSV solution is sought by those of us not in a position to "simply ppm or CPAN Text::CSV into place". Good material is sparse - even the CookBook example isn't all that great. I did track down a regex which I have needed to follow up with several checks and edits to patch things up...
This then is a starting point (ugly/rough code):
my @inList = split /,(?!(?:[^",]|[^"],[^"])+")/;
# and further on a bit of a mess:
my @outList = ();
for (my $i=0; $i<$flds; $i++) {
if (! defined $inList[$i] ) {
$inList[$i] = "";
}
if ($inList[$i] =~ m/\D/) {
$inList[$i] = '"'.$inList[$i].'"';
}
$inList[$i] =~ s/^""/"/;
$inList[$i] =~ s/""$/"/;
$inList[$i] =~ s/^"$/""/;
push @outList, $inList[$i];
}
I eventually got to a point with my data that I simply sanitize all the crap in a field like ",", "'" and """ in self defense, straight after dealing with any nulls.
I hope this is useful for someone. | [reply] [d/l] |
|
Re: Converting CSV to tab-delimited
by ReedMeyer (Initiate) on May 24, 2010 at 19:31 UTC
|
Hello,
I just wanted to point out a bug in the original post by "PhilHibbs". That code mostly works, except for the handling of embedded double quotes. To fix this, change the line beginning with "if ($qv ..." to:
if (not $dq and $_ eq '"') { $dq=1; next; }
For convenience, here is a complete script that includes the bug fix and also writes the output to standard output, as opposed to writing to a file:
#!/usr/bin/perl
## Converts an Excel-style CSV-formatted file to TAB-separated format
## Note: doesn't handle newlines in quoted values
use strict;
use warnings;
while (<>) {
my $tab = "";
my $qv=0; # Quoted value indicator
my $dq=0; # Double quote flag indicates the previous character was
+a "
for (split //) {
# Start of a quoted value
if (not $qv and $_ eq '"') { $qv=1; next; }
# Double quotes within or at the end of a quoted value
if (not $dq and $_ eq '"') { $dq=1; next; }
# If last char was a double quotes OR we're not within a quoted
+value, comma = tab
if (($dq or not $qv) and $_ eq ',' ) { $dq=0; $qv=0; $_="\t"; }
+# End of field
# Two consecutive double-quote characters within a quoted value
elsif ($dq and $_ eq '"') { $dq=0; } # Double double quotes
$tab .= $_;
}
print $tab;
}
Cheers,
---Reed
| [reply] [d/l] |
Re: Converting CSV to tab-delimited
by Teva (Initiate) on Jun 03, 2008 at 11:24 UTC
|
Hi, I am trying to convert a text file with delimited data into a fully annotated XML SEPA (single Euro payment area) format.
I have never worked with XML before and would appreciate any help.
thanks, | [reply] |
|
- Create a new node in Seekers of Perl Wisdom
- Show an example of the source file
- Show an example of the wanted output (I've never heard of SEPA nor do I want to)
- Show us (what) you tried!
| [reply] |