Converting CSV to tab-delimited

A colleague is having problems loading CSV (comma-separated values) data into Microsoft SSIS, so I told him I'd write a script to help. I saw Text::CSV, but I don't want to walk him through module installation, as he's a bit scared of Perl already. So I wrote this little script. I also didn't want to have to teach him about < and > on the command line, so the script automatically generates an output file name by appending ".tab" to the input file name.

The rules are the standard Excel-style CSV rules: any embedded '"' characters get doubled up, and any value that contains a ',' character must be delimited by '"' characters. Other values don't need delimiters.
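
To make those rules concrete, here is a minimal sketch of the writing side. The helper name csv_quote is hypothetical (it is not part of the script below), and it assumes that a value containing a double quote also gets delimiters, as Excel-style output usually does.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one value according to the rules described above.
sub csv_quote {
    my ($value) = @_;
    # Assumption: only values containing a comma or a double quote need delimiters.
    return $value unless $value =~ /[",]/;
    (my $escaped = $value) =~ s/"/""/g;   # double up embedded quotes
    return qq{"$escaped"};
}

print join(",", map { csv_quote($_) } 'plain', 'a,b', 'say "hi"'), "\n";
# prints: plain,"a,b","say ""hi"""
```

The converter has to undo exactly this: strip the delimiters, collapse doubled quotes, and treat only the commas outside delimiters as field separators.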

The code is a simple state machine that processes one character at a time and keeps two state flags: whether an opening " has been seen, and whether the previous character was a " inside a quoted value.

Shortcomings:

Doesn't handle newlines in quoted values

use strict;
use warnings;
# Note: doesn't handle newlines in quoted values

my $out = $ARGV[0].".tab";
open OUT,">$out" or die "Can't open output $out\n";
while (<>) {
   my $tab = "";
   my $qv=0; # Quoted value indicator
   my $dq=0; # Double quote flag indicates the previous character was a "
   for (split //) {
      # Start of a quoted value
      if (not $qv and $_ eq '"') { $qv=1; next; }
      
      # Double quotes within or at the end of a quoted value
      if ($qv and $_ eq '"') { $dq=1; next; }

      # If last char was a double quotes OR we're not within a quoted value, comma = tab
      if (($dq or not $qv) and $_ eq ',' ) { $dq=0; $qv=0; $_="\t"; } # End of field

      # Two consecutive double-quote characters within a quoted value
      elsif ($dq and $_ eq '"') { $dq=0; } # Double double quotes
      $tab .= $_;
   }
   print OUT $tab;
}

Re: Converting CSV to tab-delimited
by Tux (Canon) on Apr 14, 2008 at 13:59 UTC

And you're sure that M$ doesn't export embedded new-lines, carriage-returns or other binary or special characters?

There is a very good reason for the Text::CSV (and the underlying Text::CSV_XS and Text::CSV_PP) modules to be around, and installing them isn't that hard.

cpan Text::CSV

use strict;
use warnings;
use Text::CSV;

my $if = shift or die "usage: csv2tab file.csv\n";
(my $of = $if) =~ s/\.csv$/.tab/ or die "usage: csv2tab file.csv\n";
open my $fh_i, "<", $if or die "$if: $!";
open my $fh_o, ">", $of or die "$of: $!";
# binary => 1 allows embedded newlines and other special characters in fields
my $csv = Text::CSV->new ({ binary => 1 });
# eol => "\n" makes print () end each output row with a newline
my $tsv = Text::CSV->new ({ binary => 1, sep_char => "\t", eol => "\n" });
while (my $row = $csv->getline ($fh_i)) {
    $tsv->print ($fh_o, $row);
    }
close $fh_i or die "$if: $!";
close $fh_o or die "$of: $!";

Enjoy, Have FUN! H.Merijn


Re^2: Converting CSV to tab-delimited

by Tux (Canon) on Apr 14, 2008 at 14:35 UTC

Strawberry Perl comes with a shipload of useful bundled modules and a working cpan.bat. I've been playing with it over the weekend, and the only problems I had installing new modules from CPAN were with the SSL-related modules and OS-specific modules like BSD::Resource.

ActivePerl comes with ppm, which makes the most-used modules available in a few keystrokes. Nothing is holding you back from increasing your possibilities here. Try to imagine the time you will waste explaining to the end user why this oh-so-simple script suddenly stopped working. I can assure you it is more than the time you need to convince him/her to install something good.

We've entered an era where updating or installing basic modules of proven value is made very, very easy, and doing so will pay off over writing code that, simple as it may seem, will give you headaches in the future.

Enjoy, Have FUN! H.Merijn


Re^2: Converting CSV to tab-delimited

by ambrus (Abbot) on Apr 15, 2008 at 14:49 UTC

And you're sure that M$ doesn't export embedded new-lines, carriage-returns or other binary or special characters?

FYI, line breaks in cells, which are the most common case, are exported as an LF character, whereas rows are separated by CRLF. (This of course might not apply to all versions of Excel.)
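
If the export really does behave that way, a hand-rolled reader could use the difference to keep multi-line cells in one record. The following is only a sketch under that assumption: the sample data is made up, and it also assumes a CRLF never occurs inside a cell. It sets the input record separator $/ to CRLF so that a bare LF no longer ends a record.

```perl
use strict;
use warnings;

# Made-up sample: two CRLF-terminated rows; the first has an LF inside a quoted cell.
my $data = qq{"line one\nline two",x\r\nplain,y\r\n};

open my $fh, '<', \$data or die "Can't open in-memory file: $!";
binmode $fh;   # for a real file on Windows this stops CRLF being translated to LF

my @records;
{
    local $/ = "\r\n";          # a record ends at CRLF, not at a bare LF
    while (my $record = <$fh>) {
        chomp $record;          # chomp removes the current $/ (the CRLF)
        push @records, $record;
    }
}
close $fh;

printf "%d records\n", scalar @records;   # prints: 2 records
```

Each element of @records still needs its fields parsed, and the embedded LF has to be handled somehow before the row is written out as one tab-delimited line.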


Re^2: Converting CSV to tab-delimited

by PhilHibbs (Hermit) on Apr 14, 2008 at 14:22 UTC

Most people here are Unix hackers. Believe me, working with Windows - and habitual Windows users - is really, really frustrating.

And there are no newlines in the files - I know, it's our software that's creating them.


Re^3: Converting CSV to tab-delimited

by Erez (Priest) on Apr 14, 2008 at 17:45 UTC

Text::CSV can be installed without a C compiler, and even without resorting to the command line, via ActiveState's ppm application.

And there are no newlines in the files - I know, it's our software that's creating them.

I suggest amending the introduction to the code to mention this, in case someone wants to use the code you posted and isn't sure whether there are newlines in their data.

Software speaks in tongues of man.
Stop saying 'script'. Stop saying 'line-noise'.
We have nothing to lose but our metaphors.


Re^4: Converting CSV to tab-delimited

by PhilHibbs (Hermit) on Jun 13, 2008 at 14:45 UTC

Re^3: Converting CSV to tab-delimited

by Anonymous Monk on Mar 16, 2009 at 22:40 UTC

So, a hand-wired CSV solution is sought by those of us not in a position to "simply ppm or CPAN Text::CSV into place". Good material is sparse - even the CookBook example isn't all that great. I did track down a regex which I have needed to follow up with several checks and edits to patch things up...

This then is a starting point (ugly/rough code):

my @inList = split /,(?!(?:[^",]|[^"],[^"])+")/;
# and further on a bit of a mess:
my @outList = ();
for (my $i=0; $i<$flds; $i++) {
  if (! defined $inList[$i] ) {
    $inList[$i] = ""; 
  }
  if ($inList[$i] =~ m/\D/) {
    $inList[$i] = '"'.$inList[$i].'"'; 
  }
  $inList[$i] =~ s/^""/"/;
  $inList[$i] =~ s/""$/"/;
  $inList[$i] =~ s/^"$/""/;
  push @outList, $inList[$i];
}

I eventually got to a point with my data that I simply sanitize all the crap in a field like ",", "'" and """ in self defense, straight after dealing with any nulls.

I hope this is useful for someone.


Re^4: Converting CSV to tab-delimited

by Anonymous Monk on Mar 18, 2009 at 15:22 UTC

Re: Converting CSV to tab-delimited
by ReedMeyer (Initiate) on May 24, 2010 at 19:31 UTC

Hello,

I just wanted to point out a bug in the original post by "PhilHibbs". That code mostly works, except for the handling of embedded double quotes. To fix this, change the line beginning with "if ($qv ..." to:

if (not $dq and $_ eq '"') { $dq=1; next; }

For convenience, here is a complete script that includes the bug fix and also writes the output to standard output, as opposed to writing to a file:

#!/usr/bin/perl

## Converts an Excel-style CSV-formatted file to TAB-separated format
## Note: doesn't handle newlines in quoted values

use strict;
use warnings;

while (<>) {
   my $tab = "";
   my $qv=0; # Quoted value indicator
   my $dq=0; # Double quote flag indicates the previous character was a "
   for (split //) {
      # Start of a quoted value
      if (not $qv and $_ eq '"') { $qv=1; next; }

      # Double quotes within or at the end of a quoted value
      if (not $dq and $_ eq '"') { $dq=1; next; }

      # If last char was a double quotes OR we're not within a quoted value, comma = tab
      if (($dq or not $qv) and $_ eq ',' ) { $dq=0; $qv=0; $_="\t"; } # End of field

      # Two consecutive double-quote characters within a quoted value
      elsif ($dq and $_ eq '"') { $dq=0; } # Double double quotes
      $tab .= $_;
   }
   print $tab;
}
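
As a quick way to check the corrected logic, the same loop can be wrapped in a sub and fed a line that exercises embedded commas and doubled quotes. This is a sketch only: the name csv_line_to_tab and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the corrected loop: one CSV line in, one tab-delimited line out.
sub csv_line_to_tab {
    my ($line) = @_;
    my $tab = "";
    my $qv = 0;   # inside a quoted value
    my $dq = 0;   # previous character was a " inside a quoted value
    for my $ch (split //, $line) {
        if (not $qv and $ch eq '"') { $qv = 1; next; }   # opening quote
        if (not $dq and $ch eq '"') { $dq = 1; next; }   # closing quote, or first of a doubled pair
        if (($dq or not $qv) and $ch eq ',') { $dq = 0; $qv = 0; $ch = "\t"; }   # end of field
        elsif ($dq and $ch eq '"') { $dq = 0; }          # second of a doubled pair: keep one "
        $tab .= $ch;
    }
    return $tab;
}

print csv_line_to_tab(qq{"a,b","say ""hi""",plain\n});
# prints: a,b<TAB>say "hi"<TAB>plain
```

With the original condition (if ($qv and ...)), both quotes around hi would be dropped, because the branch that keeps the second quote of a doubled pair is never reached.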

Cheers,
---Reed


Re: Converting CSV to tab-delimited
by Teva (Initiate) on Jun 03, 2008 at 11:24 UTC

Hi, I am trying to convert a text file with delimited data into a fully annotated XML SEPA (single Euro payment area) format. I have never worked with XML before and would appreciate any help. thanks,


Re^2: Converting CSV to tab-delimited

by Jenda (Abbot) on Jun 03, 2008 at 13:56 UTC

Create a new node in Seekers of Perl Wisdom
Show an example of the source file
Show an example of the wanted output (I've never heard of SEPA nor do I want to)
Show us (what) you tried!

Jenda
Support Denmark!
Defend the free world!

