
Writing a CSV Parser/Printer

by Anonymous Monk
on Jun 26, 2003 at 06:21 UTC ( [id://269118]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm attempting to write a script to parse CSV files and print them to a file. Unfortunately, XS modules (including Text::CSV) are not an option at this point.

The CSV files have the following format:

$quoteChar $field $quoteChar $separationChar (no spaces)

So, for example:

"Perlmonks", "http://www.perlmonks.org", "excellent ;)"

Entries are delimited by newlines.

So far, I have the following code:

#!/usr/bin/perl -w
use strict;

my $debug = 1;
my $read_file = 'in.csv';
my $write_file = 'out.csv';

my $arrayref = parseCSV($read_file);
for my $line (@{ $arrayref }) {
    for my $field (@{ $line }) {
        print "Field: $field\n";
    }
}
printCSV($write_file, $arrayref);

# parse a csv file into an array of arrays
sub parseCSV {
    my $file_path = shift;
    my $separationChar = ',';
    my $quoteChar = '"';
    my $escapeChar = '\\';
    my $inField = 1;
    my @data;

    # read csv file
    open DATA, $file_path or die("Couldn't read data file: $!");
    while (<DATA>) {
        # remove newline
        chomp;
        # split into single chars
        my @chars = split('', $_);
        # store previous letter (for escape codes)
        my $previous = '';
        my @fields;
        for my $c (@chars) {
            my $dataString;
            if (($c eq $quoteChar) && ($previous ne $escapeChar)) {
                if ($inField) {
                    $inField = 0;
                    next;
                }
                else {
                    $inField = 1;
                    next;
                }
            }
            if ($inField) {
                # ignore all in-field escape chars
                if ($c eq $escapeChar) {
                    next;
                }
                # append char to data string
                $dataString = $dataString . $c
            }
            if ((! $inField) and ($c eq $separationChar)) {
                push(@fields, $dataString);
            }
        }
        push(@data, \@fields);
    }
    close DATA;

    # return a reference to an AoA
    return \@data;
}

# format and print an AoA to a CSV file
sub printCSV {
    my $file_path = shift;
    my $entries = shift; # AoA ref containing entries
    my $separationChar = ',';
    my $quoteChar = '"';
    my $escapeChar = '\\';
    my @data;

    for my $entry (@{$entries}) {
        my $entryString = '';
        for my $field (@{ $entry }) {
            # escape all existing $quoteChars
            my $escapeQuote = $escapeChar . $quoteChar;
            $field = $field =~ s/$quoteChar/$escapeQuote/;
            # enclose in quoteChars
            $field = $quoteChar . $field . $quoteChar;
            debug("Field: $field");
            # add on to $entryString
            $entryString = $entryString . $separationChar . $field;
            debug("Entry String: $entryString");
        }
        # add a newline on the end
        $entryString = $entryString . "\n";
        push(@data, $entryString);
    }

    # write @data to the file
    open DATA, ">$file_path" or die("Couldn't open $file_path: $!");
    print DATA @data;
    close DATA;
    return;
}

sub debug {
    # write to log file instead of <STDOUT>
    my $message = shift;
    if ($debug) {
        print $message, "\n";
    }
}

The two main errors I'm getting right now are:

Use of uninitialized value in concatenation (.) or string at parseTest.pl line 16.

and

Use of uninitialized value in substitution (s///) at parseTest.pl line 113.

The out.csv file contains junk:

,"","","",""
,""
,""

Any insights on how to improve the code would be greatly appreciated :)

Replies are listed 'Best First'.
Re: Writing a CSV Parser/Printer
by BrowserUk (Patriarch) on Jun 26, 2003 at 07:36 UTC

    Text::CSV is actually a pure-Perl module. The only reason you can't install it in your local directory and use it is that it uses AutoLoader to reduce its memory footprint.

    Rather than write your own replacement, you only need to change three lines in the source of Text::CSV and you can use it, which would save some work and some testing.

    The three changes are:

    • Comment out line 23.

      #  use AutoLoader qw(AUTOLOAD);

    • Delete line 34.

      __END__

    • Move line 32 to line 319.

      1;

    With those changes the module takes a split second or two longer to load and uses a tad more memory, but works fine.

    Hope that helps.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      Excellent! It does work, thanks! :)

Re: Writing a CSV Parser/Printer
by tilly (Archbishop) on Jun 26, 2003 at 07:32 UTC
    If these CSV files are being produced by standard tools (e.g. saved from Microsoft programs), then you have the format wrong.

    The escape character for a double quote is actually a double quote. Also, don't forget that the interior of a double-quoted string can contain embedded newlines. Furthermore, not all fields are double-quoted.
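The three points above (doubled quotes, embedded newlines, optional quoting) can be sketched in pure Perl as a small state machine. This is only an illustration, not a replacement for a tested module: `parse_csv_text` is a hypothetical helper name, and the sketch assumes "," as the separator, "\n" line endings, and input that fits in memory as one string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): parse CSV text into an array
# of arrays. Assumes ',' separates fields, '"' quotes them, and a doubled
# '""' inside a quoted field stands for one literal quote.
sub parse_csv_text {
    my ($text) = @_;
    my (@rows, @fields);
    my $field     = '';
    my $in_quotes = 0;
    my $pending   = 0;    # true once the current record has any content

    my @chars = split //, $text;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($in_quotes) {
            if ($c eq '"') {
                if ($i + 1 < @chars && $chars[$i + 1] eq '"') {
                    $field .= '"';     # doubled quote: keep one, skip one
                    $i++;
                }
                else {
                    $in_quotes = 0;    # closing quote
                }
            }
            else {
                $field .= $c;          # includes embedded newlines
            }
        }
        elsif ($c eq '"') {
            $in_quotes = 1;
            $pending   = 1;
        }
        elsif ($c eq ',') {
            push @fields, $field;      # end of field
            $field   = '';
            $pending = 1;
        }
        elsif ($c eq "\n") {
            if ($pending) {            # end of record; blank lines skipped
                push @fields, $field;
                push @rows, [@fields];
            }
            @fields  = ();
            $field   = '';
            $pending = 0;
        }
        else {
            $field .= $c;              # ordinary character, unquoted field
            $pending = 1;
        }
    }
    # flush the last record if the text did not end with a newline
    if ($pending) {
        push @fields, $field;
        push @rows, [@fields];
    }
    return \@rows;
}

my $rows = parse_csv_text(qq{a,"b ""q"" c","line1\nline2"\nplain,,x\n});
printf "%d records, first has %d fields\n", scalar @$rows, scalar @{ $rows->[0] };
```

Reading the whole file into one string first (rather than line by line) is what lets a quoted field span several lines.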

      Thanks for the reminder, I forgot about the embedded newlines :).

Re: Writing a CSV Parser/Printer
by Skeeve (Parson) on Jun 26, 2003 at 06:35 UTC
    Quite complicated, your code ;-)

    Question: Why is Text::CSV not an option?

    Question: Have you considered using a RegEx to accomplish your task, instead of splitting into chars and building your string?

    Question: Could you please provide us with line numbers? I can't reproduce your error messages here.

    Answer: One problem I see is your "my $dataString". You want to collect your data in that scalar, but you clear it each time through the loop. This way you won't ever get anything useful out of your loop.
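A minimal sketch of the scoping point above: a variable declared with `my` inside the loop body is a fresh variable on every iteration, so it can never accumulate anything. The sub names `collect_inside` and `collect_outside` are made up for this illustration.

```perl
use strict;
use warnings;

# Accumulator declared INSIDE the loop: re-created on every iteration,
# so only the most recent character survives.
sub collect_inside {
    my ($text) = @_;
    my $last;
    for my $c (split //, $text) {
        my $dataString = '';
        $dataString .= $c;
        $last = $dataString;
    }
    return $last;
}

# Accumulator declared once, BEFORE the loop: it keeps growing.
sub collect_outside {
    my ($text) = @_;
    my $dataString = '';
    for my $c (split //, $text) {
        $dataString .= $c;
    }
    return $dataString;
}

print collect_inside('abc'),  "\n";   # prints "c"
print collect_outside('abc'), "\n";   # prints "abc"
```

In the question's code the variable is declared without a value, which is also why the first append triggers the "uninitialized value in concatenation" warning.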

      Quite complicated, your code ;-)

      Yep, that's the source of the problem ;)

      As for regexes - I'm not very good with them, so I fell back on the C-style approach. I'm not sure how to add line numbers - can this be done through PerlMonks?

      As for Text::CSV - I can't install modules on the server (I can upload pure-Perl ones though). I definitely don't have a problem with using them for this type of tedious, error-prone endeavour. Any suggestions of alternatives are welcome.

      You're quite right about $dataString - I moved it out of the loop and it gets rid of the errors. The out.csv file is still just a bunch of quotes and commas.

      Here's the slightly modified code:

      Thanks for the help :)
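On the remaining "quotes and commas" output: one likely cause, judging only from the printCSV code in the question, is the line `$field = $field =~ s/.../.../;`. The `s///` operator changes `$field` in place and returns the number of substitutions (or an empty string when nothing matched), so assigning its result back overwrites the field. The sketch below shows that behaviour and one way to build a line instead. `format_csv_line` is a hypothetical helper that keeps the question's conventions (backslash-escaped quotes, every field quoted); it does not escape backslashes themselves.

```perl
use strict;
use warnings;

# Hypothetical helper: build one CSV line from a list of fields.
sub format_csv_line {
    my @fields = @_;                  # copy, so the caller's data is untouched
    for my $field (@fields) {
        $field = '' unless defined $field;
        $field =~ s/"/\\"/g;          # modify in place; do not assign the result
        $field = qq{"$field"};        # enclose in quotes
    }
    # join puts the separator BETWEEN fields, so no leading comma appears
    return join(',', @fields) . "\n";
}

# What the original assignment stores: the result of s///, not the string.
my $field  = 'say "hi"';
my $result = ($field =~ s/"/\\"/);    # one substitution made (no /g)
print "result=$result field=$field\n";

print format_csv_line('Perlmonks', 'excellent ;)', 'say "hi"');
```

Using `join` also avoids the stray separator at the start of each line that the question's output shows.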

        You have to provide us with line numbers. This can't be done on perlmonks. You can get them with:

        perl -pe '$_="$.: $_"' your_input > your_output
        I'm not sure what your desired output should look like. Maybe this will help you. It uses a RegEx:

        use strict;
        use warnings;

        while (<DATA>) {
            my (@fields)= split /, /;
            foreach (@fields) {
                if (s/^"((?:[^"\\]|\\.)*)"$/$1/) { #correct
                    tr/\\//d; # No more \
                    print "$_\n";
                }
            }
        }
        __END__
        "Perlmonks", "http://www.perlmonks.org", "excellent ;)"
        "csv", "csv\"xxx", "trall\ala"
        Short explanation for the RegEx:

        s/^"((?:[^"\\]|\\.)*)"$/$1/

        ^"
            matches your field's quote char at the start of the field
        (...)
            will "remember" (capture) what was matched inside the quotes, so that $1 can put it back without the quotes
        (?:...)*
            groups the ... without remembering it, and lets it repeat as often as possible, even zero times
        [^"\\]
            will match any character but " and \
        |
            is an alternative: either the left or the right part has to match
        \\.
            will match any "escaped" character, that is, a \ followed by any one character
        "$
            again your quote char, but now at the end of the field
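The pattern explained above can also be wrapped in a small sub and tried on single fields. This is a sketch only: `unquote_field` is a made-up name, and it differs from the snippet above in one respect: it uses `s/\\(.)/$1/g` rather than `tr/\\//d`, so that an escaped backslash (`\\`) comes out as one backslash instead of vanishing.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the surrounding quotes from one field and drop
# the backslash in front of each escaped character. Returns undef (in scalar
# context) if the field is not a well-formed quoted string.
sub unquote_field {
    my ($field) = @_;
    return unless $field =~ /^"((?:[^"\\]|\\.)*)"$/;
    my $inner = $1;
    $inner =~ s/\\(.)/$1/g;    # \" becomes ", \\ becomes \
    return $inner;
}

# Single-quoted Perl strings keep the backslashes exactly as typed here.
for my $raw ('"csv"', '"csv\"xxx"', '"trall\ala"', 'no quotes') {
    my $value = unquote_field($raw);
    print defined $value ? "$raw => $value\n" : "$raw => (no match)\n";
}
```

Like the snippet above, this only handles fields that were already split apart; a separator inside a quoted field still needs a real parser.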
Re: Writing a CSV Parser/Printer
by shotgunefx (Parson) on Jun 26, 2003 at 21:14 UTC
    Another option is Text::xSV

    -Lee

    "To be civilized is to deny one's nature."

Approved by Skeeve
Front-paged by hsmyers