
Re: Re: Writing a CSV Parser/Printer

by Anonymous Monk
on Jun 26, 2003 at 06:46 UTC ( [id://269124]=note )

in reply to Re: Writing a CSV Parser/Printer
in thread Writing a CSV Parser/Printer

Quite complicated, your code ;-)

Yep, that's the source of the problem ;)

As for regexes - I'm not very good with them, so I fell back on the C-style approach. I'm not sure how to add line numbers - can this be done through PerlMonks?

As for Text::CSV - I can't install modules on the server (I can upload pure-Perl ones though). I definitely don't have a problem with using them for these kinds of tedious, error-prone endeavours. Any suggestions for alternatives are welcome.
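One alternative that needs no installation is Text::ParseWords, which ships with Perl and is pure Perl. The sketch below is only an illustration with made-up sample data. It assumes fields are double-quoted, separated by commas and use backslash escapes. Note that quotewords also treats apostrophes as quote characters, so data containing a lone ' would need different handling.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);   # core module, pure Perl

# Made-up sample line: the third field contains escaped quotes and a comma.
my $line = '"Perlmonks", "", "say \"hi\", please"';

# Split on commas with optional surrounding whitespace. The false second
# argument asks quotewords to strip the quotes and backslashes.
my @fields = quotewords('\s*,\s*', 0, $line);

print "Field: $_\n" for @fields;
```

The comma inside the third field is left alone because quotewords only looks for the delimiter outside quoted text.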

You're quite right about $dataString - I moved it out of the loop and it gets rid of the errors. The out.csv file is still just a bunch of quotes and commas.

Here's the slightly modified code:

#!/usr/bin/perl -w
use strict;

my $debug = 1;
my $read_file = 'in.csv';
my $write_file = 'out.csv';

my $arrayref = parseCSV($read_file);

for my $line (@{ $arrayref }) {
    for my $field (@{ $line }) {
        print "Field: $field\n";
    }
}

printCSV($write_file, $arrayref);

# parse a csv file into an array of arrays
sub parseCSV {
    my $file_path = shift;

    my $separationChar = ',';
    my $quoteChar = '"';
    my $escapeChar = '\\';
    my $inField = 1;
    my @data;

    # read csv file
    open DATA, $file_path or die("Couldn't read data file: $!");
    while (<DATA>) {
        # remove newline
        chomp;
        # split into single chars
        my @chars = split('', $_);
        # store previous letter (for escape codes)
        my $previous = '';
        my @fields;
        my $dataString;
        for my $c (@chars) {
            if (($c eq $quoteChar) && ($previous ne $escapeChar)) {
                if ($inField) {
                    $inField = 0;
                    next;
                }
                else {
                    $inField = 1;
                    next;
                }
            }
            if ($inField) {
                # ignore all in-field escape chars
                if ($c eq $escapeChar) {
                    next;
                }
                # append char to data string
                $dataString = $dataString . $c
            }
            if ((! $inField) and ($c eq $separationChar)) {
                push(@fields, $dataString);
                $dataString = '';
            }
        }
        push(@data, \@fields);
    }
    close DATA;

    # return a reference to an AoA
    return \@data;
}

# format and print an AoA to a CSV file
sub printCSV {
    my $file_path = shift;
    my $entries = shift; # AoA ref containing entries

    my $separationChar = ',';
    my $quoteChar = '"';
    my $escapeChar = '\\';
    my @data;

    for my $entry (@{$entries}) {
        my $entryString = '';
        for my $field (@{ $entry }) {
            # escape all existing $quoteChars
            my $escapeQuote = $escapeChar . $quoteChar;
            $field = $field =~ s/$quoteChar/$escapeQuote/;
            # enclose in quoteChars
            $field = $quoteChar . $field . $quoteChar;
            debug("Field: $field");
            # add on to $entryString
            $entryString = $entryString . $separationChar . $field;
            debug("Entry String: $entryString");
        }
        # add a newline on the end
        $entryString = $entryString . "\n";
        push(@data, $entryString);
    }

    # write @data to the file
    open DATA, ">$file_path" or die("Couldn't open $file_path: $!");
    print DATA @data;
    close DATA;

    return;
}

sub debug {
    # write to log file instead of <STDOUT>
    my $message = shift;
    if ($debug) {
        print $message, "\n";
    }
}

Thanks for the help :)
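One likely contributor to the "quotes and commas" output is the line `$field = $field =~ s/$quoteChar/$escapeQuote/;`. The `=~` already changes `$field` in place, and the expression returns the number of substitutions (or an empty string when nothing matched), so the assignment replaces the field text with that return value. A minimal sketch of the writing side follows. The helper names `quote_field` and `format_row` are hypothetical and not part of the posted code, and only the quote character is escaped, as in the original.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one field in double quotes and
# backslash-escape every embedded double quote.
sub quote_field {
    my ($field) = @_;
    $field = '' unless defined $field;
    my $copy = $field;          # work on a copy, leave the caller's data alone
    $copy =~ s/"/\\"/g;         # no assignment of the result; /g handles every quote
    return qq{"$copy"};
}

# Hypothetical helper: build one CSV line from an array ref of fields.
# join() puts the separator between fields only, never in front of the first.
sub format_row {
    my ($row) = @_;
    return join(',', map { quote_field($_) } @{$row}) . "\n";
}

print format_row([ 'Perlmonks', '', 'say "hi"' ]);
```

Using join also avoids the leading comma that the posted loop produces by prepending the separator to every field.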

Re: Re: Re: Writing a CSV Parser/Printer
by Skeeve (Parson) on Jun 26, 2003 at 07:05 UTC
    You have to provide us with line numbers. This can't be done on perlmonks. You can get them with:

    perl -pe '$_="$.: $_"' your_input > your_output
    I'm not sure what your desired output should look like. Maybe this will help you. It uses a RegEx:

    use strict;
    use warnings;

    while (<DATA>) {
        my (@fields)= split /, /;
        foreach (@fields) {
            if (s/^"((?:[^"\\]|\\.)*)"$/$1/) { #correct
                tr/\\//d; # No more \
                print "$_\n";
            }
        }
    }
    __END__
    "Perlmonks", "", "excellent ;)"
    "csv", "csv\"xxx", "trall\ala"
    Short explanation for the RegEx:

    ^" matches your field's quotechar at the start of the field
    ( ... ) will "remember" what was matched inside the quotes
    (?: ... )* will match anything in place of the ... and tells the parser that it may appear as often as possible, even zero times
    [^"\\] will match any character but " and \
    | is an alternative. Either the left or the right part has to match
    \\. will match any "escaped" character
    "$ again your quotechar but now at the end
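The pieces explained above can also be written out with the /x modifier, which lets the pattern carry its own comments. This is only a sketch of the same regex. The helper name `unquote_field` and the sample strings in the loop are assumptions added for illustration.

```perl
use strict;
use warnings;

# The pattern from the reply, spread out with /x so each part has a comment.
my $quoted_field = qr/
    ^"              # quote char at the start of the field
    (               # start remembering (capture group 1)
      (?:           #   group without capturing
          [^"\\]    #     any character except a double quote or a backslash
        |           #     or
          \\.       #     a backslash followed by any one character
      )*            #   repeated as often as possible, even zero times
    )               # stop remembering
    "$              # quote char again, now at the end of the field
/x;

# Hypothetical helper: return the content of a quoted field,
# or undef if the string is not a well-formed quoted field.
sub unquote_field {
    my ($raw) = @_;
    return unless $raw =~ $quoted_field;
    my $content = $1;
    $content =~ tr/\\//d;   # drop the backslashes, as the reply's code does
    return $content;
}

for my $raw ('"csv\"xxx"', '"trall\ala"', 'no quotes') {
    my $content = unquote_field($raw);
    print defined $content ? "$raw => $content\n" : "$raw => (no match)\n";
}
```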

      Thanks for the excellent explanation :)

      With regards to the error line numbers: there aren't any errors anymore - it just doesn't produce the desired results. I have a feeling it's quite a ways away as well - your approach is far clearer.

      One question about the split if it's fed data like:

      "csv", "csv\"x, xx", "trall\ala"

      It will choke on the second entry. How would I go about avoiding this? I could use something like split/","/; which would make problems far less likely, but is there a better way? Some sort of notation for when it's inside the field?

        Okay. That one isn't easy ;-)

        I have to admit that I have no good solution at hand.

        Maybe a search here will help.

        BTW: Why don't you register here? It's much easier to recognize you if you're no longer Anonymous Monk. It just costs you some time...
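For the comma-inside-a-field case raised above, one possible approach is to stop splitting on the separator and instead match one whole quoted field at a time, reusing the quoted-field pattern from the earlier reply. This is a sketch, not a full CSV parser. The helper name `split_quoted_line` is hypothetical, and it assumes every field is double-quoted with backslash escapes.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of quoted, comma-separated fields
# without breaking on commas that sit inside the quotes.
sub split_quoted_line {
    my ($line) = @_;
    my @fields;
    # \G continues where the previous match stopped, so each pass consumes
    # exactly one quoted field plus the separator (or end of string) after it.
    while ($line =~ /\G\s*"((?:[^"\\]|\\.)*)"\s*(?:,|\z)/g) {
        my $field = $1;
        $field =~ s/\\(.)/$1/g;   # keep the escaped character, drop its backslash
        push @fields, $field;
    }
    return @fields;
}

my @fields = split_quoted_line('"csv", "csv\"x, xx", "trall\ala"');
print "Field: $_\n" for @fields;
```

The loop simply stops at the first thing that is not a well-formed quoted field, so malformed lines yield fewer fields rather than an error. Real input may need stricter checking.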

(jeffa) 3Re: Writing a CSV Parser/Printer
by jeffa (Bishop) on Jun 26, 2003 at 16:37 UTC
