PerlMonks  

Re: Re: Writing a CSV Parser/Printer

by Anonymous Monk
on Jun 26, 2003 at 06:46 UTC ( [id://269124]=note )


in reply to Re: Writing a CSV Parser/Printer
in thread Writing a CSV Parser/Printer

Quite complicated, your code ;-)

Yep, that's the source of the problem ;)

As for regexes - I'm not very good with them, so I fell back on the C-style approach. I'm not sure how to add line numbers - can this be done through PerlMonks?

As for Text::CSV - I can't install modules on the server (I can upload pure-Perl ones though). I definitely don't have a problem with using them for these types of tedious, error-prone endeavours. Any suggestions of alternatives are welcome.

You're quite right about $dataString - I moved it out of the loop and it gets rid of the errors. The out.csv file is still just a bunch of quotes and commas.

Here's the slightly modified code:

#!/usr/bin/perl -w
use strict;

my $debug = 1;
my $read_file = 'in.csv';
my $write_file = 'out.csv';

my $arrayref = parseCSV($read_file);

for my $line (@{ $arrayref }) {
    for my $field (@{ $line }) {
        print "Field: $field\n";
    }
}

printCSV($write_file, $arrayref);

# parse a csv file into an array of arrays
sub parseCSV {
    my $file_path = shift;
    my $separationChar = ',';
    my $quoteChar = '"';
    my $escapeChar = '\\';
    my $inField = 1;
    my @data;

    # read csv file
    open DATA, $file_path or die("Couldn't read data file: $!");
    while (<DATA>) {
        # remove newline
        chomp;
        # split into single chars
        my @chars = split('', $_);
        # store previous letter (for escape codes)
        my $previous = '';
        my @fields;
        my $dataString;
        for my $c (@chars) {
            if (($c eq $quoteChar) && ($previous ne $escapeChar)) {
                if ($inField) { $inField = 0; next; }
                else { $inField = 1; next; }
            }
            if ($inField) {
                # ignore all in-field escape chars
                if ($c eq $escapeChar) { next; }
                # append char to data string
                $dataString = $dataString . $c
            }
            if ((! $inField) and ($c eq $separationChar)) {
                push(@fields, $dataString);
                $dataString = '';
            }
        }
        push(@data, \@fields);
    }
    close DATA;

    # return a reference to an AoA
    return \@data;
}

# format and print an AoA to a CSV file
sub printCSV {
    my $file_path = shift;
    my $entries = shift; # AoA ref containing entries
    my $separationChar = ',';
    my $quoteChar = '"';
    my $escapeChar = '\\';
    my @data;

    for my $entry (@{$entries}) {
        my $entryString = '';
        for my $field (@{ $entry }) {
            # escape all existing $quoteChars
            my $escapeQuote = $escapeChar . $quoteChar;
            $field = $field =~ s/$quoteChar/$escapeQuote/;
            # enclose in quoteChars
            $field = $quoteChar . $field . $quoteChar;
            debug("Field: $field");
            # add on to $entryString
            $entryString = $entryString . $separationChar . $field;
            debug("Entry String: $entryString");
        }
        # add a newline on the end
        $entryString = $entryString . "\n";
        push(@data, $entryString);
    }

    # write @data to the file
    open DATA, ">$file_path" or die("Couldn't open $file_path: $!");
    print DATA @data;
    close DATA;
    return;
}

sub debug {
    # write to log file instead of <STDOUT>
    my $message = shift;
    if ($debug) { print $message, "\n"; }
}
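One likely contributor to the "just a bunch of quotes and commas" output is the line `$field = $field =~ s/$quoteChar/$escapeQuote/;` in printCSV. The `s///` operator edits its target in place and returns the number of substitutions made (an empty string when there were none), so assigning its result back to `$field` throws the field's text away. The sketch below shows the substitution without the assignment. The helper name `quote_field` is hypothetical and not part of the posted code, and using `join` rather than prepending the separator is an assumption about the desired output (no leading comma).

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: quote one field for output.
sub quote_field {
    my ($field) = @_;
    my $quoteChar  = '"';
    my $escapeChar = '\\';

    # s/// modifies $field in place; its return value is only a count,
    # so it is deliberately not assigned back to $field.
    # \Q...\E keeps the quote character literal, /g escapes every occurrence.
    $field =~ s/(\Q$quoteChar\E)/$escapeChar$1/g;

    return $quoteChar . $field . $quoteChar;
}

# Build one output line; join puts the separator only between fields.
my @entry = ('Perlmonks', 'say "hi"', 'a,b');
my $entryString = join(',', map { quote_field($_) } @entry) . "\n";
print $entryString;
```

Because `quote_field` works on a copy of its argument, the fields in the original array of arrays are left untouched, unlike the posted loop, which modifies them through the `$field` alias.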

Thanks for the help :)

Replies are listed 'Best First'.
Re: Re: Re: Writing a CSV Parser/Printer
by Skeeve (Parson) on Jun 26, 2003 at 07:05 UTC
    You have to provide us with line numbers. This can't be done on perlmonks. You can get them with:

    perl -pe '$_="$.: $_"' your_input > your_output
    I'm not sure what your desired output should look like. Maybe this will help you. It uses a RegEx:

    use strict;
    use warnings;

    while (<DATA>) {
        my (@fields)= split /, /;
        foreach (@fields) {
            if (s/^"((?:[^"\\]|\\.)*)"$/$1/) { #correct
                tr/\\//d; # No more \
                print "$_\n";
            }
        }
    }
    __END__
    "Perlmonks", "http://www.perlmonks.org", "excellent ;)"
    "csv", "csv\"xxx", "trall\ala"
    Short explanation for the RegEx:

    s/^"((?:[^"\\]|\\.)*)"$/$1/

    ^"
    matches your field's quotechar at the start of the field
    (...)
    will "remember" what was matched inside the quotes
    (?:...)*
    This groups whatever stands in place of the ... without "remembering" it. The * tells the parser that the group may appear as often as possible, even zero times
    [^"\\]
    will match any character but " and \
    |
    is an alternative. Either the left or the right part has to match
    \\.
    will match a backslash followed by any single character, i.e. any "escaped" character
    "$
    again your quotechar but now at the end
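To see the pieces described above side by side, here is a small sketch that spells out the same pattern with the /x modifier, so each part sits on its own line with a comment. The variable name `$quoted_field` and the sample strings are chosen for illustration only.

```perl
use strict;
use warnings;

# Same pattern as in the explanation, laid out with /x for readability.
my $quoted_field = qr/
    ^"              # quote character at the start of the field
    (               # remember what is matched inside the quotes
        (?:         # group without remembering
            [^"\\]  #   any character except a double quote or a backslash
          |         #   or
            \\.     #   a backslash followed by any one character
        )*          # as often as possible, even zero times
    )
    "$              # quote character again, now at the end
/x;

for my $field ('"csv"', '"csv\"xxx"', '"trall\ala"', 'no quotes') {
    if ($field =~ $quoted_field) {
        print "matched:  $field  ->  $1\n";
    }
    else {
        print "no match: $field\n";
    }
}
```

The captured text still contains the backslashes, which is why the posted code follows the substitution with `tr/\\//d`.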

      Thanks for the excellent explanation :)

      With regards to the error line numbers: there aren't any errors anymore - it just doesn't produce the desired results. I have a feeling it's quite a ways away as well - your approach is far clearer.

      One question about the split. If it's fed data like:

      "csv", "csv\"x, xx", "trall\ala"

      it will choke on the second entry. How would I go about avoiding this? I could use something like split /","/; which would make problems far less likely, but is there a better way? Some sort of notation for when it's inside the field?
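One way to get a "notation for when it's inside the field" is to keep a flag while walking the characters, much like the C-style parser earlier in the thread. What follows is only a sketch under stated assumptions: every field is double-quoted, backslash is the escape character, and anything outside the quotes other than the separator is ignored. The sub name `split_csv_line` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into fields, tracking quote state.
sub split_csv_line {
    my ($line) = @_;
    my ($sep, $quote, $esc) = (',', '"', '\\');

    my @fields;
    my $current   = '';
    my $in_quotes = 0;   # true between an opening and a closing quote
    my $escaped   = 0;   # true right after an escape character

    for my $c (split //, $line) {
        if ($escaped) {                      # take the escaped char literally
            $current .= $c;
            $escaped = 0;
        }
        elsif ($c eq $esc) {                 # drop the backslash, remember it
            $escaped = 1;
        }
        elsif ($c eq $quote) {               # toggle the inside-field state
            $in_quotes = !$in_quotes;
        }
        elsif ($c eq $sep && !$in_quotes) {  # separator outside quotes
            push @fields, $current;
            $current = '';
        }
        elsif ($in_quotes) {                 # ordinary character in a field
            $current .= $c;
        }
        # anything else (such as the space after a separator) is skipped
    }
    push @fields, $current;                  # last field has no separator after it
    return @fields;
}

print "$_\n" for split_csv_line('"csv", "csv\"x, xx", "trall\ala"');
```

A separator only ends a field while the flag is off, so the comma inside the second entry stays part of the data.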

        Okay. That one isn't easy ;-)

        I have to admit that I have no good solution at hand.

        Maybe a search here will help.

        BTW: Why don't you register here? It's far easier to recognize you if you're no longer Anonymous Monk. It just costs you some time...
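For the embedded-separator question above, one option that needs no installation is Text::ParseWords, which ships with Perl. Its `parse_line` function takes a delimiter pattern, a "keep" flag and the line, and respects quotes and backslash escapes. The sketch below is not a full CSV parser: it assumes single-line records, and Text::ParseWords also treats single quotes as quote characters, which may not suit data containing apostrophes.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = '"csv", "csv\"x, xx", "trall\ala"';

# Arguments: delimiter pattern, keep flag, input line.
# A false keep flag strips the surrounding quotes and the backslashes.
my @fields = parse_line(',\s*', 0, $line);

print "$_\n" for @fields;
```

The delimiter `,\s*` is an assumption based on the sample data, where each comma is followed by a space.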

(jeffa) 3Re: Writing a CSV Parser/Printer
by jeffa (Bishop) on Jun 26, 2003 at 16:37 UTC
