One-line CSV Parser

by BiffBaker (Novice)
on Apr 03, 2008 at 19:47 UTC (#678257, snippet)
Description: How is this for quick parsing of standard CSV?
map {if (/^".*"$/) {s/\s*"\s*//g; push @fields, $_;} else {push @fields, split /\s*,\s*/;}} split /,*("[^"]*"),*/, $line;
Re: One-line CSV Parser
by duelafn (Vicar) on Apr 03, 2008 at 23:36 UTC

    No escaping:

    $line = 'bob,"I said \"foo\"",bar';

    Could be shorter (matches identically):

    @fields = map { /^"\s*(.*?)\s*"$/ ? $1 : split /\s*,\s*/ } split /,*("[^"]*"),*/, $line

    Update: This too:

    $line = 'bob,I said "foo",bar'

    I originally had the following, which shows a trick for putting multiple things in a ?: operation:

    @fields = map { /^".*"$/ ? do{s/\s*"\s*//g; $_} : split(/\s*,\s*/) } split /,*("[^"]*"),*/, $line

    Update 2: Hmm, that's what I get for not reading the RFC. Those are non-counterexamples (counter-counterexamples?); see the other posts below.

    Good Day,

Re: One-line CSV Parser
by idsfa (Vicar) on Apr 04, 2008 at 14:10 UTC

    CSV as defined by RFC 4180 does not "escape" double quotes with a backslash, but rather by an additional set of double quotes. Your parser fails to handle this format properly.

    CSV is hard.
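
    To make the doubled-quote rule concrete, here is a minimal sketch of a splitter for a single record. It is an illustration only, not a replacement for a tested module: split_csv_line is a hypothetical helper name, and the sketch assumes one well-formed record with no embedded line breaks and no whitespace trimming.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV record using the RFC 4180 quoting rules.
# Assumes a single record with no embedded line breaks.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    pos($line) = 0;
    while (1) {
        if ( $line =~ /\G"((?:[^"]|"")*)"/gc ) {
            # Quoted field: a doubled quote inside stands for one literal quote.
            ( my $field = $1 ) =~ s/""/"/g;
            push @fields, $field;
        }
        elsif ( $line =~ /\G([^",]*)/gc ) {
            # Unquoted field (possibly empty): may not contain a double quote.
            push @fields, $1;
        }
        # A comma means another field follows; anything else ends the scan.
        last unless $line =~ /\G,/gc;
    }
    # If the scan stopped before the end, the record was not well formed.
    die "Malformed CSV record near offset " . pos($line) . "\n"
        unless pos($line) == length($line);
    return @fields;
}

my @demo = split_csv_line('"aaa","b""bb","ccc"');
print join( '|', @demo ), "\n";    # prints aaa|b"bb|ccc
```

    The sketch dies on input such as a"b,c, because a double quote inside an unquoted field is not allowed by the RFC text.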

Re: One-line CSV Parser
by radiantmatrix (Parson) on Apr 08, 2008 at 16:16 UTC

    I much prefer

    use Text::CSV_XS;
    use IO::File;

    my $io = IO::File->new( $filename, '<' )
        or die "Can't read $filename: $!";
    my $csv = Text::CSV_XS->new();
    until ( $io->eof ) {
        my $row = $csv->getline($io);
        # do something with the ARRAYref $row
    }

    It's not much longer, but provides all kinds of error handling, is easier to maintain (and easier to read), and handles all the ins and outs of escaping, etc. As idsfa says, "CSV is hard".

    Not only that, Text::CSV_XS is really fast to boot.

    Don't do stuff yourself that others have already done and tested thoroughly -- this is true Laziness.

Re: One-line CSV Parser
by ww (Archbishop) on Apr 04, 2008 at 16:01 UTC

    From the cited RFC,

    If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

    ...and from the preceding paragraph of the RFC:

    If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

    Note, however, that "as defined" may overstate the status of the document:

    This memo provides information for the Internet community. It does not specify an Internet standard of any kind.
    ... While there are various specifications and implementations for the CSV format (cites removed), there is no formal specification in existence, which allows for a wide variety of interpretations of CSV files. This section documents the format that seems to be followed by most implementations:
