PerlMonks  

Re: Comparison of the parsing features of CSV (and xSV) modules

by Wally Hartshorn (Hermit)
on Jun 15, 2004 at 14:12 UTC ( [id://366898]=note )


in reply to Comparison of the parsing features of CSV (and xSV) modules

Great post! I obviously have an interest in this given my recent self-inflicted problems with processing a CSV file. :-)

About the only thing missing from most (all?) of these modules is a way to handle embedded, unescaped delimiter characters. (Wouldn't clean input data be nice? *sigh*) Perhaps setting the delimiter character to the empty string would trigger a separate set of logic that would handle that case. (A delimiter character followed by a separator character or a newline would be a real closing delimiter character, while others would be ignored, perhaps.)

Not that I'm volunteering, of course. :-)

Wally Hartshorn
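
The heuristic suggested above (a quote closes a field only when a separator or the end of the line follows it) could be sketched roughly as below. This is only an illustration under stated assumptions: `split_lenient` is a hypothetical helper, not part of any of the modules compared, it assumes a single-character separator and double-quote delimiters, and it makes no attempt to un-double quotes that are already escaped correctly.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any existing CSV module.
# Splits one line into fields. Inside a quoted field, a double quote only
# closes the field when the separator or the end of the line follows it;
# any other double quote is kept as literal data.
sub split_lenient {
    my ($line, $sep) = @_;
    $sep = ',' unless defined $sep;        # assumed: single-character separator
    $line =~ s/\r?\n\z//;                  # drop the line ending

    my @fields;
    my $len = length $line;
    my $pos = 0;
    while (1) {
        my $field = '';
        if ($pos < $len && substr($line, $pos, 1) eq '"') {
            $pos++;                        # skip the opening quote
            while ($pos < $len) {
                my $ch   = substr($line, $pos, 1);
                my $next = $pos + 1 < $len ? substr($line, $pos + 1, 1) : undef;
                if ($ch eq '"' && (!defined $next || $next eq $sep)) {
                    $pos++;                # real closing quote
                    last;
                }
                $field .= $ch;             # embedded quote or ordinary character
                $pos++;
            }
        }
        else {
            # Unquoted field: read up to the next separator or end of line.
            my $end = index($line, $sep, $pos);
            $end = $len if $end < 0;
            $field = substr($line, $pos, $end - $pos);
            $pos = $end;
        }
        push @fields, $field;
        last if $pos >= $len;
        $pos++;                            # skip the separator
    }
    return @fields;
}
```

The obvious weak spot is data in which an embedded quote really is followed by the separator, for example `"He said "hi", then left"`; the sketch would end the field early there.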

Replies are listed 'Best First'.
Re^2: Comparison of the parsing features of CSV (and xSV) modules
by dragonchild (Archbishop) on Jun 15, 2004 at 14:17 UTC
    Could you post some example data, along with how it's currently being parsed and how you'd like it to be parsed?

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      Here's an example:

      "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books","Bugger"

      I'm using a series of (somewhat fragile) regexes to change that to:

      "Smith","John",12/31/1962,"Author of ""How to Break Programs"" and other books","Bugger"

      Wally Hartshorn
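
A fix-up of the kind described above might look something like the following. It is a minimal sketch and not necessarily the regexes actually in use here: `double_embedded_quotes` is a hypothetical name, and it assumes comma separators, double-quote delimiters, and input in which no embedded quote has been doubled yet (running it twice would double them again).

```perl
use strict;
use warnings;

# Hypothetical helper: doubles every double quote that is neither an
# opening quote (at line start or right after a comma) nor a closing
# quote (right before a comma or at line end).
sub double_embedded_quotes {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                   # drop the line ending
    # A quote with a non-comma character on both sides is treated as embedded.
    $line =~ s/(?<=[^,])"(?=[^,])/""/g;
    return $line;
}

my $raw = q{"Smith","John",12/31/1962,"Author of "How to Break Programs" and other books","Bugger"};
print double_embedded_quotes($raw), "\n";
```

As with any guess of this sort, an embedded quote that happens to sit next to a comma will be taken for a real delimiter.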

        There are, of course, going to be boundary cases that don't work as expected as soon as you allow undoubled double quotes inside a format that expects them doubled. However, Text::xSV lets you define an arbitrary filter that it uses to preprocess the text, and it should do a reasonable job on the above with the following filter:
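        sub {
            my $line = shift;
            $line =~ s/\r$//;              # strip a trailing carriage return
            $line =~ s/(?<=.)"(?=.)/""/g;  # double each quote except the line's first and last character
            $line =~ s/"?,"?/,/g;          # drop one quote on each side of every comma
            return $line;
        }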
        Yes, there is some fragility, but it should be at least moderately hard to trigger.
        And what should the parser do with the following?

        "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"
        "Smith","John",12/31/1962,Author of "How to Break Programs" and other books,"Bugger"
        "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books,"Bugger"
        "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books',"Bugger"

