http://www.perlmonks.org?node_id=415155

After some complaints about the interface of Text::CSV_XS in the chatterbox, I am thinking of adding a new set of methods. The new interface would default to accepting embedded newlines and would provide a number of convenient shortcuts. The old interface would operate exactly as it does now. I'd really appreciate any comments or suggestions you might have. Many thanks to tye, bart, and diotalevi for suggestions.

use Text::CSV_XS;

$c = Text::CSV_XS->new;                 # use default separator, delimiter, escape
# or
$c = Text::CSV_XS->new(%attr);          # set your own separators, delimiters, escapes

$c->open_file($filename);               # open a CSV file
$c->open_string($string);               # open a CSV string

@row   = $c->fetchrow_array;            # fetch one row into an array
$row   = $c->fetchrow_hashref;          # fetch one row into a hashref
$table = $c->fetchall_arrayref;         # fetch all rows into an array of arrays
$table = $c->fetchall_hashref($key);    # fetch all rows into a hashref

$c->write_row(@array);                  # insert a row from an array of values
$c->write_table($filename, $arrayref);  # create a CSV file from an arrayref
$c->write_table($filename, $hashref);   # create a CSV file from a hashref

# loop through a file
$c->open_file($filename);
while (my $row = $c->fetchrow_hashref) {
    if ($row->{$column_name} eq $value) {
        # do something
    }
}

Re: Text::CSV_XS - proposed new interface
by freddo411 (Chaplain) on Dec 15, 2004 at 22:22 UTC
    This looks highly promising. I've been using Text::CSV_XS and I've had quite a problem figuring out how to parse my files when they had embedded EOLs in the quoted fields. I was able to do this, but not in an elegant way; it was a kludge with poor error handling.

    Your new interface seems like it might solve that rather nicely.

    Also, how about:

    $c->open_fh( \$filehandle) # use a CSV filehandle

    -------------------------------------
    Nothing is too wonderful to be true
    -- Michael Faraday

      Yes, thanks, I probably will support filehandles as well as filenames. As for embedded newlines, the module already supports them (and always has). Just set binary=>1 in the call to new().
        > As for embedded newlines, the module already supports them (and always has). Just set binary=>1 in the call to new().

        Yes, that's true, but my script is grabbing and reading line by line, so when I grab line one, I get data from the beginning of the line up to the first EOL, which might be inside a quoted field rather than the true end of the record.

        My kludge is to catch this error, append the next line to the first, and try again. Do this until the line parses correctly.

        That algorithm works out OK until there is a malformed data file. Then my program doesn't die until it has attempted to read through the whole file.
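
        A minimal sketch of the retry loop described above, assuming the parser is created with binary => 1 so that a newline inside a quoted field is accepted once the whole record is in the buffer. The helper name read_record is hypothetical; parse() and fields() are the module's existing methods.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: read one logical CSV record from $fh.
# Physical lines are appended to a buffer until the buffer parses.
# Returns an arrayref of fields, or undef at end of file.
# Dies if the input ends while a record is still incomplete.
sub read_record {
    my ($csv, $fh) = @_;

    my $buffer = <$fh>;
    return undef unless defined $buffer;

    until ($csv->parse($buffer)) {
        my $next = <$fh>;
        die "Input ended inside an unparsable record\n"
            unless defined $next;
        $buffer .= $next;    # assume the record continues on the next line
    }
    return [ $csv->fields ];
}
```

        Because every failed parse is treated as "the record continues on the next line", a single unbalanced quote makes the loop consume the rest of the file before it dies, which is the weakness noted above. Text::CSV_XS also documents a getline() method that reads one record directly from a handle, which avoids the manual buffering.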

        I think the new interface will let me catch the error on a specific line read, which is a net win for me.

        Cheers


Re: Text::CSV_XS - proposed new interface
by dragonchild (Archbishop) on Dec 15, 2004 at 19:10 UTC
    Instead of reworking Text::CSV_XS, what about providing an XS solution to Text::xSV? I'm pretty sure tilly wouldn't mind the help ...

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      Because Text::CSV_XS has been on CPAN for six years and has many users, and because one of the most useful features that Text::xSV has over Text::CSV_XS is that it's pure Perl. Also, this is a very minor reworking of Text::CSV_XS: it already does all of these things, but with an interface that some find confusing.
        The name Text::CSV_XS is a poor name and, IMHO, should be discarded in favor of an XS implementation of Text::xSV called Text::xSV::Fast. It could be included in the distribution and, if a C-compiler exists, should be compiled. Then, at install time, the user can choose if they wish to have the PurePerl version as the primary version or not.

        I never understood the completely artificial separation between Pure-Perl versions and XS versions, not when we have an extensible build system that can require input from the installing user.


Re: Text::CSV_XS - proposed new interface
by uwevoelker (Pilgrim) on Dec 16, 2004 at 11:43 UTC
    $row = $c->fetchrow_hashref
    Where do the column names come from? I would like the module to use the first line. Maybe we could implement a mapping (column_name => hash_key or column_name => '' to ignore this column).
    The old interface really is strange...

    Bye, Uwe
      > Where do the column names come from?

      Good question! My thought was that if the user specifies a comma-separated columns attribute in the new() flags, those will be the column names and the first row of the file will be treated as data; if none is supplied, the first row of the file will be treated as the column names.
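
      The rule above can be sketched as follows. This is only an illustration working on rows that have already been parsed into arrayrefs; the function name rows_to_hashes and its arguments are hypothetical and not part of Text::CSV_XS.

```perl
use strict;
use warnings;

# Hypothetical helper: turn parsed rows (arrayrefs) into hashrefs.
# If $columns is supplied, every row is treated as data.
# Otherwise the first row supplies the column names and is not
# returned as data.
sub rows_to_hashes {
    my ($rows, $columns) = @_;

    my @data  = @$rows;    # shallow copy, so the caller's list is untouched
    my @names = $columns ? @$columns : @{ shift(@data) || [] };

    my @hashes;
    for my $row (@data) {
        my %record;
        @record{@names} = @$row;    # hash slice: name => value
        push @hashes, \%record;
    }
    return \@hashes;
}
```

      With no explicit column list, [ ['id','name'], [1,'alice'] ] yields one hashref keyed by id and name; with an explicit list, the same input yields two data rows.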

      > The old interface really is strange

      Yep, though very usable once you get used to it. (I inherited the interface from the module's original author). The odd thing is that in the six years the module has been on CPAN, no one has mentioned this. Bart and Diotalevi griped loudly in the CB, so I listened. Griping++. (polite griping)++++. (sending messages to CPAN authors that they have strange interfaces with examples of what's strange and what might be better)++++++.

        > Where do the column names come from? Good question! My thought was that if the user specifies a comma-separated columns attribute in the new() flags, those will be the column names and the first row of the file will be treated as data but that if none is supplied, the first row of data will be treated as the column names.
        Think about this part very carefully. There are continual problems with DBD::CSV, because the column descriptions that work well in spreadsheets don't map directly to valid SQL column names. "Profit & Loss" works in a spreadsheet, but has to become something like "PROFIT_AND_LOSS" in a database.
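
        One possible mapping, sketched under assumptions: uppercase the label, spell out "&" as AND, collapse everything else that is not a letter or digit into underscores, and prefix names that would not start with a letter. The function name sql_column_name and the COL_ prefix are hypothetical choices, not something DBD::CSV or Text::CSV_XS provides.

```perl
use strict;
use warnings;

# Hypothetical helper: derive an SQL-friendly identifier from a
# spreadsheet-style column label.
sub sql_column_name {
    my ($label) = @_;

    my $name = uc $label;
    $name =~ s/&/ AND /g;            # "Profit & Loss" keeps its meaning
    $name =~ s/[^A-Z0-9]+/_/g;       # runs of other characters become "_"
    $name =~ s/^_+|_+$//g;           # trim leading and trailing underscores
    $name = "COL_$name" if $name !~ /^[A-Z]/;   # must start with a letter
    return $name;
}

print sql_column_name('Profit & Loss'), "\n";
```

        Two different labels can collapse to the same name under a scheme like this, so a real implementation would also need to detect and resolve collisions.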

        By the way, allowing embedded EOLs by default sounds good. It would also be good to allow extended characters by default. The first thing I tell everyone about Text::CSV_XS is that they need to turn "binary" on...

Re: Text::CSV_XS - proposed new interface
by Juerd (Abbot) on Dec 18, 2004 at 00:23 UTC

    Please don't copy DBI's interface. If you want that, you might as well just write DBD::CSV instead. Oh wait, that exists already, and it uses your module, even! How about that: a zillion interfaces ready to use! :)

    DBI's interface is inconsistent. That is in part because, a long time ago, lists were called arrays and references did not yet exist. fetchrow_array returns a list, not an array, but fetchrow_arrayref does return an arrayref, not a listref (listrefs do not exist (\@_ is an arrayref!)).

    Besides that, I think the fetch part is redundant. Well, for DBI it is, at least. Perhaps not for your module. All you do with an executed STH is fetch stuff. Another redundant part of the overly long method names is ref. Since it is impossible to return an array or hash, just array or hash is enough to indicate that a reference is returned. And the row versus all thing can be reduced to singular or plural.

    For DBIx::Simple, this is why fetchrow_arrayref is just array and fetchall_arrayref is just arrays (17 characters down to 6, that's almost 65% less). As an extra handy feature, in list context, arrays returns a list of arrayrefs instead of a (reference to an) array of arrayrefs, so it can more easily be used with foreach. hash and hashes speak for themselves. I think list explains what it does (at least in list context) very well too.
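
    A minimal sketch of that naming scheme and the context-sensitive return, using wantarray. The class My::Result is hypothetical and exists only for illustration; it is not DBIx::Simple's implementation.

```perl
use strict;
use warnings;

{
    package My::Result;    # hypothetical class for illustration only

    sub new {
        my ($class, @rows) = @_;
        return bless { rows => [@rows] }, $class;
    }

    # Singular name: the next row as an arrayref, or undef when exhausted.
    sub array {
        my ($self) = @_;
        return shift @{ $self->{rows} };
    }

    # Plural name: all remaining rows. A list of arrayrefs in list
    # context, a reference to an array of arrayrefs otherwise.
    sub arrays {
        my ($self) = @_;
        my @rest = splice @{ $self->{rows} };
        return wantarray ? @rest : \@rest;
    }
}

my $result = My::Result->new([1, 'apple'], [2, 'pear'], [3, 'plum']);
my $first  = $result->array;          # first row only
for my $row ($result->arrays) {       # list context: loop over the rest
    print join(',', @$row), "\n";
}
```

    The list-context branch is what lets arrays be used directly in foreach without dereferencing first.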

    The best thing about these method names isn't that they are more consistent, or that they are more logical. It is that they require much less typing and, because they are shorter, are also much easier to read (the difference between fetchall_someref and fetchrow_someref is especially hard to spot, because humans [1] don't really pay much attention to what's in the middle).

    I'm not saying you should use what DBIx::Simple uses. I am asking you to reconsider your current DBI-ish method names. Yes, people already know DBI and that makes learning easier, but your module isn't DBI and DBI's method names aren't great even for DBI.

    (Add to this that you would have to replicate the bugs and quirks that DBI has in order not to surprise people. If things look enough alike, people expect them to be exactly the same...)

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

    Update:
    [1] I'm not sure whether Java coders are human ;)