
Low-threshold function in Text::CSV_XS

by Tux (Abbot)
on Jan 24, 2014 at 17:09 UTC (#1071969=perlmeditation)

I had quite a long discussion with Lady_Aleena in the chatterbox, as I wanted to fully understand her wishes. She had a number of complaints about Text::CSV (and Text::CSV_XS) being too complicated for simple end-user tasks.

We - as module authors - should always take remarks like that seriously, even if the end user might not exactly be the target audience we had in mind when writing functionality.

    <cbstream> [Lady_Aleena] My problem is that modules like Text::CSV_XS doesn't open the files for me too.
    <cbstream> [Lady_Aleena] Tux, I might use Text::CSV if it becomes a one liner.

I've heard remarks like that before, but so far always ignored them, as the function/method to do so is so simple that including something like that in the module itself feels like bloat.

Now that I understand what Lady_Aleena actually wants with her data (to directly create a hash of hashes from a CSV-like file), I tried to come up with a far more generic function. Having RFC 7111 fragments now, that only makes more sense :)

After a few iterations, I came up with a function that supports all basic needs, yet still allows a lot of flexibility. Without going into the implementation, what I currently have done supports:

    my $AoA = csv2list (file => "file.csv");

    my $AoA = csv2list (data => $io, sep_char => "|");

    my $AoA = csv2list (file => "file.txt", sep_char => "|",
                        fragment => "col=3;5-6;0-*");

    my $AoH = csv2list (file => "file.csv", headers => "auto");

    my $AoH = csv2list (data => $io, sep_char => "|",
                        headers => [ "Name", "Hobby", "Age" ]);

When I apply that to the code on Lady_Aleena's scratchpad, the difference would be something like

    la.txt:

        Jan|Birdwatching|7
        LA|coding|25
        Tux|skiing|52

    code:

        my @hdr = qw( Name Hobby Age );

        my %la  = lady_aleena (file => "la.txt", headings => \@hdr);

        my $aoh = csv2list (file => "la.txt", sep_char => "|", headers => \@hdr);
        my %hoh = map { $_->{Name} => $_ } @{$aoh};

    result for both:

        {   Jan => { Age => 7,  Hobby => 'Birdwatching', Name => 'Jan' },
            LA  => { Age => 25, Hobby => 'coding',       Name => 'LA'  },
            Tux => { Age => 52, Hobby => 'skiing',       Name => 'Tux' }
            }

I'll let this sink in a bit. Obviously a function like this has a lot of potential, but should it be auto-exported? And is it flexible enough as it is like this?

Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re: Low-threshold function in Text::CSV_XS (@EXPORT_OK)
by tye (Sage) on Jan 24, 2014 at 18:01 UTC
    but should it be auto-exported?

    Put it in @EXPORT_OK not in @EXPORT. Don't encourage non-self-documenting imports. It isn't a burden to type use Text::CSV_XS 'csv2list'; and the clarity provided to the next person reading the code is well worth it.

    - tye        

      It isn't a burden to type use Text::CSV_XS 'csv2list';

      Except for when using one liners. This:

      perl -MText::CSV -E"csv2list(...)"

      Is nicer than:

      perl -MText::CSV=csv2list -E"csv2list(...)"

      (But then, I'd name the function just csv(). )

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        See App::CCSV for oneliners ... although it could use some help with interface (it should come with existing configs)
Re: Low-threshold function in Text::CSV_XS
by moritz (Cardinal) on Jan 24, 2014 at 20:57 UTC

    Some assorted, unorganized feedback follows:

    I'm all for a versatile, easy-to-use function. But please don't call it csv2list when it doesn't actually return a list.

    There's already enough array vs. list confusion without you contributing even more to it :-).

    csv2array is only slightly longer.

    It would be really awesome if there were a corresponding array2csv function that accepts mostly the same arguments and round-trips the data back to the original CSV (or as close as possible).

    I would also love to have an option to have the first row treated as table headers (and thus column names for the hash variant), which would also imply that there must be some option to force the return format that isn't passing in a headers array reference.

      I am already converted to csv. I like your quest for the reverse trip. I have to ponder a bit if it can use the same function or if there really needs to be two.

      headers => "auto" is exactly what you ask for in the last paragraph.

      Enjoy, Have FUN! H.Merijn
Re: Low-threshold function in Text::CSV_XS
by davido (Archbishop) on Jan 24, 2014 at 19:44 UTC

    Tie::CSV_File seems like a nice abstraction on CSV. The only thing lacking is it would be nice to be able to tell it that the top CSV row specifies column names, or to be able to supply a list of column names, and then get an AoH instead of the default AoA.


      Tie::CSV_File fills a niche, but does not lower the threshold for beginners. I even think that using tie is way outside of the problem I am trying to understand: to lower the barrier to use CSV *safely* and reliably without people complaining the API is too difficult.

      Using tie, I personally prefer to use Tie::Hash::DBD in combination with DBD::CSV, as I can then transparently switch to DBD::SQLite, DBD::Pg or any other relational database when CSV turns out to be too slow.

      Enjoy, Have FUN! H.Merijn
Re: Low-threshold function in Text::CSV_XS
by MidLifeXis (Monsignor) on Jan 24, 2014 at 18:36 UTC

    My only question when reading on the CB was what happens with duplicate keys. Perhaps a non-default mechanism to allow for it to emit an HoAoH. LA's case assumes (or ensures, take your pick) no duplicates, so it is a non-issue for her. Update: Quite right Tux. I skipped over the map line and took it as what was returned.
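To make the HoAoH idea concrete, here is a minimal sketch in plain Perl. It is not something the proposed function does; it assumes the rows have already been read as an array of hashes, and the second "Jan" row is invented purely to show a duplicate key.

```perl
use strict;
use warnings;

# Sample rows as an array of hashes; the second "Jan" row is made up
# for illustration so that one key occurs twice.
my $aoh = [
    { Name => "Jan", Hobby => "Birdwatching", Age => 7  },
    { Name => "LA",  Hobby => "coding",       Age => 25 },
    { Name => "Jan", Hobby => "chess",        Age => 7  },
];

# Group rows by Name: every key maps to an array of all rows sharing it,
# so duplicates are kept instead of silently overwriting each other.
my %hoaoh;
push @{ $hoaoh{ $_->{Name} } }, $_ for @{$aoh};

for my $name (sort keys %hoaoh) {
    printf "%s: %s\n", $name, join ", ", map { $_->{Hobby} } @{ $hoaoh{$name} };
}
```

With the plain `map { $_->{Name} => $_ }` approach the later "Jan" row would replace the earlier one; grouping with `push` keeps both.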

    My only other issue would be with the name. csv2list implies that it is returning a list. It is returning a hash (well, hashref), not a list.


      No, it returns an arrayref: a reference to a list of either anonymous hashes OR a list of anonymous arrays.

      And - I agree with tye and BrowserUK - if it is not exported by default, the simple name csv would do quite well.

      As it is a list, duplicate keys are not a problem, unless you want a list of hashrefs and the fields contain duplicate names. In that case, the function provides an attribute to pass the field names yourself.

      Enjoy, Have FUN! H.Merijn

        Absolutely correct. E_READ_ERR. My post updated.


Re: Low-threshold function in Text::CSV_XS
by Tux (Abbot) on Jan 26, 2014 at 18:44 UTC

    Thanks for all the feedback so far. What I have created so far is this:

    Enjoy, Have FUN! H.Merijn
Re: Low-threshold function in Text::CSV_XS
by Jenda (Abbot) on Jan 30, 2014 at 11:41 UTC

    I'd rather like a function like this:

        my $CSV = openCsv (file => "file.csv", headers => "auto");
        while (my $row = <$CSV>) {
            print "$row->{Foo} has $row->{Bar} bar(s)\n";
        }

    There is nothing preventing READLINE() of a tied handle from returning an array or hash reference.
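A minimal sketch of that point, with no CSV parsing involved: the package name My::RowHandle is hypothetical, and the rows are supplied pre-parsed, so this only demonstrates that READLINE of a tied handle can hand back hash references.

```perl
use strict;
use warnings;

# Hypothetical package for illustration; not part of Text::CSV_XS.
package My::RowHandle;

sub TIEHANDLE {
    my ($class, $rows) = @_;
    # Keep a private copy of the already-parsed rows (array of hash refs).
    return bless { rows => [ @{$rows} ] }, $class;
}

sub READLINE {
    my ($self) = @_;
    # Scalar context only in this sketch: one row per call, undef at the end.
    return shift @{ $self->{rows} };
}

package main;

tie *CSV, "My::RowHandle", [
    { Foo => "apple", Bar => 2 },
    { Foo => "pear",  Bar => 5 },
];

my @seen;
while (my $row = <CSV>) {
    push @seen, "$row->{Foo} has $row->{Bar} bar(s)";
}
print "$_\n" for @seen;
```

A real implementation would have to read and parse a record inside READLINE, which is where a proper CSV parser would come in.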

    Enoch was right!
    Enjoy the last years of Rome.

      I understand the request, but I see this syntax as too conflicting with the existing syntax.

      IF a function like that is to return an object of any kind, it would be a Text::CSV_XS object, more like this:

          my $csv = openCsv (file => "file.csv", headers => "auto");
          while (my $row = $csv->getline) {  # note the missing $io
              say $row->{Foo};
              }

      This fits nicely in the rest of the code and documentation and I will consider something like this (no promises).

      Enjoy, Have FUN! H.Merijn
Re: Low-threshold function in Text::CSV_XS
by Tux (Abbot) on Feb 06, 2014 at 08:44 UTC

    I have just uploaded Text-CSV_XS-1.04, which has defined the new functionality like this:


    csv

    This function is not exported by default and should be explicitly requested:

     use Text::CSV_XS qw( csv );

    This is the first draft. This function will stay, but the arguments might change based on user feedback: esp. the headers attribute is not complete. The basics will stay.

    This is a high-level function that aims at simple interfaces. It can be used to read/parse a CSV file or stream (the default behavior) or to produce a file or write to a stream (define the out attribute). It returns an array reference on parsing (or undef on fail) or the numeric value of "error_diag" on writing. When this function fails you can get to the error using the class call to "error_diag":

    my $aoa = csv (in => "test.csv") or die Text::CSV_XS->error_diag;

    This function takes the arguments as key-value pairs. It can be passed as a list or as an anonymous hash:

     my $aoa = csv ( in => "test.csv", sep_char => ";");
     my $aoh = csv ({ in => $fh, headers => "auto" });

    The arguments passed consist of two parts: the arguments to "csv" itself and the optional attributes to the CSV object used inside the function as enumerated and explained in "new".

    If not overridden, the default options used for CSV are

     auto_diag => 1

    These options are always set and cannot be altered

     binary    => 1


    in

    Used to specify the source. in can be a file name (e.g. "file.csv"), which will be opened for reading and closed when finished, a file handle (e.g. $fh or FH), a reference to a glob (e.g. \*ARGV), or - when your version of perl is not archaic - the glob itself (e.g. *STDIN).

    When used with "out", it should be a reference to a CSV structure (AoA or AoH).

     my $aoa = csv (in => "file.csv");

     open my $fh, "<", "file.csv";
     my $aoa = csv (in => $fh);

     my $csv = [ [qw( Foo Bar )], [ 1, 2 ], [ 2, 3 ]];
     my $err = csv (in => $csv, out => "file.csv");


    out

    In output mode, the default CSV options when producing CSV are

     eol       => "\r\n"

    The "fragment" attribute is ignored in output mode.

    out can be a file name (e.g. "file.csv"), which will be opened for writing and closed when finished, a file handle (e.g. $fh or FH), a reference to a glob (e.g. \*STDOUT), or - when your version of perl is not archaic - the glob itself (e.g. *STDOUT).
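As an illustration of using both modes together, a round trip might look like the sketch below. It assumes Text::CSV_XS 1.04 or later is installed; in-memory file handles stand in for real files, and the sample data is the la.txt content from the start of the thread.

```perl
use strict;
use warnings;
use Text::CSV_XS qw( csv );

my $data = "Jan|Birdwatching|7\nLA|coding|25\nTux|skiing|52\n";

# Read: parse pipe-separated data from an in-memory handle into an AoA.
open my $in, "<", \$data or die "Cannot open in-memory input: $!";
my $aoa = csv (in => $in, sep_char => "|") or die Text::CSV_XS->error_diag;
close $in;

# Write: pass the AoA back as "in" and name a destination with "out".
open my $out, ">", \my $copy or die "Cannot open in-memory output: $!";
csv (in => $aoa, out => $out, sep_char => "|");
close $out;

print scalar @{$aoa}, " rows read\n";
print $copy;
```

Because csv () only closes files it opened itself, the handles passed in here are closed by the caller before the written data is inspected.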


    encoding

    If passed, it should be an encoding accepted by the :encoding() option to open. There is no default value. This attribute does not work in perl 5.6.x.


    headers

    If this attribute is not given, the default behavior is to produce an array of arrays.

    If headers is given, it should be either an anonymous list of column names or a flag: auto or skip. When skip is used, the header will not be included in the output.

     my $aoa = csv (in => $fh, headers => "skip");

    If auto is used, the first line of the CSV source will be read as the list of field headers and used to produce an array of hashes.

     my $aoh = csv (in => $fh, headers => "auto");

    If headers is an anonymous list, it will be used instead

     my $aoh = csv (in => $fh, headers => [qw( Foo Bar )]);
     csv (in => $aoa, out => $fh, headers => [qw( code description price )]);


    fragment

    Only output the fragment as defined in the "fragment" method. This attribute is ignored when generating CSV. See "out".

    Combining all of them could give something like

     use Text::CSV_XS qw( csv );
     my $aoh = csv (
         in       => "test.txt",
         encoding => "utf-8",
         headers  => "auto",
         sep_char => "|",
         fragment => "row=3;6-9;15-*",
         );
     say $aoh->[15]{Foo};

    Thanks for all the feedback I got!

    Enjoy, Have FUN! H.Merijn

Node Type: perlmeditation [id://1071969]
Front-paged by Arunbear