
The future of Text::CSV_XS - TODO

by Tux (Monsignor)
on May 25, 2007 at 19:06 UTC ( #617577=perlmeditation )

This is a follow-up on my previous post on Text::CSV_XS

I have been digging a bit to find what people consider loose ends in Text::CSV_XS, and tried to summarize them (in no particular order) in the new TODO list. Being on the TODO list is no guarantee that an item will be done, nor does it commit to any implementation or API it might suggest; the list exists for now just so I/we do not forget to think about these issues.

I'd like to get thoughts/feedback/suggestions about items on this list, and how valuable you consider adding these features to a module so heavily used by other applications.

jZed asked me to also post this to the dbi-users list, because many DBI users (have to) deal with CSV data. I already did. So start shooting ...

TODO (updated to 0.80 on 25-12-2010)

More Errors & Warnings

New extensions ought to be clear and concise in reporting what error occurred where and why, and possibly also suggest a remedy for the problem. error_diag is a (very) good start, but there is more work to be done here.

Basic calls should croak or warn on illegal parameters. Errors should be documented.
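To illustrate the kind of reporting meant here, a minimal sketch of checking error_diag () after a failed parse (). It assumes a release in which error_diag () returns the error code, message and position in list context; the input string is made up for the example.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1 });

# Made-up input: the quoted field is opened but never closed,
# so parse () is expected to return false
my $bad = qq{1,"unterminated,3};

my ($code, $msg, $pos) = (0, "", 0);
unless ($csv->parse ($bad)) {
    # List context: error code, error message, position (and more)
    ($code, $msg, $pos) = $csv->error_diag ();
    print "parse failed: code=$code, msg=$msg, pos=$pos\n";
}
```

The exact code and message depend on the installed version, so the sketch prints them rather than hard-coding them.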

setting meta info

Future extensions might include extending the meta_info (), is_quoted (), and is_binary () to accept setting these flags for fields, so you can specify which fields are quoted in the combine ()/string () combination.

  $csv->meta_info (0, 1, 1, 3, 0, 0);
  $csv->is_quoted (3, 1);

combined methods

Requests for adding means (methods) that combine combine () and string () in a single call will not be honored. Likewise for parse () and fields (). Given the trouble with embedded newlines, using getline () and print () instead is the preferred way to go.

Parse the whole file at once

Implement new methods that enable parsing of a complete file at once, returning a list of hashes. Possible extension to this could be to enable a column selection on the call:

   my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ] });

Returning something like

   [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
       flags  => [ ... ],
       errors => [ ... ],
       },
     { fields => [ ... ],
       .
       .
       },
     ]

Note that getline_all () already returns all rows for an open stream, but this will not return flags.
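A rough sketch of how such a call could be emulated today on top of getline () and meta_info (). The helper name parse_file, its cols option and the 1-based column numbering are assumptions for illustration only; none of them are part of Text::CSV_XS, and unlike the proposal above the helper takes an open handle instead of a file name.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper, NOT part of Text::CSV_XS
sub parse_file {
    my ($fh, $opt) = @_;
    my $csv  = Text::CSV_XS->new ({ binary => 1, keep_meta_info => 1 });
    my @cols = @{ $opt->{cols} || [] };
    my @rows;
    while (my $row = $csv->getline ($fh)) {
        my @flags = $csv->meta_info ();
        # Column numbers are 1-based in this sketch; no selection means all
        my @idx = @cols ? map { $_ - 1 } @cols : 0 .. $#$row;
        push @rows, {
            fields => [ @{$row}[@idx] ],
            flags  => [ @flags[@idx]  ],
        };
    }
    return \@rows;
}

# Made-up sample data read through an in-memory file handle
my $data = qq{a,"b",c\n1,2,3\n};
open my $fh, "<", \$data or die "in-memory open: $!";
my $aoh = parse_file ($fh, { cols => [ 1, 3 ] });
print join ("|", @{ $_->{fields} }), "\n" for @$aoh;
```

Keeping the flags requires keep_meta_info, which is exactly what getline_all () switches off, so the per-row loop is still needed for that part.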

EBCDIC

The hard-coding of characters and character ranges makes this module unusable on EBCDIC systems. Using some #ifdef structure could enable these again without losing speed. Testing would be the hard part.


The most current state is available on the public GIT repo.


Enjoy, Have FUN! H.Merijn

Re: The future of Text::CSV_XS - TODO
by tfrayner (Curate) on May 28, 2007 at 22:57 UTC
    Hi,

    I think that making the eol option actually honour $/ would be extremely useful, and I think your suggested API for that would work fine. As I understand it (and the current behaviour of Text::CSV_XS certainly seems to bear me out) the module only handles \015\012 and \012 as CSV line endings for parsing input. It would be tremendously helpful if \015 was also supported. I'm assuming it's not, since I've never been able to get it to work despite occasional attempts over the years. There's also a note in the POD (CAVEATS) suggesting that alternate line endings are unsupported. I'm often surprised at the number of people out there still using spreadsheet software that generates CSV files with \015 line endings. I think older Mac OS machines, and some ill-behaved Mac OSX applications are at fault (if fault it really is).

    Cheers,

    Tim

    Update: based on the principle that every bug report should at least take a stab at giving a test case, here's what I'm talking about:

    perl -MText::CSV_XS -MIO::File -e '$/="\r"; $f = IO::File->new_tmpfile; print $f "a,b,c$/"; seek($f,0,0); $c = Text::CSV_XS->new({eol => $/}); print join("|",@{ $c->getline($f) })'

    That should output "a|b|c", and indeed does if $/ is set to \n or \r\n. Just clearing up any possible misunderstandings :-)

      A test case calls for proof :)

      use strict;
      use warnings;

      use Text::CSV_XS;

      $/ = "\r";

      print "Testing with version $Text::CSV_XS::VERSION\n";
      my $csv = Text::CSV_XS->new ({ eol => $/ });
      my $str = "a,b,c$/";
      open my $fh, "<", \$str or die "ScalarIO: $!\n";
      if (my $row = $csv->getline ($fh)) {
          print join "|", @$row, "\n";
      }
      else {
          $csv->error_diag;
      }

      Generates:

      $ perl eoltest.pl
      Testing with version 0.58
      a|b|c|

      Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on May 30, 2007 at 08:17 UTC

    Small updates

    I now understand the fuss people make about embedded newlines. Text::CSV_XS has always been able to deal with that (well, maybe not always, but at least for a long time already). The problem that people might have is reading the line in the perl script. Obviously,

    my $csv = Text::CSV_XS->new ({ binary => 1 });
    while (<>) {
        $csv->parse ($_);
        my @fields = $csv->fields ();
        # ...
    }

    Will horribly fail, as <> will break too early.

    The most recent snapshot now contains a t/45_eol.t, that tests all possible eol combinations. Have a look to see that the way to parse CSV with embedded newlines should be done similar to:

    use IO::Handle;
    my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
    while (my $row = $csv->getline (*ARGV)) {
        my @fields = @$row;
        # ...
    }

    or, if you open files yourself, like:

    my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
    open my $io, "<", $file or die "$file: $!";
    while (my $row = $csv->getline ($io)) {
        my @fields = @$row;
        # ...
    }

    I'm still thinking about the best way to add this to the docs.
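
    A small self-contained sketch of the difference, using made-up data in an in-memory file: plain line-based reading splits the record in the middle of the quoted field, while getline () returns it whole.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Two records; the first has a newline inside a quoted field
my $data = qq{1,"line one\nline two",3\n4,5,6\n};

# Line-based reading breaks on every "\n", also the embedded one
open my $raw, "<", \$data or die "in-memory open: $!";
my @chunks = <$raw>;
close $raw;

# getline () lets the parser decide where a record ends
open my $fh, "<", \$data or die "in-memory open: $!";
my $csv = Text::CSV_XS->new ({ binary => 1 });
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, $row;
}
close $fh;

print "readline saw ", scalar @chunks, " chunks\n";
print "getline  saw ", scalar @rows,   " records\n";
```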


    Enjoy, Have FUN! H.Merijn
      Thanks very much for looking at this. I've tried 0.27, and it shows the same problem (see my original node, updated, for a test of sorts). I looked through your eol tests, and checked on exactly what was being written out to the temporary file in the \r cases. It appears that in those cases the file is terminated with a \r\n, rather than just \r. I think this may be why your tests pass but mine doesn't?

      Cheers,

      Tim

        New snapshot just uploaded, in which eol => $/ is permitted for "\r". That extends successful parsing to line endings in the set undef, "\n", "\r\n", and "\r".


        Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by snoopy (Deacon) on Jun 01, 2007 at 02:12 UTC
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Jun 01, 2007 at 07:00 UTC

    Thanks for the links. I'll scan them.

    Meanwhile Text::CSV_XS 0.27 has left the building and should be available to all once CPAN has synced your favourite mirror.

    2007-05-31 0.27 - H.Merijn Brand
        * checked with perlcritic (still works under 5.00504)
          so 3-arg open cannot be used (except in the docs)
        * 3-arg open in docs too
        * Added a lot to the TODO list
        * Some more info on using escape character (jZed)
        * Mention Text::CSV_PP in README
        * Added t/45_eol.t, eol tests
        * Added a section about embedded newlines in the pod
        * Allow \r as eol ($/) for parsing
        * More docs for eol
        * More eol = \r fixes, tfrayner's test case added to t/45_eol.t

    Work already progresses for the next snapshot that is to include ways to allow buggy formats

    2007-06-01 0.28 - H.Merijn Brand
        * Added allow_loose_quotes (see doc)
        * Added t/65_allow.t
        * Added allow_loose_escapes (see doc) RT 15076

    =item allow_loose_quotes

    By default, parsing fields that have C<quote_char> characters inside
    an unquoted field, like

        1,foo "bar" baz,42

    would result in a parse error. Though it is still bad practice to
    allow this format, we cannot help there are some vendors that make
    their applications spit out lines styled like this.

    =item allow_loose_escapes

    By default, parsing fields that have C<escape_char> characters that
    escape characters that do not need to be escaped, like:

        my $csv = Text::CSV_XS->new ({ escape_char => "\\" });
        $csv->parse (qq{1,"my bar\'s",baz,42});

    would result in a parse error. Though it is still bad practice to
    allow this format, this option enables you to treat all escape
    character sequences equal.
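
    A minimal sketch of what allow_loose_quotes changes, using the sample line from the documentation above. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# The sample line from the docs: quotes inside an unquoted field
my $line = q{1,foo "bar" baz,42};

# Default behaviour: the loose quotes make parse () fail
my $strict    = Text::CSV_XS->new ();
my $strict_ok = $strict->parse ($line) ? 1 : 0;

# With allow_loose_quotes the same line is accepted
my $loose    = Text::CSV_XS->new ({ allow_loose_quotes => 1 });
my $loose_ok = $loose->parse ($line) ? 1 : 0;
my @fields   = $loose_ok ? $loose->fields () : ();

print "strict parse: ", ($strict_ok ? "ok" : "failed"), "\n";
print "loose parse:  ", join ("|", @fields), "\n";
```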

    I'm playing to see if allow_whitespace is also feasible


    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Jun 18, 2007 at 14:33 UTC

    Thanks to all the feedback from all the different corners of the perl community, I'm doing far more than I planned :)

    Feel free to comment, request features, or give feedback on whatever part of this module you want. Meanwhile, 0.30 will leave for the wild wide world soon ...

    2007-06-18 0.30 - H.Merijn Brand
        * ,\rx, is definitely an error without binary (used to HANG!)
        * Fixed bug in attribute caching for undefined eol
        * Cleaned up some code after -W*** warnings
        * Added chomp_verbatim. HIGHLY EXPERIMENTAL!
        * More test to cover the really dark corners and edge cases
        * Even more typo fixes in the docs
        * Added error_diag ()
        * Added t/80_diag.t
        * Added DIAGNOSTICS section to pod

    2007-06-08 0.29 - H.Merijn Brand
        * Removed an unused 'use Data::Dumper'
        * Added $csv->eof () RT 27424
        * Two typo's in the doc's (Amsterdam.pm)
        * Modified examples/speed.pl to better show the diffs between versions
        * Cache attribute settings and regain speed of 0.23! and beyond
          Relative overall speeds (YMMV, use examples/speed.pl to check),
          the 1.0x versions are from Text::CSV_PP.

                      0.23 0.25 0.26 0.27 0.28 0.29 1.00 1.02 1.05
                      ==== ==== ==== ==== ==== ==== ==== ==== ====
          combine   1   62   61   61   60   58  100   14   14   14
          combine  10   41   41   41   42   42  100    6    6    6
          combine 100   35   35   36   36   36  100    5    5    5
          parse     1  100   92   87   90   81   96   19   19   17
          parse    10   95  100   86   97   94   94   15   16   14
          parse   100   90   95   84   94   92  100   16   16   14
          print    io   93   89   91   91   95  100    0    0    6
          getline  io   90   92   84   87   89  100    0    0   13
                      ---- ---- ---- ---- ---- ---- ---- ---- ----
          average       75   75   71   74   73   98    9    9   11

        * Removed prototypes
        * Added a SPECIFICATION section to the pod
        * Safe caching of eol, and a warning in the docs
        * Rewrote t/20_file.t to do actual file IO instead of IO_Scalar fake
        * Small optimization for parse (juerd)
        * Added make target test_speed
        * Merged the items from CAVEAT to SPECIFICATION
        * Updated the TODO and Release Plan sections
        * Speed up internals by using references instead of copies (juerd)
        * Reworked error_input, which also had an undetected internal error
        * Added IO tests for lexical IO handles and perlio IO to/from scalars

    The release candidate is up for you to test.


    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Oct 24, 2007 at 12:24 UTC
    2007-10-24 0.32 - H.Merijn Brand
        * Added $csv->error_diag () to SYNOPSIS
        * Added need for diag when new () fails to TODO
        * Fixed a sneaked-in defined or in examples/csv2xls
        * Plugged a 32byte memory leak in the cache code (valgrind++)
        * Some perlcritic level1 changes

    2007-07-23 0.31 - H.Merijn Brand
        * Removed prototypes in examples/csv2xls
        * Improved usage for examples/csv2xls (GetOpt::Long now does --help/-?)
        * Extended examples/csv2xls to deal with Unicode (-u)
        * Serious bug in Text::CSV_XS::NV () type setting, causing the
          resulting field to be truncated to IV

    Enjoy, Have FUN! H.Merijn

      And a new speed comparison table. 0.23 .. 0.32 are Text::CSV_XS, and 1.00 .. 1.06 are Text::CSV_PP. I have no idea why 1.06 is not yet on CPAN.

                  0.23 0.25 0.26 0.27 0.28 0.29 0.30 0.31 0.32 1.00 1.02 1.05 1.06
                  ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ====
      combine   1   60   60   60   59   56  100   98   93   96   14   14   14   14
      combine  10   40   40   41   41   40  100   96   92   95    6    6    6    6
      combine 100   34   33   34   33   34  100   92   89   90    5    5    5    5
      parse     1  100   86   92   89   86   96   98   93   98   20   19   17    8
      parse    10   94   85   98   98  100   96   92   87   94   16   16   14    5
      parse   100   88   81   93   90   97  100   97   93   99   16   16   14    5
      print    io   92   90   91   91   92  100   95   94   94   68   66    6    6
      getline  io   88   80   90   88   90  100   98   93   98    -    -   13    5
                  ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
      average       74   69   74   73   74   99   95   91   95   18   17   11    6

      Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by dipster (Novice) on Oct 30, 2007 at 08:03 UTC

    Firstly, what a great module, thanks! I've always found it reliable and effective.

    It would be handy if string() returned the string read by getline().

    Although this does seem to work...

    use Text::CSV_XS;
    use IO::File; # tell/seek fails if i omit this line

    my $file = "1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15";
    open (my $io, '<', \$file); # Open filehandle to in memory file
    my $csv = Text::CSV_XS->new();
    while (1) {
        my $start = tell($io);
        my $colref = $csv->getline($io);
        last unless $colref;
        print "Array: @$colref\n";
        my $length = tell($io) - $start;
        seek($io, $start, 0);
        read($io, my $string, $length);
        print "String: <$string>\n";
    }

    What do you think? Would it be possible for Text::CSV_XS to return the string more efficiently?

    I've also wondered whether eol would always be limited to a finite set of predefined strings?

    Thanks again.
    Steve

      That could be done, but not by default, as this has a big impact on performance, as now every getline () result has to be copied/stored in a location that string () can return to the user.

      The solution you post is only viable if the input stream is seek'able, and will not work on pipes. This is one of the reasons why this module is not capable of determining the line characteristics on the fly.


      Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Mar 01, 2008 at 17:02 UTC
    2008-03-01 0.35 (Valloire) - H.Merijn Brand
        * use warnings/$^W = 1
        * Tested on 5.10.0, 5.8.8, 5.6.2, and 5.005_04, Strawberry and Cygwin
        * Diagnostics for failed new ()
        * New 'blank_is_undef' option
        * Updated docs
        * Extended the DIAGNOSTICS a bit
        * Updated TODO
        * Fixed allow_whitespace issue, revealed by blank_is_undef
        * Re-enabled some tests
        * Fixed parse error that passed for q{1, "bar",2} with escape_char
        * Reversed an erroneous test result in the funny combo section
        * Extended diagnostics tests
        * Extended XS coverage
        * Removed error 2033

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Mar 06, 2008 at 08:13 UTC

    I finally scratched a longstanding personal itch. No more need for 'use IO::Handle;'. Text::CSV_XS will now do that for you automatically at the moment it is needed, and only once, so no performance penalty. This will simplify one-liners.

    2008-03-06 0.36 - H.Merijn Brand
        * Updated ppport.h
        * auto-load IO::Handle when needed
        * Tested with all available perl builds, including a fresh
          threaded 5.11 (blead)

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Mar 11, 2008 at 18:15 UTC
    2008-03-11 0.37 - H.Merijn Brand
        * Copied GIT repo to public mirror
        * Fix leak / make meta info available to getline () + tests

    Recent changes can be (re)viewed in the public GIT repository at git://repo.or.cz/Text-CSV_XS.git

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Apr 07, 2008 at 13:05 UTC
    file: $CPAN/authors/id/H/HM/HMBRAND/Text-CSV_XS-0.40.tgz
    size: 85057 bytes
     md5: cb8b2af20925b832159f34eed9793666

    2008-04-07 0.40 - H.Merijn Brand
        * Implemented getline_hr () and column_names () RT 34474
          (suggestions accepted from Mark Stosberg)
        * Corrected misspelled variable names in XS
        * Functions are now =head2 type doc entries (Mark Stosberg)
        * Make SetDiag() available to the perl level, so errors can
          be centralized and consistent
        * Integrate the non-XS errors into XS
        * Add t/75_hashref.t
        * Testcase for error 2023 (Michael P Randall)
        * Completely refactored the XS part of parse/getline, which
          is now up to 6% faster. YMMV
        * Completed bind_columns. On straight fetches now up to three
          times as fast as normal fetches (both using getline ())

    getline_hr
        The "getline_hr ()" and "column_names ()" methods work together to
        allow you to have rows returned as hashrefs. You must call
        "column_names ()" first to declare your column names.

            $csv->column_names (qw( code name price description ));
            $hr = $csv->getline_hr ($io);
            print "Price for $hr->{name} is $hr->{price} EUR\n";

        "getline_hr ()" will croak if called before "column_names ()".

    column_names
        Set the keys that will be used in the "getline_hr ()" calls. If no
        keys (column names) are passed, it'll return the current setting.

        "column_names ()" accepts a list of scalars (the column names) or a
        single array_ref, so you can pass "getline ()"

            $csv->column_names ($csv->getline ($io));

        "column_names ()" croaks on invalid arguments.

    bind_columns
        Takes a list of references to scalars (max 255) to store the fields
        fetched by "getline_hr ()" in. When you don't pass enough references
        to store the fetched fields in, "getline ()" will fail. If you pass
        more than there are fields to return, the remaining references are
        left untouched.

            $csv->bind_columns (\$code, \$name, \$price, \$description);
            while ($csv->getline ($io)) {
                print "The price of a $name is \x{20ac} $price\n";
                }
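
    For a complete round trip, a small sketch that takes the column names from the header line and then fetches hashrefs. The sample data and variable names are made up for the example.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up data: a header line followed by two records
my $data = "code,name,price\n1,Widget,9.95\n2,Gadget,19.50\n";
open my $io, "<", \$data or die "in-memory open: $!";

my $csv = Text::CSV_XS->new ({ binary => 1 });

# The first line supplies the hash keys for getline_hr ()
$csv->column_names ($csv->getline ($io));

my @lines;
while (my $hr = $csv->getline_hr ($io)) {
    push @lines, "Price for $hr->{name} is $hr->{price} EUR";
}
close $io;

print "$_\n" for @lines;
```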

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Apr 11, 2008 at 10:27 UTC
    file: $CPAN/authors/id/H/HM/HMBRAND/Text-CSV_XS-0.41.tgz
    size: 85570 bytes
     md5: f704fb8ad057a36e7cc0fb892c0f940a

    2008-04-11 0.41 - H.Merijn Brand
        * error_diag () subclassable
        * typo in bind_columns () docs
        * examples/csv2xls now uses getline ()
        * better test for getline in t/75_hashref.t (makamata)
        * document return value of getline () with bind_columns ()
        * add perl version prereq to META.yml

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Apr 16, 2008 at 14:23 UTC
    file: $CPAN/authors/id/H/HM/HMBRAND/Text-CSV_XS-0.42.tgz
    size: 86136 bytes
     md5: 1cf4491f48965793f1e31fc74159f20f

    We can do MAGIC now! Dumping the content of a database ($dbh) table ($tbl) to CSV:

        my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
        open my $fh, ">", "$tbl.csv" or die "$tbl.csv: $!";
        my $sth = $dbh->prepare ("select * from $tbl");
        $sth->execute;
        $csv->print ($fh, $sth->{NAME_lc});
        while (my $row = $sth->fetch) {
            $csv->print ($fh, $row);
            }
        close $fh;

    2008-04-16 0.42 - H.Merijn Brand
        * Generate META.yml myself. I won't use Build.PL
        * Array-refs now accept scalars with magic:
          $csv->print (*STDOUT, $sth->{NAME_lc});
        * More/better examples
        * Added t/76_magic.t

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Apr 23, 2008 at 11:11 UTC
    2008-04-21 0.43 - H.Merijn Brand
        * parse errors try to remember failing position
        * used valgrind to test for leaks (devel-only)
        * used Test::Valgrind as alternative leak check (devel-only)
        * improve documentation for error 2023
        * nailed the loose quotes in quoted fields

    The upcoming 0.44 will have:

        * Fixed the error position returned as third arg in error_diag ()
        * Made examples/csv-check use this even more verbose
        * Removed double-double quote from TODO

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Jun 21, 2008 at 12:51 UTC
    2008-06-17 0.51 - H.Merijn Brand
        * Allow UTF8 even without binary => 1
        * Fixed a few pod typo's
        * Lifted the max of 255 for bind_columns

    2008-06-04 0.50 - H.Merijn Brand
        * Skip a few tests in automated testing, as they confuse
          reports. This is important for the automated sites that
          mark modules as fail if it is not an obvious full PASS
        * 0.46 caused the last still open RT bug to be closed!
        * Tested on 5.11.0, 5.10.0, 5.8.8, 5.6.2, and 5.005_04,
          Strawberry and Cygwin

    2008-06-04 0.46 - H.Merijn Brand
        * In examples add die on failed close calls
        * Use Test::MinimumVersion (not distributed)
        * Added option -F to examples/csv2xls
        * More source code cleanup
        * Nailed the UTF8 issues for parsing
        * Nailed the UTF8 issues for combining

    2008-04-23 0.45 - H.Merijn Brand
        * Forgot to pack examples/parser-xs.pl

    2008-04-23 0.44 - H.Merijn Brand
        * Fixed the error position returned as third arg in error_diag ()
        * Made examples/csv-check use this even more verbose
        * Removed double-double quote from TODO
        * Added examples/parse-xs.pl (attempt to fix bad CSV)

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by markjugg (Curate) on Jun 24, 2008 at 12:43 UTC
    Hello H.Merijn. Is it possible to make this a recoverable error? "HR - bind_columns () did not pass enough refs for parsed fields"

    I know some rows will be too short. I just want to skip them or log them and move on. Or maybe there's a way I can "pre-test" them before calling getline()?

      Text::CSV_XS comes with examples. There is csv-check and parser-xs.pl. Both show ways to parse (broken) CSV files reliably. Take the latter as an example.

      Making fatal errors recoverable is very hard, as the module does not buffer. Once the stream has been read, it is hard to back up in it. You can probably still use $csv->error_input () after catching the failure with eval.


      Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Sep 01, 2008 at 07:32 UTC

    It is very good to see people are actually using error_diag (), but it also means people find bugs.

    2008-09-01 0.53 - H.Merijn Brand
        * SvUPGRADE is a safer choice than sv_upgrade (Lubomir Rintel, RT#38890)
        * bring docs in sync with reality for msg 2023
        * Default eol for print is $\
        * examples/csv-check should default to CSV , not to ;
        * Tests for SetDiag (0)
        * Tests for error 2030
        * Code cleanup (Devel::Cover++)

    2008-06-28 0.52 - H.Merijn Brand
        * Using undef for hash keys is a bad plan
        * Fix, tests, and documentation for column_names ()

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Oct 15, 2008 at 11:14 UTC
    2008-10-15 0.55 - H.Merijn Brand
        * Improve documentation on eol
        * Unicode on perl-5.8.[0-2] sucks. Don't use it!
        * Test error codes in expected IO failures
        * Allow SetDiag to be used as class method
        * Document the MS/Excel separation character
        * Hint that eof is not an error per se (RT#40047)

    2008-09-04 0.54 - H.Merijn Brand
        * IO failure in print () was not propagated (ilmari, RT#38960)

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Oct 21, 2008 at 20:26 UTC
    2008-10-21 0.56 - H.Merijn Brand
        * Update to ppport.h 3.14_01
        * Updated docs (Unicode, TODO, Release Plan)
        * As Text::CSV::Encoded is about to be released, refer to it
          in the documentation
        * Default for eol is "", undef now treated as ""
        * Don't print $\ twice (eol prevails over $\ in ->print ())
          Fix only works in perl5.8 and up
        * Undef treated as 0 for boolean attributes
        * Trailing whitespace in pod removed
        * Sync up doc with latest Text::CSV::Encoded
        * YAML declared 1.4 (META.yml) instead of 1.1 (YAML)

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Oct 31, 2008 at 12:14 UTC
    2008-10-30 0.58 - H.Merijn Brand
        * Small typo in test message (RT#40410, JPL)
        * Parse error test on "\0 (RT#40507)
        * Fixed allow_loose_escapes bug disclosed by RT#40507

    2008-10-21 0.57 - H.Merijn Brand
        * Don't bootstrap twice. Don't know how/when it came in there

    With the bugfix in 0.58, the Devel::Cover coverage (including XS) went up to 99.8%!

    ---------------------------- ------ ------ ------ ------ ------ ------ ------
    File                           stmt   bran   cond    sub    pod   time  total
    ---------------------------- ------ ------ ------ ------ ------ ------ ------
    CSV_XS.xs                      99.6    n/a    n/a    n/a    n/a    n/a   99.6
    blib/lib/Text/CSV_XS.pm       100.0  100.0  100.0  100.0  100.0  100.0  100.0
    Total                          99.7  100.0  100.0  100.0  100.0  100.0   99.8
    ---------------------------- ------ ------ ------ ------ ------ ------ ------

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Jan 23, 2009 at 09:15 UTC
    2009-01-23 0.59 - H.Merijn Brand
        * Wrong e-mail in META.yml
        * Missing $IO argument in bind_columns example (docs only)
        * Upped Copyright notices to 2009
        * Added warning for parse () (RT#42261)
        * Small optimisations (Goro Fuji, RT#42517)
        * ppport.h updated to 3.15
        * Added git clone command to README
        * ppport.h updated to 3.16-pre
        * Optimize getline/print method calls (Goro Fuji, RT#42517)
        * Decode *_char attributes for perl 5.8.2 and up

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Mar 08, 2009 at 10:31 UTC
    2009-03-08 0.61 - H.Merijn Brand
        * valgrind found a possible uninitialized value
        * Restriction in print () was only for really old perls
        * Fix for bind_columns () initialisation (vincent, RT#43927)

    2009-01-27 0.60 - H.Merijn Brand
        * Optimize for threaded perls. (Goro Fuji, RT#42517)
          Non-threaded perls won't notice

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Apr 03, 2009 at 20:51 UTC

    Three more releases later. People still make me find hidden features/bugs :)

    2009-04-03 0.64 - H.Merijn Brand
        * Skip perlio tests for perl older than 5.8, as perlio
          was experimental in 5.6
        * Up Devel::PPPort to 3.17
        * Fix initialisation of eol => undef (could cause core dump)
        * Added configure_require to META.yml

    2009-03-20 0.63 - H.Merijn Brand
        * Fixed allow_whitespace with sep_char = TAB (RT#44402)

    2009-03-13 0.62 - H.Merijn Brand
        * Prevent warnings in older perls (without utf8)
        * All known errors are covered and/or documented. TODO dropped
        * Unicode TODO dropped. All covered, or use Text::CSV::Encoded
        * Documented the examples

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on May 15, 2009 at 06:55 UTC
    2009-05-14 0.65 - H.Merijn Brand
        * Initial known errors can now be checked on number (1002)
        * More tests for illegal combinations
        * Added -u option to examples/csv-check to validate utf-8 encoding
        * Correct documentation for error_diag () return value in case of
          constructor failure (Slaven, RT#46076)
        * All error_diag () returns should now be dual-var (both numeric
          and string context valid)
        * Remove (3) from L<..> links (Slaven, RT#46078)
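
    A minimal sketch of the dual-var behaviour after a constructor failure. The attribute name ecs_char is a deliberate misspelling to make new () fail; the exact number and text printed depend on the installed version.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Deliberately misspelled attribute: new () is expected to return undef
my $csv = Text::CSV_XS->new ({ ecs_char => ":" });

# There is no object to ask, so error_diag () is called as a class
# method. In scalar context the result is a dual-var: a number in
# numeric context and a message in string context.
my $diag = Text::CSV_XS->error_diag ();
my $num  = 0 + $diag;
my $str  = "$diag";

print "new () failed with error $num: $str\n" unless defined $csv;
```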

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Aug 07, 2009 at 14:34 UTC

    2009-08-07 0.66 - H.Merijn Brand

    • Reordered examples in doc to show best method first
    • Documentation grammatical fix (John P. Linderman, RT#46411)
    • Fail if first arg to new () is not a hash ref
    • Implement empty_is_undef on request of Evan Carroll
    • Typo in documentation (Herwin Weststrate, RT#47613)
    • error_diag () uses warn () in void context instead of STDERR
    • Added auto_diag attribute (still under construction)
    • FIX: reset attributes (after they have been set) with accessor
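
    To show how empty_is_undef relates to the older blank_is_undef, a small sketch parsing the same made-up line with both settings:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Second field is blank (nothing between the separators),
# third field is empty but quoted
my $line = q{1,,"",4};

my $blank = Text::CSV_XS->new ({ blank_is_undef => 1 });
$blank->parse ($line) or die "parse failed";
my @b = $blank->fields ();   # only the unquoted blank field becomes undef

my $empty = Text::CSV_XS->new ({ empty_is_undef => 1 });
$empty->parse ($line) or die "parse failed";
my @e = $empty->fields ();   # both empty fields become undef

# Render undef visibly for printing
my $show = sub { join "|", map { defined $_ ? "[$_]" : "undef" } @_ };
print "blank_is_undef: ", $show->(@b), "\n";
print "empty_is_undef: ", $show->(@e), "\n";
```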

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Oct 15, 2009 at 06:38 UTC

    Mac users rejoice!

    I've come to the point where I now really think that for normal CSV, you won't have to set the eol attribute anymore for parsing CSV data. All of \n, \r\n, and now also \r are automatically recognized in streams.
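
    A minimal sketch of what that means in practice, assuming a release with this auto detection (0.69 or later): made-up data with bare \r line endings is read from a stream without setting eol at all.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Old-style Mac line endings: records are terminated by "\r" only
my $data = "a,b,c\r1,2,3\r";
open my $fh, "<", \$data or die "in-memory open: $!";

# Note: no eol attribute is passed to new ()
my $csv = Text::CSV_XS->new ();
my @rows;
while (my $row = $csv->getline ($fh)) {
    push @rows, join "|", @$row;
}
close $fh;

print "$_\n" for @rows;
```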

    2009-10-10 0.69 - H.Merijn Brand

    • Missing end quotes in error code docs
    • examples/csv-check now shows detected eol
    • Auto detection of eol => "\r" in streams
    • Optimized caching. All cache changes now in XS

    2009-10-04 0.68 - H.Merijn Brand

    2009-08-08 0.67 - H.Merijn Brand

    • Fix empty_diag typo for attribute handler
    • Fix AUTOMATED_TESTING erroneous skip

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Feb 17, 2010 at 10:22 UTC

    2010-02-15 0.71 - H.Merijn Brand

    • Upped copyright to 2010
    • Prevent double encoding: make Text::CSV_XS streams behave just like perl would (thanks ikegami for the test cases)
    • Text::CSV_XS->error_diag () in void context now warns instead of doing nothing
    • auto_diag also used for new () itself

    2009-12-02 0.70 - H.Merijn Brand

    • Add quote_space attribute
    • Forbid \r and \n in sep_char, quote_char, and escape_char
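
    A small sketch of the quote_space attribute: by default a field containing a space is quoted on output, and switching the attribute off leaves it bare. The field values are made up.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my @fields = ("a b", "c");

# Default: a space in a field triggers quotation
my $default = Text::CSV_XS->new ();
$default->combine (@fields) or die "combine failed";
my $quoted = $default->string ();

# quote_space => 0: spaces alone no longer trigger quotation
my $relaxed = Text::CSV_XS->new ({ quote_space => 0 });
$relaxed->combine (@fields) or die "combine failed";
my $bare = $relaxed->string ();

print "default:          $quoted\n";
print "quote_space => 0: $bare\n";
```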

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Mar 16, 2010 at 13:08 UTC

    2010-03-16 0.72 - H.Merijn Brand

    • Introduce quote_null attribute (RT#55200)
    • examples/csv-check can be used for Text::CSV_PP
    • examples/csv-check more options for sep_, escape_ and quote_char
    • examples/csv2xls more options for sep_, escape_ and quote_char
    • examples/csv2xls added auto_diag and verbosity
    • Dropped YAML spec to 1.0

    I also added a test to make sure a former inconsistency between Text::CSV_XS and Text::CSV_PP is tested for.


    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on May 04, 2010 at 12:11 UTC

    2010-05-03 0.73 - H.Merijn Brand

    • Improve date conversion in examples/csv2xls
      new option -D allows column selection for date conversions
    • Tested under perl-5.12.0 (and 21 other versions of perl)
    • Added a note about EBCDIC data files
    • Test suite is now safe for parallel test (prove --shuffle -j6)

    Together with the 1.611 release of DBI, it made a new release possible for DBD::CSV

    2010-05-03 DBD::CSV-0.29

    Some highlights from over the past few releases:

    • Support for f_encoding and f_lock
    • Documentation updates
    • Adjustments for windows
    • Mark all non-\w chars illegal in field and table names
    • Fix field types after execute
    • Fix for NULL joins

    Which leads you to:

    $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_schema         => undef,
        f_dir            => "data",
        f_ext            => ".csv/r",
        f_lock           => 2,
        f_encoding       => "utf8",

        csv_null         => 1,

        RaiseError       => 1,
        PrintError       => 1,
        FetchHashKeyName => "NAME_lc",
        }) or die $DBI::errstr;

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Sep 29, 2010 at 15:49 UTC

    2010-09-29 0.74 - H.Merijn Brand

    • Spelling fixes
    • Real eol support for parsing streams (beyond \n, \r and \r\n)
    • Clarify doc for always_quote to not quote undef fields
    • Clarify UTF8 process for print () and combine ()

    This release passed 35 versions of perl on a single box using a slightly modified version of Module::Release :)


    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Oct 13, 2010 at 07:15 UTC

    2010-10-09 0.76 - H.Merijn Brand

    • Windows doesn't support STDERR redirection as used in t/80_diag

    2010-10-05 0.75 - H.Merijn Brand

    • Fixed undefinedness of $\ in print (RT#61880)

    The fix in version 0.75 is a serious fix. Version 0.76 was just released because it broke CPANTESTERS on Windows. There has been no functionality change in 0.76 at all.

    I now have a graphical representation of speed differences between Text::CSV_XS and Text::CSV_PP. The first shows the difference between XS and PP. The second just shows XS method differences.


    Enjoy, Have FUN! H.Merijn
      Windows doesn't support STDERR redirection as used in t/80_diag

      Please forward that bug upstream

        I already posted the problem on the perl5 porters mailing list. I don't know if it is Test::More or something else that is to blame.


        Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Nov 26, 2010 at 14:53 UTC

    2010-10-23 0.77 - H.Merijn Brand

    • Internals now use warn () instead of (void)fprintf (stderr, ...)
      Now the test in t/80_diag also passes on Windows
    • Better parsing for eol = \r and set as such (RT#61525)
    • Workaround for AIX cpp bug (RT#62388, Jan Dubois)

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Dec 25, 2010 at 10:53 UTC

    Here's my X-Mas present:

    2010-12-24 0.80 - H.Merijn Brand

    • Implement getline_all () and getline_hr_all ()
    • Fixed another parsing for eol = \r (RT#61525)

    2010-11-26 0.79 - H.Merijn Brand

    • Use correct type for STRLEN (HP-UX/PA-RISC/32)
    • More code coverage
    • EOF unreliable when line-end missing at eof

    2010-11-26 0.78 - H.Merijn Brand

    • Version 0.77 broke MacOS exported CSV files with only \r

    The two new methods:

    • getline_all
      $arrayref = $csv->getline_all ($io);
      $arrayref = $csv->getline_all ($io, $offset);
      $arrayref = $csv->getline_all ($io, $offset, $length);

      This will return a reference to a list of getline ($io) results. In this call, keep_meta_info is disabled. If $offset is negative, as with splice (), only the last abs ($offset) records of $io are taken into consideration.

      Given a CSV file with 10 lines:

      lines call
      ----- ---------------------------------------------------------
      0..9  $csv->getline_all ($io)         # all
      0..9  $csv->getline_all ($io,  0)     # all
      8..9  $csv->getline_all ($io,  8)     # start at 8
      -     $csv->getline_all ($io,  0,  0) # start at 0 first 0 rows
      0..4  $csv->getline_all ($io,  0,  5) # start at 0 first 5 rows
      4..5  $csv->getline_all ($io,  4,  2) # start at 4 first 2 rows
      8..9  $csv->getline_all ($io, -2)     # last 2 rows
      6..7  $csv->getline_all ($io, -4,  2) # first 2 of last 4 rows
    • getline_hr_all
      $arrayref = $csv->getline_hr_all ($io);
      $arrayref = $csv->getline_hr_all ($io, $offset);
      $arrayref = $csv->getline_hr_all ($io, $offset, $length);

      This will return a reference to a list of getline_hr ($io) results. In this call, keep_meta_info is disabled.
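To make the offset/length table above concrete, here is a minimal sketch. The ten-record sample data, the in-memory file handles, the fresh_io () helper, and the column names "id" and "label" are all made up for illustration; only getline_all (), getline_hr_all (), and column_names () come from the module.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Ten made-up records: "0,row0" .. "9,row9"
my $data = join "", map { "$_,row$_\n" } 0 .. 9;

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

# Hypothetical helper: each call reads from the current position of the
# handle, so hand out a fresh in-memory handle for every example.
sub fresh_io {
    open my $io, "<", \$data or die "Cannot open in-memory handle: $!";
    return $io;
    }

my $middle = $csv->getline_all (fresh_io (),  4, 2); # start at 4, first 2 rows
my $tail   = $csv->getline_all (fresh_io (), -4, 2); # first 2 of last 4 rows

# getline_hr_all () needs column names, just like getline_hr ()
$csv->column_names (qw( id label ));
my $hashes = $csv->getline_hr_all (fresh_io (), 0, 3);

my @middle_ids = map { $_->[0] }     @$middle;
my @tail_ids   = map { $_->[0] }     @$tail;
my @labels     = map { $_->{label} } @$hashes;

print "middle: @middle_ids\n"; # rows 4..5 according to the table
print "tail:   @tail_ids\n";   # rows 6..7 according to the table
print "labels: @labels\n";
```

Both methods return a reference to an array, so an empty result (e.g. length 0) is an empty array reference rather than undef.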


    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Jun 15, 2012 at 06:16 UTC

    I have just uploaded version 0.90 to CPAN, which marks a bigger step than the usual releases, as I have dropped perl-5.005 support. Perl 5.6.1 is now the minimum.

    0.90 - 2012-06-15, H.Merijn Brand

    • Drop 5.005 support (5.6.1 is now minimum)
    • Introduce record_number
    • Try harder to get the complete input parsed for the current record when hitting errors on parsing seekable IO (only works in 5.14.0 and up)
    • Tested with perl 5.6.1 .. 5.17.0 (99 versions of perl) on Linux, HP-UX, AIX, and Windows
    • SvSETMAGIC was missing for tied variables causing weird actions at a distance, e.g. in printf (Thanks TonyC for finding this)
    • UTF8 flag was not always reset when using bound variables (TonyC)
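As an illustration of the new record_number from the list above: a small sketch, assuming made-up in-memory data in which the first record contains an embedded newline. Two records span three physical lines here, which is where counting records instead of lines pays off.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample: 2 records on 3 physical lines (embedded newline in record 1)
my $data = qq{"x","multi\nline"\n"y","z"\n};
open my $io, "<", \$data or die "Cannot open in-memory handle: $!";

# binary is required to accept the embedded newline
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

my @seen;
while (my $row = $csv->getline ($io)) {
    # record_number counts records parsed by this object, not lines read
    push @seen, $csv->record_number;
    }
close $io;

print "record numbers: @seen\n";
```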

    0.88 - 2012-03-16, H.Merijn Brand

    • Fix for $/ in 0.86 broke parsing fields that contain excessive $/'s

    0.87 - 2012-03-08, H.Merijn Brand

    • Extra check on utf8 output (RT#74330)
    • examples/csvdiff now recognizes numerically sorted CSV files
    • Document example comparing getline_hr vs bind_columns + getline

    0.86 - 2012-01-22, H.Merijn Brand

    • Introduce quote_binary attribute
    • Update copyright to 2012
    • Versions
    • Fixed a utf8::decode on undef (found by perl-5.15.7)
    • Fixed localized $/ interference with other handles (RT#74216)

    0.85 - 2011-09-07, H.Merijn Brand

    • NAME / DISTNAME in Makefile.PL

    0.84 - 2011-09-07, H.Merijn Brand

    • More cross-checks for META data

    0.83 - 2011-08-07, H.Merijn Brand

    • Fix spurious auto_diag warning (RT#69673)
    • Tested with 50 versions of perl, including 5.15.1

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Nov 13, 2012 at 13:05 UTC

    A serious UTF-8 fix: output could be broken if buffer bounds cut encoded code points in half.

    0.92 - 2012-11-12, H.Merijn Brand

    • Allow bind_columns () for print ()
    • Increase buffer size for print to 64k
    • Fix RT#80680 - Buffer break halfway UTF8 + tests
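A minimal sketch of what bind_columns () for print () looks like in use. The variable names, the sample values, and the in-memory output handle are assumptions for illustration; the idea is that the bound variables supply the fields when print () is called with an explicit undef instead of an array reference.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n" });

# Write to an in-memory buffer so the sketch leaves no file behind
my $out = "";
open my $fh, ">", \$out or die "Cannot open in-memory handle: $!";

# Bind two variables once, then only update them before each print
my ($id, $name);
$csv->bind_columns (\($id, $name));

for my $rec ([ 1, "Tux" ], [ 2, "jZed" ]) {
    ($id, $name) = @$rec;
    # explicit undef: take the fields from the bound columns
    $csv->print ($fh, undef) or die "print failed";
    }
close $fh;

print $out;
```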

    0.91 - 2012-08-21, H.Merijn Brand

    • Prevent test-failures for long doubles on weird architectures
    • More utf-8 tests for the change of 0.90
    • Update test case now 5.005 is not supported anymore
    • Rip out the tell/seek introduced in 0.90

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Jan 13, 2013 at 18:39 UTC

    Several small fixes, but 0.94 might fix more than you needed

    0.95 - 2013-01-13, H.Merijn Brand

    • Introduce allow_unquoted_escape as workaround for RT#81295
    • Update copyright to 2013
    • Introduce print_hr () for RT#76143
    • Dropped plans to support EBCDIC
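For the new print_hr () in the list above, a short sketch under stated assumptions: the column names, the sample hashes, and the in-memory output handle are made up. print_hr () prints the values of a hash reference in the order given by column_names (), so the names have to be set first.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n" });

# The order of column_names () determines the order of the output fields
$csv->column_names (qw( id name ));

my $out = "";
open my $fh, ">", \$out or die "Cannot open in-memory handle: $!";

# The hash key order does not matter, only the column names do
$csv->print_hr ($fh, { name => "Tux",  id => 1 });
$csv->print_hr ($fh, { name => "jZed", id => 2 });
close $fh;

print $out;
```

This pairs naturally with getline_hr (): read a record as a hash, change a value, and write it back with the same column order.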

    0.94 - 2012-12-03, H.Merijn Brand

    • Guard against beta releases of Encode (Xavier Guimard - RT#81499)
    • Fix sv_cache init global-buffer-overflow (Reini Urban - RT#81469)
    • Tested with perl compiled with clang (also to verify RT#81469)
    • Fix memory leak reported in RT#81539 (Fix by Tony Cook)

    0.93 - 2012-11-19, H.Merijn Brand

    • Skip Encode related tests on too old Encode
    • Force old(er) tar format (ustar) - assumes GNU tar on release box

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Jun 13, 2013 at 18:04 UTC

    It was inevitable: with the steady releases, I was bound to hit 1.00. Here it is.

    1.00 - 2013-06-13

    • Fix automatic UTF-8 in getline/parse for SV's with \0

    0.99 - 2013-06-05

    • Documents return value of bind_columns without arguments
    • Fix automatic UTF-8 in getline/parse

    0.98 - 2013-06-03

    • Clarify eol documentation
    • Move error_input to XS

    0.97 - 2013-03-30

    • Regain the speed from 0.91 (buffer back to 1k)
    • Minor cleanup in XS code
    • Add diag_verbose attribute

    0.96 - 2013-03-26

    • No need to require Test::Harness if unused (RT#82693)
    • parse ("") should return one empty field, not undef
    • Now that we know the record number, show it in auto_diag

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by Tux (Monsignor) on Jan 21, 2014 at 16:02 UTC

    A hint on Twitter prompted me to implement RFC7111. See syntax below. If there are no (serious) objections, I will make a new release soon.

    1.03 - 2014-01-21

    • Implement RFC7111

    1.02 - 2013-09-25

    • Add example for reading only a single column
    • Don't store NULL in _ERROR_INPUT (RT#86217/Clone)
    • Prevent double-decode in csv-check
    • Add decode_utf8 attribute (default is true)

    1.01 - 2013-06-16

    • Cache not re-read on getline_all (RT#86155)

    Enjoy, Have FUN! H.Merijn
Re: The future of Text::CSV_XS - TODO
by XonqNopp (Initiate) on Feb 05, 2014 at 15:18 UTC

    Hi and thanks for this great module,

    I was wondering if you could add an option (unless it already exists and I missed it in the documentation).

    I am currently dealing with CSV files that were transmitted from a remote computer somewhere out in the field to our server. Lines are often corrupted, meaning they can have more or fewer values than the header. I want to skip them (because they are corrupted) without the program dying (because the next lines are good). I use bind_columns, but when there are more values I get the 3006 error and it drops out of the while loop. By putting an error_diag right after the while loop and checking for the 3006 error, I can then say goto beginning_of_loop, but this is not the solution I prefer... And I could not check whether there were enough values to match the headers...

    So what I imagine is a constructor option that lets the programmer choose what happens when the number of columns does not match the header (if any): carry on without reporting, issue a warning that can be silenced or fetched via error_diag, or raise the usual error as now (unseen without auto_diag, but it stops parsing anyway). But I'll leave the 'how' to you...

    XN

      As of Text::CSV_XS-1.05, you can catch/ignore any error at your own risk:

      my ($c, $s);

      sub ignore3006 {
          my ($err, $msg, $pos, $recno) = @_;
          if ($err == 3006) {
              # ignore this error
              ($c, $s) = (undef, undef);
              $csv->SetDiag (0);   # reset the error state on the parser object
              }
          # Any other error
          return;
          } # ignore3006

      $csv->callbacks (error => \&ignore3006);
      $csv->bind_columns (\$c, \$s);
      while ($csv->getline ($fh)) {
          # Error 3006 will not stop the loop
          }

      Note that this API is still new. New insights might enhance or change it later on.


      Enjoy, Have FUN! H.Merijn

Node Type: perlmeditation [id://617577]
Approved by kyle
Front-paged by ysth