PerlMonks  

Re: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?

by Tux (Monsignor)
on Oct 02, 2011 at 07:40 UTC (#929115)


in reply to Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?

This is a known bug in the IO module, and it has just recently been fixed. There is nothing Text::CSV_XS can do about it. It should "just work" (TM) if you can upgrade IO to the version that includes this patch.


Enjoy, Have FUN! H.Merijn


Re^2: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?
by Anonymous Monk on Oct 02, 2011 at 08:41 UTC
Re^2: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?
by Jim (Curate) on Oct 02, 2011 at 21:11 UTC

    Thank you very much, Tux! Thankfully, one can still get kind and gracious help with Perl problems from Perl experts on PerlMonks.

    Upgrading to a bleadperl version of IO isn't practical for me in my environment. So for now, I'll simply not use the open pragma to set default I/O layers and, instead, open files and set I/O layers explicitly in my Perl programs, like this…

    #!perl

    use strict;
    use warnings;
    use autodie qw( open close );
    use English qw( -no_match_vars );
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new({
        always_quote => 1,
        binary       => 1,
        eol          => $INPUT_RECORD_SEPARATOR,
    });

    binmode STDOUT, ':encoding(UTF-8)';

    for my $file (@ARGV) {
        open my $fh, '<:encoding(UTF-8)', $file;
        while (my $fields = $csv->getline($fh)) {
            $csv->print(\*STDOUT, $fields);
        }
        $csv->eof() or $csv->error_diag();
        close $fh;
    }

    exit 0;

    Now if I could just figure out how best to handle UTF-8 CSV files that have byte order marks in them. ;-) Text::CSV_XS alone chokes on them. I'm currently doing this…

    use File::BOM qw( open_bom );

    open my $input_fh, '<:via(File::BOM)', $input_file;
    open my $output_fh, '>:encoding(UTF-8):via(File::BOM)', $output_file;

    Is this The Right Way?

      Let me try to simplify that a bit ...

      use strict;
      use warnings;
      use autodie qw( open close );
      use English qw( -no_match_vars );
      use Text::CSV_XS;

      my $csv = Text::CSV_XS->new ({
          auto_diag    => 1,    # Let Text::CSV_XS do the analysis
          always_quote => 1,
          binary       => 1,
          eol          => $INPUT_RECORD_SEPARATOR,
          });

      binmode STDOUT, ':encoding(UTF-8)';

      for my $file (@ARGV) {
          open my $fh, '<:encoding(UTF-8)', $file;
          while (my $fields = $csv->getline ($fh)) {
              $csv->print (*STDOUT, $fields);    # no need for a reference
              }
          # due to auto_diag, no need for error checking here
          close $fh;
          }

      If this script is meant to sanitize CSV data, I'd advise using TWO csv objects: one for parsing, which does not set the always_quote and eol attributes, and one for output. The advantage is that all legal line endings are then parsed correctly and automatically, even when mixed.
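      A minimal sketch of that two-object setup might look like this. It assumes the caller has already put whatever I/O layers are needed (for example `:encoding(UTF-8)`) on both handles, and the helper name `sanitize_csv` is hypothetical, not part of Text::CSV_XS.

```perl
#!perl
use strict;
use warnings;
use Text::CSV_XS;

# Parser: no eol and no always_quote, so Text::CSV_XS works out the
# record separator itself ("\n" or "\r\n", even mixed in one file).
# auto_diag => 1 makes it report parse errors on its own.
my $csv_in = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Writer: quote every field and end every record with "\n".
my $csv_out = Text::CSV_XS->new({
    binary       => 1,
    auto_diag    => 1,
    always_quote => 1,
    eol          => "\n",
});

# Hypothetical helper: copy every record from $in_fh to $out_fh,
# normalizing quoting and line endings on the way.
sub sanitize_csv {
    my ($in_fh, $out_fh) = @_;
    while (my $fields = $csv_in->getline($in_fh)) {
        $csv_out->print($out_fh, $fields);
    }
    return;
}
```

      Keeping the parser free of output-only attributes is the point of the split: the reader stays liberal in what it accepts, while the writer alone decides the output format.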

      I have no neater solution to the BOM problem than what you already use.
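      If you would rather avoid the File::BOM dependency on the reading side, one possible alternative (a sketch, not a tested recipe for every case) is to open the file raw, look at the first three bytes, seek past a UTF-8 BOM if one is there, and only then push the `:encoding(UTF-8)` layer. The helper `open_utf8_skip_bom` is a hypothetical name, and the approach assumes a seekable handle, so it will not work on pipes or STDIN.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: open $source (a file name, or a reference to a
# scalar holding raw bytes) for reading as UTF-8, skipping a leading
# UTF-8 BOM. Assumes the handle is seekable (regular file or in-memory).
sub open_utf8_skip_bom {
    my ($source) = @_;
    open my $fh, '<:raw', $source or die "Cannot open input: $!";

    # Peek at the first three bytes, still undecoded.
    my $got = read($fh, my $head, 3);
    die "Cannot read input: $!" unless defined $got;

    # Rewind to just past the BOM if present, otherwise to the start.
    my $offset = $head eq "\xEF\xBB\xBF" ? 3 : 0;
    seek $fh, $offset, 0 or die "Cannot seek: $!";

    # Only now push the decoding layer.
    binmode $fh, ':encoding(UTF-8)' or die "Cannot set layer: $!";
    return $fh;
}
```

      The returned handle can then be passed to `$csv->getline` just like one opened with `'<:encoding(UTF-8)'`.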


      Enjoy, Have FUN! H.Merijn

        Thank you, again, Tux. I genuinely appreciate the tips. I'll brush up on auto_diag.

        The BOM is a nuisance, especially in CSV files. In one of my real programs that uses Text::CSV_XS (what I posted here is a reduction that simply demonstrates a specific problem I was having), I'm stymied by the confluence of byte order marks in UTF-8 files that force me to use File::BOM and, unfortunately, some malformed UTF-8 text in the data that kills CSV parsing with this unforgiving error message:

        utf8 "\xEC" does not map to Unicode at C:/strawberry/perl/lib/Encode.pm line 176.

        I don't know how to tell Text::CSV_XS or File::BOM to tell Encode to lighten up already about one or two bogus characters! :-(
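        One way around both problems at once (a sketch under stated assumptions, not something Text::CSV_XS or File::BOM does for you) is to skip the decoding layers on input entirely: open the file with `'<:raw'`, parse the raw UTF-8 bytes with Text::CSV_XS (`binary => 1`), and decode each field yourself with `Encode::decode`, which without a CHECK argument replaces malformed sequences with U+FFFD instead of croaking. Parsing undecoded UTF-8 is safe here because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so none of them can be mistaken for a comma, quote or newline. The helper names below are hypothetical.

```perl
#!perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: turn a row of raw UTF-8 byte strings into Perl
# character strings. With no CHECK argument, decode() uses
# Encode::FB_DEFAULT, so malformed bytes become U+FFFD instead of dying.
sub decode_fields_leniently {
    my ($fields) = @_;
    return [ map { defined $_ ? decode('UTF-8', $_) : undef } @$fields ];
}

# Hypothetical helper: remove a BOM (decoded as U+FEFF) from the start
# of the first field. Call it on the first row of each file only.
sub strip_bom {
    my ($fields) = @_;
    $fields->[0] =~ s/\A\x{FEFF}// if @$fields && defined $fields->[0];
    return $fields;
}
```

        In the read loop, `decode_fields_leniently` would be applied to every row returned by `getline`, and `strip_bom` to the first row only; the output side can stay as `'>:encoding(UTF-8)'`.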
