
Re^3: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?

by Tux (Monsignor)
on Oct 03, 2011 at 06:37 UTC


in reply to Re^2: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?
in thread Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?

Let me try to simplify that a bit ...

    use English qw( -no_match_vars );  # provides $INPUT_RECORD_SEPARATOR
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new ({
        auto_diag    => 1,  # Let Text::CSV_XS do the analysis
        always_quote => 1,
        binary       => 1,
        eol          => $INPUT_RECORD_SEPARATOR,
        });

    binmode STDOUT, ':encoding(UTF-8)';

    for my $file (@ARGV) {
        open my $fh, '<:encoding(UTF-8)', $file or die "$file: $!";
        while (my $fields = $csv->getline ($fh)) {
            $csv->print (*STDOUT, $fields);  # no need for a reference
            }
        # due to auto_diag, no need for error checking here
        close $fh;
        }

If this script is meant to sanitize CSV data, I'd advise TWO csv objects: one for parsing, which does not get the always_quote and eol attributes, and one for output. The advantage is that all legal line endings are parsed correctly and automatically, even if mixed.
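A minimal sketch of that two-object setup, assuming the goal is to copy records from one handle to another. The helper name `sanitize_csv` and the variable names `$csv_in` and `$csv_out` are made up for illustration; only the attributes already mentioned in this thread are used.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Parser: no eol and no always_quote, so the module detects the
# line ending of each record by itself
my $csv_in = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

# Writer: quote every field and end every record with a newline
my $csv_out = Text::CSV_XS->new ({
    binary       => 1,
    auto_diag    => 1,
    always_quote => 1,
    eol          => "\n",
    });

# Hypothetical helper: copy all records from handle $in to handle $out
sub sanitize_csv {
    my ($in, $out) = @_;
    while (my $fields = $csv_in->getline ($in)) {
        $csv_out->print ($out, $fields);
        }
    return;
}
```

It could be called as `sanitize_csv ($fh, *STDOUT)` inside the per-file loop, with the same encoding layers on both handles as in the script above.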

I have no neat solution to the BOM problem other than what you already use.


Enjoy, Have FUN! H.Merijn


Re^4: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?
by Jim (Curate) on Oct 03, 2011 at 14:16 UTC

    Thank you, again, Tux. I genuinely appreciate the tips. I'll brush up on auto_diag.

    The BOM is a nuisance, especially in CSV files. In one of my real programs that uses Text::CSV_XS (what I posted here is a reduction that simply demonstrates a specific problem I was having), I'm stymied by the confluence of byte order marks in UTF-8 files that force me to use File::BOM and, unfortunately, some malformed UTF-8 text in the data that kills CSV parsing with this unforgiving error message:

    utf8 "\xEC" does not map to Unicode at C:/strawberry/perl/lib/Encode.pm line 176.

    I don't know how to tell Text::CSV_XS or File::BOM to tell Encode to lighten up already about one or two bogus characters! :-(
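One possible workaround, sketched here under stated assumptions and not taken from this thread: skip File::BOM, read the raw bytes, and call Encode's `decode` with `Encode::FB_DEFAULT`, which replaces each malformed sequence with U+FFFD instead of croaking. The helper name `open_lenient_utf8` is hypothetical, and the sketch assumes the input is UTF-8 (not UTF-16) and small enough to slurp into memory.

```perl
use strict;
use warnings;
use Encode qw( decode encode );
use Text::CSV_XS;

# Hypothetical helper: return a read handle on a cleaned-up copy of $file
sub open_lenient_utf8 {
    my ($file) = @_;

    # Slurp the file as raw bytes, with no decoding layer
    open my $raw, '<:raw', $file or die "$file: $!";
    my $bytes = do { local $/; <$raw> };
    close $raw;

    # FB_DEFAULT: malformed sequences become U+FFFD, no exception
    my $text = decode ('UTF-8', $bytes, Encode::FB_DEFAULT);
    $text =~ s/\A\x{FEFF}//;  # drop a leading byte order mark, if any

    # In-memory handles hold bytes, so re-encode the now valid text
    my $clean = encode ('UTF-8', $text);
    open my $fh, '<:encoding(UTF-8)', \$clean or die "in-memory open: $!";
    return $fh;
}
```

The returned handle can be passed to `$csv->getline` like any other. The bogus bytes show up in the parsed fields as U+FFFD, so they are still easy to find afterwards.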
