Re^3: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?

by Tux (Monsignor)
on Oct 03, 2011 at 06:37 UTC (#929276)


in reply to Re^2: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?
in thread Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?

Let me try to simplify that a bit ...

use English qw( -no_match_vars );   # provides $INPUT_RECORD_SEPARATOR
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({
    auto_diag    => 1,   # Let Text::CSV_XS do the analysis
    always_quote => 1,
    binary       => 1,
    eol          => $INPUT_RECORD_SEPARATOR,
    });

binmode STDOUT, ':encoding(UTF-8)';

for my $file (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $file or die "$file: $!";
    while (my $fields = $csv->getline ($fh)) {
        $csv->print (*STDOUT, $fields);   # no need for a reference
        }
    # due to auto_diag, no need for CSV error checking here
    close $fh;
    }

If this script is meant to sanitize CSV data, I'd advise using TWO csv objects: one for parsing, which does not get the always_quote and eol attributes, and one for output. The advantage is that all legal line endings are then parsed correctly and automatically, even when they are mixed.
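
A minimal sketch of that two-object setup, for illustration only. The names $csv_in, $csv_out and sanitize_csv are hypothetical, and "\n" is assumed as the desired output line ending:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Parser: no eol and no always_quote, so "\n" and "\r\n" are both accepted.
my $csv_in = Text::CSV_XS->new ({
    auto_diag => 1,
    binary    => 1,
    });

# Writer: carries the output-only attributes.
my $csv_out = Text::CSV_XS->new ({
    auto_diag    => 1,
    always_quote => 1,
    binary       => 1,
    eol          => "\n",   # assumption: normalise to LF on output
    });

# Hypothetical helper: copy every record from $in_fh to $out_fh.
sub sanitize_csv {
    my ($in_fh, $out_fh) = @_;
    while (my $fields = $csv_in->getline ($in_fh)) {
        $csv_out->print ($out_fh, $fields);
        }
    return;
    }

binmode STDOUT, ':encoding(UTF-8)';

for my $file (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $file or die "$file: $!";
    sanitize_csv ($fh, \*STDOUT);
    close $fh;
    }
```

Keeping eol off the parser is what lets it detect the line ending per record; the writer alone decides what the sanitized output looks like.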

I have no neat solution to the BOM problem other than what you already use.


Enjoy, Have FUN! H.Merijn


Re^4: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?
by Jim (Curate) on Oct 03, 2011 at 14:16 UTC

    Thank you, again, Tux. I genuinely appreciate the tips. I'll brush up on auto_diag.

    The BOM is a nuisance, especially in CSV files. In one of my real programs that uses Text::CSV_XS (what I posted here is a reduction that simply demonstrates a specific problem I was having), I'm stymied by the confluence of byte order marks in UTF-8 files that force me to use File::BOM and, unfortunately, some malformed UTF-8 text in the data that kills CSV parsing with this unforgiving error message:

    utf8 "\xEC" does not map to Unicode at C:/strawberry/perl/lib/Encode.pm line 176.

    I don't know how to tell Text::CSV_XS or File::BOM to tell Encode to lighten up already about one or two bogus characters! :-(
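
One possible workaround, sketched here under assumptions and not taken from Text::CSV_XS or File::BOM: read the file as raw bytes, strip a leading UTF-8 BOM by hand, and decode with Encode's default CHECK mode, which substitutes U+FFFD for malformed sequences instead of dying. The helper name scrub_utf8 is hypothetical.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper: take raw bytes, return bytes that are valid UTF-8.
sub scrub_utf8 {
    my ($bytes) = @_;
    $bytes =~ s/\A\xEF\xBB\xBF//;   # drop a UTF-8 byte order mark, if present
    # FB_DEFAULT replaces malformed sequences with U+FFFD rather than croaking.
    my $text = decode ('UTF-8', $bytes, Encode::FB_DEFAULT);
    return encode ('UTF-8', $text);
    }
```

The scrubbed bytes can then be opened as an in-memory file with open my $fh, '<:encoding(UTF-8)', \$clean and handed to getline as usual. The bogus characters are replaced, not repaired, so the affected fields will contain U+FFFD.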
