http://www.perlmonks.org?node_id=929108


in reply to Re: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma? ("XS")
in thread Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?

I expect there to be an easy way in Perl to use built-in default idioms, to assert that my input and output are in the UTF-8 character encoding form of Unicode, and to use CPAN modules, all at the same time, and without having to know what an "XS module" is.

Specifically, I want to process many CSV files that I feed to the Perl program via @ARGV. I want to use the CPAN module Text::CSV_XS to parse the CSV records. I don't want to open and close files explicitly; I want Perl to open and close them for me implicitly. I want to continue to use Perl's built-in idioms that permit me to avoid needless extra programming, just as I always have.


Re^3: Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma? ("XS")
by remiah (Hermit) on Oct 02, 2011 at 07:49 UTC
    Your unexpected output looks like the SPADE characters displayed as ISO-8859-1. If you save the output to a text file and view it in your browser with UTF-8 encoding, you will probably see the SPADE.
    use Encode;

    print qq("BLACK SPADE SUIT","BLACK HEART SUIT","BLACK DIAMOND SUIT","BLACK CLUB SUIT",\n);

    # decimal Unicode character references for the above
    my @ary = ("&#9824;", "&#9829;", "&#9830;", "&#9827;");
    foreach my $target (@ary) {
        # strip the "&#" and ";" so only the decimal code point remains
        $target =~ s/\&#(.*);/$1/;
        print '"' . encode('utf8', chr($target)) . '",';
    }
    print "\n";
    I mean, this is a terminal problem, isn't it?

      I mean, this is a terminal problem, isn't it?

      No. It looks similar, but no. The problem, in a nutshell: if you add warn "$ARGV $_ " for PerlIO::get_layers(*ARGV) you can see that ARGV doesn't get the utf8 I/O layers; only STDIN gets them.

      $ perl ... utf8wobom.csv >bad
      utf8wobom.csv unix at ...
      utf8wobom.csv crlf at ...
      $ perl ... < utf8wobom.csv >good
      - unix at ...
      - crlf at ...
      - encoding(utf-8-strict) at ...
      - utf8 at ...
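The layer check described above can be sketched as a self-contained script. This is only an illustration under stated assumptions: the `use open qw(:std :utf8)` form is a guess at the pragma line from the original question, `layers_of` is a hypothetical helper, and the sample file is created with File::Temp just so the script runs without arguments. Which layers ARGV reports may differ between Perl versions, so no expected output is given.

```perl
use strict;
use warnings;
use open qw(:std :utf8);    # assumption: the pragma form used in the question
use File::Temp qw(tempfile);

# Write a small sample file so the example does not need real input.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} qq("a","b"\n);
close $tmp_fh;

# Hypothetical helper: the PerlIO layers on a handle, space-separated.
sub layers_of {
    my ($fh) = @_;
    return join ' ', PerlIO::get_layers($fh);
}

# Let Perl open the file implicitly through the magic ARGV handle.
@ARGV = ($tmp_name);
my $argv_layers;
while (my $line = <>) {
    # ARGV is only open while its file is being read, so inspect it here.
    $argv_layers = layers_of(\*ARGV) if $. == 1;
}
my $stdout_layers = layers_of(\*STDOUT);

warn "ARGV:   $argv_layers\n";
warn "STDOUT: $stdout_layers\n";
```

If ARGV lists no utf8 or encoding layer while STDOUT does, the input is being read as raw bytes and then encoded on output, which is the mismatch described above.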

      In my non-utf terminal it shows

      $ ls -loanh good bad
      -rw-rw-rw- 1 0 115 2011-10-02 01:31 bad
      -rw-rw-rw- 1 0 103 2011-10-02 01:31 good
      $ diff good bad
      2c2
      < "♠","♥","♦","♣"
      ---
      > "♠","♥","♦","♣"
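The 12-byte size difference between the two files (115 versus 103) is consistent with the four suit characters being encoded twice. The sketch below reproduces that arithmetic with the core Encode module. It assumes the faulty path is "UTF-8 bytes read without decoding, then encoded again on output", which is an interpretation of the layer listing above rather than something the thread states outright.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The four suit characters from the sample row.
my $chars = "\x{2660}\x{2665}\x{2666}\x{2663}";

# Correct path: the characters are encoded to UTF-8 exactly once.
my $good = encode('UTF-8', $chars);

# Assumed faulty path: the UTF-8 bytes are never decoded on input, so each
# byte is treated as a Latin-1 character and encoded a second time on output.
my $bad = encode('UTF-8', decode('ISO-8859-1', $good));

printf "good: %d bytes, bad: %d bytes, difference: %d\n",
    length($good), length($bad), length($bad) - length($good);
# prints "good: 12 bytes, bad: 24 bytes, difference: 12"
```

Each suit character takes three bytes in UTF-8, all of them above 0x7F, so the second encoding turns every one of those bytes into two.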
        Probably, Text::CSV::Encoded is what you are looking for.
        use Text::CSV::Encoded;
        my $csv = Text::CSV::Encoded->new ({binary=>1, encoding=>"utf8"}) or die $!;
        while (my $row = $csv->getline (*ARGV)) {
            $csv->print(\*STDOUT, $row);
        }
        This works fine with my perl 5.12.2, with command line ...
        perl test.pl test.csv