PerlMonks  

Re^2: Text::CSV encoding parse()

by slugger415 (Monk)
on Aug 13, 2019 at 18:45 UTC (#11104407)


in reply to Re: Text::CSV encoding parse()
in thread Text::CSV encoding parse()

Hi, yes I'm using the CGI module and have it properly set:

print $q->header(-charset    => 'utf-8');

And, as mentioned, if I don't use Text::CSV the characters display correctly.

Re^3: Text::CSV encoding parse()
by haukex (Chancellor) on Aug 13, 2019 at 19:44 UTC
    Hi, yes I'm using the CGI module and have it properly set: print $q->header(-charset => 'utf-8'); And as mentioned if I don't use Text::CSV the characters display correctly.

    Ok, but I'm sorry, there still isn't enough information to answer your question - have another look at my reply above, plus the links therein.

      Hello, ok here's as short and succinct a sample as I can create.

      use Text::CSV;
      use CGI;

      my($row) = "search/¿Cuales son las partes de una cadena de conexión??scope|ids_jdbc_011.htm|0|1|1|0";

      my $csv = Text::CSV->new ({ binary => 1, sep_char => "|" });
      my $q = new CGI;

      # print the HTML header and start html
      print $q->header;
      print $q->start_html;

      # first, print $row as is
      print $q->p("ROW: $row");

      # next, parse with $csv
      $csv->parse($row);
      my @els = $csv->fields;

      # print the first field
      # this displays the black diamond ? symbol for ¿ and ó
      print $q->p("CSV Parse, field 0:",$els[0]);

      # split instead
      my(@splits) = split('\|',$row);

      # print the first element in @splits.
      # As noted, this one displays properly in the browser.
      print $q->p("split 0:", $splits[0]);

      print $q->end_html;
      exit;

      thanks

      ======================

      UMM, update, when I actually ran the above in my http server I got the opposite results, but with weird errors.

      ROW: search/¿Cuales son las partes de una cadena de conexión??scope|ids_jdbc_011.htm|0|1|1|0
      CSV Parse, field 0: search/¿Cuales son las partes de una cadena de conexión??scope
      split 0: search/¿Cuales son las partes de una cadena de conexión??scope

      Paint me confused.

      In the real script, $row is coming from a @sorted_array from an SQL query. This is getting confusing so maybe I should withdraw my question.

        I don't see any mention of any encoding in this code, which is not good. And earlier you said: "I'm using the CGI module and have it properly set: print $q->header(-charset => 'utf-8');" so I doubt this code is representative.

        You need to:

        • Use a Perl version >= 5.12 and say use feature 'unicode_strings'; or use 5.012; (or higher).
        • If you have any non-ASCII characters in your Perl script, save it as UTF-8 and add the use utf8; directive at the top.
        • Make sure your data is coming from the database properly encoded. As I linked to above, you can check this via Devel::Peek. If you need that output to go to the browser, see this.
        • Make sure you are doing binmode STDOUT, ':encoding(UTF-8)'; or use open qw/:std :utf8/;.
        • Make sure you are telling your browser what encoding you are sending it.
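The checklist above can be sketched without a web server. This is only an illustration using the core Encode module; it assumes the database returns raw UTF-8 bytes, and `bytes_written` is a hypothetical helper that captures what a filehandle would send to the browser. It decodes once on input, then compares output with and without an `:encoding(UTF-8)` layer.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the database driver hands back raw UTF-8 bytes.
# "\xC2\xBF" encodes U+00BF (inverted question mark), "\xC3\xB3" encodes U+00F3.
my $raw = "search/\xC2\xBFCuales son las partes de una cadena de conexi\xC3\xB3n";

# Decode once at the input boundary; LEAVE_SRC keeps $raw intact.
my $text = decode('UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Hypothetical helper: print $string to an in-memory handle opened with
# $mode and return the bytes that ended up in the buffer.
sub bytes_written {
    my ($mode, $string) = @_;
    my $buffer = '';
    open my $fh, $mode, \$buffer or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close $fh or die "close failed: $!";
    return $buffer;
}

my $with_layer    = bytes_written('>:encoding(UTF-8)', $text);
my $without_layer = bytes_written('>', $text);

printf "characters: %d, input bytes: %d\n", length($text), length($raw);
printf "with layer:    %s\n",
    $with_layer eq $raw ? 'same bytes as the UTF-8 input' : 'differs from input';
printf "without layer: %s\n",
    $without_layer eq $raw ? 'same bytes as the UTF-8 input' : 'single-byte output, not UTF-8';
```

Without the layer, the two accented characters go out as single bytes that are not valid UTF-8. A browser told to expect UTF-8 typically shows such bytes as the replacement character.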

        Text::CSV is not the problem:

        use warnings;
        use strict;
        use Devel::Peek;
        use Text::CSV;

        my $str = "\N{U+20AC}|\N{U+20AC}";
        Dump($str);  # ... UTF8 "\x{20ac}|\x{20ac}" ...

        my ($s1,$s2) = split /\|/, $str;
        Dump($s1);   # ... UTF8 "\x{20ac}" ...
        Dump($s2);   # ... UTF8 "\x{20ac}" ...

        my $csv = Text::CSV->new ({ binary => 1, sep_char => "|" });
        $csv->parse($str);
        my ($c1,$c2) = $csv->fields;
        Dump($c1);   # ... UTF8 "\x{20ac}" ...
        Dump($c2);   # ... UTF8 "\x{20ac}" ...

        In what encoding have you saved the source code? The recommended practice is to use UTF-8 and tell Perl that your source code contains non-ASCII UTF-8 characters (i.e. use utf8).

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^3: Text::CSV encoding parse()
by jcb (Chaplain) on Aug 14, 2019 at 03:24 UTC

    That means that you are declaring to the browser that your output is UTF-8. Is it actually UTF-8?

        not sure

        Get information. If you have no better idea, use the dumpstr() function in t/UChelp.pm from DBD::ODBC. Just copy the few lines into your code and print its result for each string that should be UTF-8.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
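If copying code out of another distribution is inconvenient, a rough stand-in can be written with core Perl only. The helper below, `describe_string`, is hypothetical and is not the `dumpstr()` from DBD::ODBC. It reports the internal UTF8 flag, the length, and each code point, which is usually enough to tell decoded text from undecoded bytes.

```perl
use strict;
use warnings;

# Hypothetical helper, not the dumpstr() from DBD::ODBC: reports the
# internal UTF8 flag, the length, and every code point of a string.
sub describe_string {
    my ($string) = @_;
    my $flag   = utf8::is_utf8($string) ? 'on' : 'off';
    my $points = join ' ', map { sprintf('U+%04X', ord $_) } split //, $string;
    return sprintf('utf8 flag %s, length %d: %s', $flag, length($string), $points);
}

my $decoded = "conexi\x{F3}n";    # the accented o as a single character
my $encoded = "conexi\xC3\xB3n";  # the same word as undecoded UTF-8 bytes

print describe_string($decoded), "\n";
print describe_string($encoded), "\n";
```

If a string that should hold one accented character reports two elements in the `U+00C3 U+00B3` range instead, it is still raw UTF-8 bytes and needs decoding before use.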

        Seconding afoken here — if you do not know, find out!

        Try the hexdump function from this sample: (lightly tested, hopefully correct)

        hexdump-test.pl:
        #!/usr/bin/perl

        use strict;
        use warnings;

        # given:  string of bytes
        # return: hexdump of argument
        sub hexdump ($) {
          use bytes;
          my @bytes = map {[$_, ord]} split //, shift;
          return '['.join(' ', map {sprintf('%02x', $_->[1])} @bytes).']'
            .'|'.join('', map { ($_->[1] >= 0x20 && $_->[1] < 0x7F)
                                  ? $_->[0] : '.' } @bytes).'|'
        }

        use utf8;

        my $text = q[search/¿Cuales son las partes de una cadena de conexión??scope];

        print hexdump($text), "\n";
        __END__

        Sample output:

        [73 65 61 72 63 68 2f c2 bf 43 75 61 6c 65 73 20 73 6f 6e 20 6c 61 73 20 70 61 72 74 65 73 20 64 65 20 75 6e 61 20 63 61 64 65 6e 61 20 64 65 20 63 6f 6e 65 78 69 c3 b3 6e 3f 3f 73 63 6f 70 65]|search/..Cuales son las partes de una cadena de conexi..n??scope|

        I am fairly sure that if hexdump dies, the string you gave it was definitely not UTF-8. :-)
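Another way to ask "is it actually UTF-8?" is to attempt a strict decode and see whether it fails. The sketch below uses the core Encode module; `is_valid_utf8` is a hypothetical helper name. A successful decode shows only that the bytes are well-formed UTF-8, not that UTF-8 was the intended encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if $bytes decodes as strict UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC stops it from modifying $bytes in place.
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $utf8_bytes   = "conexi\xC3\xB3n";  # accented o encoded as UTF-8
my $latin1_bytes = "conexi\xF3n";      # accented o encoded as Latin-1

printf "%-8s %s\n", 'UTF-8:',   is_valid_utf8($utf8_bytes)   ? 'valid' : 'invalid';
printf "%-8s %s\n", 'Latin-1:', is_valid_utf8($latin1_bytes) ? 'valid' : 'invalid';
```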
