
Re^3: Text::CSV encoding parse()

by haukex (Bishop)
on Aug 13, 2019 at 19:44 UTC (node #11104415)

in reply to Re^2: Text::CSV encoding parse()
in thread Text::CSV encoding parse()

Hi, yes, I'm using the CGI module and have it properly set: print $q->header(-charset => 'utf-8'); And as mentioned, if I don't use Text::CSV the characters display correctly.

Ok, but I'm sorry, there still isn't enough information to answer your question - have another look at my reply above, plus the links therein.

Re^4: Text::CSV encoding parse()
by slugger415 (Monk) on Aug 14, 2019 at 17:43 UTC

    Hello, OK, here's as short and succinct a sample as I can create:

        use Text::CSV;
        use CGI;

        my($row) = "search/¿Cuales son las partes de una cadena de conexión??scope|ids_jdbc_011.htm|0|1|1|0";
        my $csv = Text::CSV->new ({ binary => 1, sep_char => "|" });
        my $q = new CGI;

        # print the HTML header and start html
        print $q->header;
        print $q->start_html;

        # first, print $row as is
        print $q->p("ROW: $row");

        # next, parse with $csv
        $csv->parse($row);
        my @els = $csv->fields;

        # print the first field
        # this displays the black diamond ? symbol for ¿ and ó
        print $q->p("CSV Parse, field 0:",$els[0]);

        # split instead
        my(@splits) = split('\|',$row);

        # print the first element in @splits.
        # As noted, this one displays properly in the browser.
        print $q->p("split 0:", $splits[0]);

        print $q->end_html;
        exit;



    Update: when I actually ran the above on my HTTP server, I got the opposite results, but with weird garbled characters:

        ROW: search/Â¿Cuales son las partes de una cadena de conexiÃ³n??scope|ids_jdbc_011.htm|0|1|1|0
        CSV Parse, field 0: search/¿Cuales son las partes de una cadena de conexión??scope
        split 0: search/Â¿Cuales son las partes de una cadena de conexiÃ³n??scope

    Paint me confused.

    In the real script, $row comes from @sorted_array, which is filled from an SQL query. This is getting confusing, so maybe I should withdraw my question.

      I don't see any mention of any encoding in this code, which is not good. And earlier you said: "I'm using the CGI module and have it properly set: print $q->header(-charset => 'utf-8');" so I doubt this code is representative.

      You need to:

      • Use a Perl version >= 5.12 and say use feature 'unicode_strings'; or use 5.012; (or higher).
      • If you have any non-ASCII characters in your Perl script, save it as UTF-8 and add the use utf8; directive at the top.
      • Make sure your data is coming from the database properly encoded. As I linked to above, you can check this via Devel::Peek. If you need that output to go to the browser, see this.
      • Make sure you are doing binmode STDOUT, ':encoding(UTF-8)'; or use open qw/:std :utf8/;.
      • Make sure you are telling your browser what encoding you are sending it.
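      Pulling the checklist above together, a CGI script might look roughly like the following. This is a minimal sketch, not code from this thread: it assumes CGI.pm and Text::CSV are installed and that the file itself is saved as UTF-8. The helper first_field is hypothetical, and the hard-coded row (taken from the question) stands in for data that, in the real script, would have to arrive from the database already decoded to characters.

```perl
#!/usr/bin/perl
use 5.012;          # implies "use strict" and enables the unicode_strings feature
use warnings;
use utf8;           # this file is saved as UTF-8 and contains non-ASCII literals
use CGI;
use Text::CSV;

# Character strings are encoded to UTF-8 bytes on the way out.
binmode STDOUT, ':encoding(UTF-8)';

my $csv = Text::CSV->new({ binary => 1, sep_char => '|' })
    or die 'Cannot create Text::CSV object: ' . Text::CSV->error_diag;

# Hypothetical helper: parse one '|'-separated row and return its first field.
sub first_field {
    my ($row) = @_;
    $csv->parse($row) or die 'Cannot parse row: ' . $csv->error_diag . "\n";
    return ($csv->fields)[0];
}

# Sample row from the question; in the real script this would come from
# the database, already decoded to characters (an assumption).
my $row = 'search/¿Cuales son las partes de una cadena de conexión|ids_jdbc_011.htm|0|1|1|0';

my $q = CGI->new;
print $q->header(-charset => 'utf-8');       # tell the browser what it is getting
print $q->start_html(-encoding => 'utf-8');
print $q->p('CSV Parse, field 0: ' . first_field($row));
print $q->end_html;
```

      The design choice is to decode once on the way in (use utf8 for literals, or the database layer for query results) and encode once on the way out (the binmode layer), so that everything in between, including Text::CSV, only ever sees characters.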

      Text::CSV is not the problem:

          use warnings;
          use strict;
          use Devel::Peek;
          use Text::CSV;

          my $str = "\N{U+20AC}|\N{U+20AC}";
          Dump($str);  # ... UTF8 "\x{20ac}|\x{20ac}" ...

          my ($s1,$s2) = split /\|/, $str;
          Dump($s1);   # ... UTF8 "\x{20ac}" ...
          Dump($s2);   # ... UTF8 "\x{20ac}" ...

          my $csv = Text::CSV->new ({ binary => 1, sep_char => "|" });
          $csv->parse($str);
          my ($c1,$c2) = $csv->fields;
          Dump($c1);   # ... UTF8 "\x{20ac}" ...
          Dump($c2);   # ... UTF8 "\x{20ac}" ...

        Hello haukex, thanks a million for your advice, but I have to confess this is way beyond my understanding or abilities in many ways, so I think I'm going to have to live with it. FWIW, every time encoding problems come up I get lost in the weeds. (I'm not a developer, just a Perl hack; I don't grok hexdump, Devel::Peek, etc.)

        All I know in this case is that I can print my @sorted_array rows to a flat file (opened with Notepad++) or to a web page using CGI, and those characters look fine. It's only when I use Text::CSV that something goes haywire.

        Anyway I appreciate your patience and help, sorry for the trouble.

      In what encoding have you saved the source code? The recommended practice is to save it as UTF-8 and tell Perl that your source code contains non-ASCII UTF-8 characters (i.e. use utf8;).
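      To see what use utf8 changes, it helps to look at the bytes. The following is a small illustration using only the core Encode module, not code from this thread: a UTF-8 editor such as Notepad++ stores ¿ (U+00BF) as the two bytes C2 BF. Without use utf8, Perl treats that literal as those two bytes. Decoding them as UTF-8 yields the one intended character, while interpreting the same bytes as ISO-8859-1 (Latin-1) yields the two characters Â¿, the typical symptom of UTF-8 output being read as Latin-1.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes a UTF-8 editor writes for "¿" (U+00BF). Without "use utf8",
# a literal "¿" in the source is seen by Perl as exactly these two bytes.
my $bytes = "\xC2\xBF";

# Decoding the bytes as UTF-8 gives the single intended character.
my $chars = decode('UTF-8', $bytes);

# Interpreting the same bytes as ISO-8859-1 maps each byte to one
# character, giving "\x{C2}\x{BF}", i.e. "Â¿".
my $mojibake = decode('ISO-8859-1', $bytes);

printf "bytes: %d, UTF-8 chars: %d, Latin-1 chars: %d\n",
    length $bytes, length $chars, length $mojibake;
```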


        Hi Choroba, my editor Notepad++ is set to UTF-8.
