
Re^3: Bug in Template?

by remiah (Hermit)
on Mar 22, 2012 at 03:54 UTC ( [id://960921]=note )


in reply to Re^2: Bug in Template?
in thread Bug in Template?

This does not seem to be a problem with Template. I would also like advice on this.

The é in “Séan” is probably U+00E9 in the Unicode table (http://www.utf8-chartable.de/unicode-utf8-table.pl). I thought that decoding it to Perl's internal UTF-8 and passing it to Template, which would then encode it as UTF-8, would work. But it does not work. Even without Template, there is strange behavior.

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(is_utf8 encode decode);
use Template;

my(@raw, @decoded_internal_utf8, @encoded_raw_utf8, @encoded_internal_utf8);
my @chars=hex('00C0') .. hex('00F0');     #target characters
#my @chars=hex('3041') .. hex('3096');    #hiragana

foreach my $code ( @chars ){
    my($raw, $chr);
    $raw = chr($code);
    if ( is_utf8($raw) ){
        $chr = $raw;
    } else {
        $chr = decode('utf8', $raw);
    }
    push @raw, $raw;
    push @decoded_internal_utf8, $chr;
    push @encoded_raw_utf8, encode('utf8', $raw);
    push @encoded_internal_utf8, encode('utf8', $chr);
}

print "======================\n";
print "perl=$^X : version=$]\n";
print "1.###raw\n";
print "#$_#\n" for @raw;
print "2.###decoded_intenal_utf8\n";
#print "#$_#\n" for @decoded_internal_utf8;
print "3.###encoded_raw_utf8\n";
print "#$_#\n" for @encoded_raw_utf8;
print "4.###encoded_internal_utf8\n";
print "#$_#\n" for @encoded_internal_utf8;

Strangely, only No. 3 works in this case. I usually print characters with No. 4. Japanese characters such as hiragana seem to have no problem (for example, '3041' .. '3096').
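
One possible reading of the result above, as a minimal sketch rather than a confirmed diagnosis: chr() already returns a character, so encoding it directly (No. 3) gives valid UTF-8, while decoding it first treats the single byte as malformed UTF-8 input. The variable names here are illustrative and not taken from the script.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# chr() returns one character (U+00E9), not undecoded bytes.
my $char = chr(0xE9);

# Encoding the character directly, as list No. 3 does.
my $utf8_bytes = encode('UTF-8', $char);

# Decoding it as if it were UTF-8 input, as the else branch does for
# code points below 256: the lone byte 0xE9 is not valid UTF-8.
my $wrongly_decoded = decode('UTF-8', $char);

printf "encoded bytes: %s\n",
    join ' ', map { sprintf '%02x', ord } split //, $utf8_bytes;
printf "decoded code point: U+%04X\n", ord $wrongly_decoded;
```

Under this reading, hiragana would be unaffected because chr(0x3041) is flagged as UTF-8 internally, so the script skips the decode step for it.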

I saw a similar problem at Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?. At that time I didn't understand it well and thought a newer version would have no problem... Is this the same trouble? I tried with 5.012002 and 5.014002. They print exactly the same output except for the version number.

Re^4: Bug in Template?
by Anonymous Monk on Mar 22, 2012 at 08:27 UTC

    I'm confused by your code; what is it supposed to demonstrate? perlunitut: Unicode in Perl warns against using is_utf8, so I wouldn't use it.

    Consider

    $ perl -le " print chr hex q/C0/ " | od -tx1
    0000000 c0 0d 0a
    0000003

    when viewed as Windows-1252 it is À

    And this

    $ perl -le " binmode STDOUT , q/:utf8/; print chr hex q/C0/ " | od -tx1
    0000000 c3 80 0d 0a
    0000004

    when viewed as Windows-1252 it is Ã€ but viewed as UTF-8 it is À

    And this

    $ perl -MEncode -le " print decode(q/utf8/, chr hex q/C0/ )" | od -tx1
    Wide character in print at -e line 1.
    0000000 ef bf bd 0d 0a
    0000005

    when viewed as Windows-1252 it is ï¿½ but viewed as UTF-8 it is � (U+FFFD, the replacement character)

    If you search for ef bf bd you'll see lots of questions about this erroneous conversion

    So if you want chr 192 (the value of  perl -le " print  hex q/C0/ " ) to come out as Unicode, you have to encode it: characters 0 to 255 are also valid Latin-1, and print writes them as single Latin-1 bytes, not as UTF-8.

    $ perl -le " print chr hex q/C0/ " |od -tx1
    0000000 c0 0d 0a
    0000003
    $ perl -le " print chr 255 " |od -tx1
    0000000 ff 0d 0a
    0000003
    $ perl -le " print chr 256 " |od -tx1
    Wide character in print at -e line 1.
    0000000 c4 80 0d 0a
    0000004
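
The same byte sequences can be checked inside a script without od. This is a small sketch under the assumption that explicit Encode calls are acceptable here; hex_bytes is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show a byte string as space-separated hex, like od -tx1.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

# chr 192 encoded as Latin-1 is one byte; encoded as UTF-8 it is two.
print "latin1 chr 192: ", hex_bytes(encode('ISO-8859-1', chr 192)), "\n";
print "utf8   chr 192: ", hex_bytes(encode('UTF-8', chr 192)), "\n";

# chr 256 has no Latin-1 form, so only the UTF-8 encoding is shown.
print "utf8   chr 256: ", hex_bytes(encode('UTF-8', chr 256)), "\n";
```

Encoding explicitly also avoids the "Wide character in print" warning, because print then receives bytes only.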

    Or, if you want chr 192 to return Unicode, use the encoding pragma (the utf8 pragma doesn't affect chr):

    $ perl -le " use encoding q/utf8/; print chr 192 " |od -tx1
    0000000 c3 80 0a
    0000003

      Thanks for the reply. I will read perlunitut. Googling "ef bf bd" also turned up sites that explain Unicode in Perl precisely. I am printing them now...

      When a character comes from outside Perl, we have to decode the bytes to Perl's internal UTF-8, as perlunitut says, especially when we want to know the length in characters. For example, CGI's param() returns bytes, so when I want to know the length of a word, I decode it first.
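
The length difference described above can be sketched as follows. The byte string is an assumed stand-in for input arriving from outside Perl, not actual output of CGI's param().

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed input: the UTF-8 bytes of "Séan" as received from outside Perl.
my $bytes = "S\xC3\xA9an";

# Decode the bytes into a character string before measuring it.
my $text = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($text);
```

Here the undecoded string reports one unit more than the decoded one, because é occupies two bytes in UTF-8.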

      My question in short: suppose two characters arrive, '00E9' and '3041'. They should be two characters in UTF-8. How do you take the second character with substr and print it?

      I agree my example is clumsy. Is this clearer? I guess this is the OP's problem.
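
For the substr question above, one approach consistent with perlunitut is to decode on input, work on characters, and encode on output. This is a sketch that assumes the two characters arrive as UTF-8 bytes; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed input: UTF-8 bytes for U+00E9 followed by U+3041.
my $bytes = "\xC3\xA9\xE3\x81\x81";

# Decode first so that substr counts characters rather than bytes.
my $text   = decode('UTF-8', $bytes);
my $second = substr($text, 1, 1);

# Encode again on the way out so print receives plain bytes.
print encode('UTF-8', $second), "\n";
```

If the data is instead built with chr() inside the program, the decode step does not apply, because the string already holds characters.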
