PerlMonks  

Re^3: Default encoding rules leave me puzzled... (use open qw/ :std :locale /;

by moritz (Cardinal)
on Jun 20, 2014 at 12:26 UTC (#1090613)


in reply to Re^2: Default encoding rules leave me puzzled... (use open qw/ :std :locale /;
in thread Default encoding rules leave me puzzled...

So, does Perl assume by default, even in a UTF-8 environment, that it should output everything in Latin-1?

Perl tries to not convert anything at all, automatically.

And since Latin-1 maps the first 256 code points 1:1 to bytes, outputting a string without any conversion is the same as outputting it as Latin-1.

Note that this round-trips binary data: if your script or input is UTF-8, and you neither use utf8; nor decode the input, the output will be UTF-8 again.
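A minimal sketch of that round trip, assuming the two bytes below stand in for an "é" read from a UTF-8 source file or input stream that was never decoded. The in-memory filehandle and the variable names are only for illustration; they let us inspect what would reach STDOUT.

```perl
use strict;
use warnings;

# The two bytes that encode U+00E9 in UTF-8, as perl would see them in a
# UTF-8 source file without "use utf8;" or in undecoded input.
my $raw = "\xC3\xA9";

# Perl treats the undecoded string as two characters, one per byte.
my $char_count = length $raw;

# Print to an in-memory filehandle with no encoding layer, so the bytes
# that print emits can be compared with the original ones.
my $printed = '';
open my $out, '>', \$printed or die "cannot open in-memory handle: $!";
print {$out} $raw;
close $out;

printf "characters: %d, bytes unchanged: %s\n",
    $char_count, ($printed eq $raw ? 'yes' : 'no');
```

Nothing was decoded on the way in and nothing was encoded on the way out, so a UTF-8 terminal still shows the right character even though perl counted two characters.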

But Latin-1 is limited to code points up to 255, so if anything higher than that shows up in your string, perl falls back to UTF-8 for the whole string (and warns "Wide character in print").
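Here is a small sketch of that fallback. The helper bytes_printed is a made-up name for this example, not a core function; it prints to an in-memory handle without any encoding layer and collects the warnings, on the assumption that this behaves like a plain STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the bytes print emits for a string on a
# handle with no encoding layer, followed by any warnings it raised.
sub bytes_printed {
    my ($string) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $buffer = '';
    open my $out, '>', \$buffer or die "cannot open in-memory handle: $!";
    print {$out} $string;
    close $out;
    return ($buffer, @warnings);
}

for my $string ("\x{E9}", "\x{263A}", "\x{E9}\x{263A}") {
    my ($bytes, @warnings) = bytes_printed($string);
    my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
    printf "%d character(s) -> bytes %s, %d warning(s)\n",
        length $string, $hex, scalar @warnings;
}
```

The first string fits into Latin-1 and comes out as the single byte E9. The other two contain U+263A, so they come out UTF-8 encoded, each with a "Wide character" warning. Note that in the third string the é is also written as C3 A9, because the fallback applies to the string as a whole.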

(As always, I'm linking to Encodings and Unicode in Perl, in the hope that it's useful to you).


Re^4: Default encoding rules leave me puzzled... (use open qw/ :std :locale /;
by kzwix (Sexton) on Jun 20, 2014 at 13:08 UTC
    I'm sorry, but I think that your "Perl tries to not convert anything at all, automatically" statement is wrong.
    Otherwise, why would a string internally stored as UTF-8 be converted to Latin-1 when sent to the standard output?
    (That is, without having used any funny encoding/decoding/layer stuff...)

      It's your use utf8; statement that explicitly converts the source code to perl's internal format, which happens to be Latin-1 in this case. The output operation then doesn't convert anything. Maybe I should have said that no implicit conversion takes place, because I don't count the use utf8; as automatic/implicit.

      Otherwise, why would a string internally stored as UTF-8 be converted to Latin-1 when sent to the standard output?

      Because you are printing the string, not its internal representation. The layout of a scalar is irrelevant.
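To illustrate that last point, here is a rough sketch that stores the same one-character string in both internal layouts and prints each to a handle without an encoding layer. utf8::upgrade and utf8::is_utf8 are core functions; the helper emitted and the variable names are invented for this example.

```perl
use strict;
use warnings;

my $native   = "\x{E9}";     # one character, stored internally as one byte
my $upgraded = "\x{E9}";
utf8::upgrade($upgraded);    # same character, now stored internally as UTF-8

# Hypothetical helper: the bytes print emits on a handle with no layers.
sub emitted {
    my ($string) = @_;
    my $buffer = '';
    open my $out, '>', \$buffer or die "cannot open in-memory handle: $!";
    print {$out} $string;
    close $out;
    return $buffer;
}

my $layouts_differ = utf8::is_utf8($native) ? 0 : utf8::is_utf8($upgraded) ? 1 : 0;
my $strings_equal  = $native eq $upgraded ? 1 : 0;
my $output_equal   = emitted($native) eq emitted($upgraded) ? 1 : 0;

print "internal layouts differ: $layouts_differ\n";
print "strings compare equal:   $strings_equal\n";
print "printed bytes are equal: $output_equal\n";
```

Both variables hold the string "é", so both print the single byte E9, whichever way perl happens to store them.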
