PerlMonks  

Re: Default encoding rules leave me puzzled... (use open qw/ :std :locale /;)

by Anonymous Monk on Jun 20, 2014 at 09:04 UTC


in reply to Default encoding rules leave me puzzled...

My question is: why do I have to specify this encoding? I thought that Perl adapted to its environment, and that the localization environment variables should all be readable, right? Can someone explain the reason to me, or point me to relevant documentation?

use open qw/ :std :locale /;
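A minimal sketch of a complete script using the pragma, assuming the source file is saved as UTF-8 and the locale variables (LC_ALL, LC_CTYPE, LANG) name the encoding the terminal really displays. `:locale` takes the default I/O layer from the locale, and `:std` asks for it on STDIN, STDOUT and STDERR as well; the variable name `$text` is only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # string literals in this file are decoded from UTF-8
use open qw/ :std :locale /;    # default layer from the locale; :std covers STDIN/STDOUT/STDERR

# A character string: 43 characters, whatever bytes the source file used for them.
my $text = "Je suis une chaîne accentuée là où il faut.";

# Encoded on the way out according to the locale's encoding
# (UTF-8 under a UTF-8 locale, Latin-1 under an ISO-8859-1 one).
print "$text\n";
```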

Tutorials: perlunitut: Unicode in Perl

Also download the tarball from Perl Unicode Essentials (OSCON 2011, O'Reilly Conferences, July 25-29, 2011, Portland, OR) for even more Unicode info.

Why no default Unicode?


Re^2: Default encoding rules leave me puzzled...
by Anonymous Monk on Jun 20, 2014 at 09:33 UTC
Re^2: Default encoding rules leave me puzzled... (use open qw/ :std :locale /;)
by kzwix (Sexton) on Jun 20, 2014 at 09:36 UTC

    Sorry, I realize I wasn't specific enough:
    I've read about Encode, and successfully used it in a previous project. I know about the need to decode and encode streams, too. However, it seemed to me that Perl did some of this job itself (I had tried to explicitly decode data from standard input and from command-line arguments, and got strange results).

    So, is there some place where it is explicitly stated what Perl converts transparently, and what it doesn't?
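As far as I know, command-line arguments are one place where perl converts nothing on its own: the elements of @ARGV are the bytes the shell handed over, unless perl is started with the -CA switch (or the A flag in PERL_UNICODE), which treats them as UTF-8. A minimal sketch of decoding them by hand, assuming the terminal sends UTF-8; `decode_args` is a hypothetical helper and the byte string stands in for a real argument.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode each argument from UTF-8 bytes to characters.
# With the default CHECK, malformed bytes become U+FFFD instead of dying,
# and the input strings are left untouched.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

# Stand-in for a real argument: bytes C3 A9 are the UTF-8 encoding of "é".
my @raw     = ("caf\xC3\xA9");
my @decoded = decode_args(@raw);    # in a real script: decode_args(@ARGV)

printf "raw: %d bytes, decoded: %d characters\n",
    length $raw[0], length $decoded[0];
```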

    Furthermore, even though I didn't encode or decode the streams, shouldn't it "just work" if the scalar value is written in UTF-8 (because the file is encoded as such), Perl is AWARE that it is UTF-8 (because of 'use utf8;'), Perl stores it internally in UTF-8, and the expected output format is UTF-8 too?
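A minimal sketch of why it doesn't "just work", assuming a UTF-8 terminal: `use utf8` only tells perl how to read the source file and adds no encoding layer to STDOUT, so the output encoding has to be requested separately, for instance with `binmode STDOUT, ':encoding(UTF-8)'`. To make the bytes visible, the sketch prints to in-memory filehandles instead of STDOUT; the variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;    # "î" below is decoded from the UTF-8 source into one character

my $str = "chaîne";    # 6 characters

# No layer: each character below 256 is written as a single byte.
open my $plain_fh, '>', \my $plain or die "cannot open in-memory handle: $!";
print {$plain_fh} $str;
close $plain_fh;

# Explicit UTF-8 layer, as binmode(STDOUT, ':encoding(UTF-8)') would add to STDOUT.
open my $enc_fh, '>', \my $encoded or die "cannot open in-memory handle: $!";
binmode $enc_fh, ':encoding(UTF-8)';
print {$enc_fh} $str;
close $enc_fh;

printf "no layer: %d bytes, :encoding(UTF-8): %d bytes\n",
    length $plain, length $encoded;
```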

    I'm pretty sure there is a catch I haven't figured out, but pointing it out to me, even if it's obvious, would help. Thanks!

    EDIT: I've run a short test, using a Latin-1 terminal (this test script is fully encoded in UTF-8):

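        #!/usr/bin/perl
        use strict;
        use warnings;
        use utf8;     # the string literal below is decoded from the UTF-8 source
        use Encode;

        $\ = "\n";    # output record separator: every print ends with a newline
        my $unicodeScalar = "Je suis une chaîne accentuée là où il faut.";

        # Shows [1] if the string's internal UTF-8 flag is on, [] if it is off
        print '[' . Encode::is_utf8($unicodeScalar) . '] ' . $unicodeScalar;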

    Using my Latin-1 terminal, I displayed the source file and, sure enough, the contents were garbled (two strange bytes for each accented character, which confirmed to me that the file really was UTF-8). Then I ran the script, and got a perfect display.

    So, does Perl assume by default, even in a UTF-8 environment, that it should output everything in Latin-1?

      So, does Perl assume by default, even in a UTF-8 environment, that it should output everything in Latin-1?

      Perl tries to not convert anything at all, automatically.

      And since Latin-1 (mostly?) maps the first 256 codepoints 1:1 to bytes, outputting something without any conversion is the same as outputting it as Latin-1.
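A quick way to check this, as a sketch: for a string whose characters are all below 256, what a layer-less handle receives is byte-for-byte what Encode's ISO-8859-1 encoder produces, and not the UTF-8 encoding. An in-memory filehandle stands in for STDOUT so the bytes can be compared; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Code points U+00E0 (à) and U+00F9 (ù): both below 256.
my $str = "l\x{E0} o\x{F9}";

open my $fh, '>', \my $written or die "cannot open in-memory handle: $!";
print {$fh} $str;    # no layer: no conversion is requested
close $fh;

my $latin1 = encode('ISO-8859-1', $str);
my $utf8   = encode('UTF-8',      $str);

print "same as Latin-1\n" if $written eq $latin1;
print "same as UTF-8\n"   if $written eq $utf8;
```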

      Note that this round-trips binary data: if your script or input is UTF-8 and you don't use utf8;, the output will be UTF-8 again.
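A sketch of that round trip, assuming a UTF-8 source file without `use utf8`: the literal is then a string of bytes (two per accented letter), and printing it unchanged sends the same bytes back out, which a UTF-8 terminal displays correctly. The bytes are written as escapes here so the example does not depend on how the file is saved.

```perl
use strict;
use warnings;
# Deliberately no "use utf8": literals are taken byte by byte.

# What "où" looks like to perl when typed into a UTF-8 file without use utf8:
# "o" followed by the two bytes C3 B9.
my $bytes = "o\xC3\xB9";

open my $fh, '>', \my $out or die "cannot open in-memory handle: $!";
print {$fh} $bytes;    # each byte is written back unchanged
close $fh;

printf "length in perl: %d, bytes written: %d, identical: %s\n",
    length $bytes, length $out, ($out eq $bytes ? 'yes' : 'no');
```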

      But Latin-1 is limited to code points up to 255, so if anything higher shows up in your string, perl falls back to writing the string as UTF-8 (and warns "Wide character in print").
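A sketch of that fallback, catching the warning with a `__WARN__` handler: U+20AC (the euro sign) cannot be written as a single Latin-1 byte, so perl warns and writes the string's UTF-8 bytes instead. The whole string then comes out as UTF-8, including its characters below 256, so printing such strings alongside plain Latin-1 ones on the same handle produces mixed encodings.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# U+00E9 (é) fits in one byte; U+20AC (€) does not.
my $str = "\x{E9}\x{20AC}";

open my $fh, '>', \my $out or die "cannot open in-memory handle: $!";
print {$fh} $str;    # no layer: perl falls back to writing UTF-8 and warns
close $fh;

# The whole string came out as UTF-8: C3 A9 (é) followed by E2 82 AC (€).
printf "bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $out;
print "warned: $warnings[0]" if @warnings;
```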

      (As always, I'm linking to Encodings and Unicode in Perl, in the hope that it's useful to you.)

        I'm sorry, but I think that your "Perl tries to not convert anything at all, automatically" statement is wrong.
        I mean, otherwise, why would a string stored internally as UTF-8 be converted to Latin-1 when sent to standard output?
        (That is, without having used any funny encoding/decoding/layer stuff...)
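The disputed point can be tested directly, as a sketch: whether a string is stored internally as UTF-8 (the flag that `utf8::upgrade` sets and `Encode::is_utf8` reports) doesn't change what a layer-less handle receives. Perl writes the string's characters, here all below 256, as one byte each, so the internal UTF-8 buffer is a storage detail rather than an output encoding. The names `$up`, `$down` and `%written` are illustrative.

```perl
use strict;
use warnings;

my $up   = "cha\x{EE}ne";
my $down = $up;
utf8::upgrade($up);      # stored internally as UTF-8 (flag on)
utf8::downgrade($down);  # stored internally as one byte per character (flag off)

# Same characters either way, so the strings compare equal.
print "equal as strings: ", ($up eq $down ? 'yes' : 'no'), "\n";

# Write each one to a layer-less in-memory handle and compare the bytes.
my %written;
for my $pair ([up => $up], [down => $down]) {
    my ($name, $str) = @$pair;
    open my $fh, '>', \$written{$name} or die "cannot open in-memory handle: $!";
    print {$fh} $str;
    close $fh;
}

print "bytes identical: ", ($written{up} eq $written{down} ? 'yes' : 'no'), "\n";
```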
