Standard handles inherited from a utf-8 enabled shell

by BrowserUk (Patriarch)
on Mar 21, 2012 at 16:32 UTC ( [id://960809]=perlquestion )

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

If you are using a utf-8 enabled shell and you start Perl without any special options (ie. no -Cxx), do that perl instance's standard handles inherit the utf-ness?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re: Standard handles inherited from a utf-8 enabled shell
by choroba (Cardinal) on Mar 21, 2012 at 17:01 UTC
    What do you mean by utf-ness? Compare
        echo -n š | perl -e 'my $x =<>; print $x, length($x), "\n"'
        š2
    versus
        echo -n š | perl -C -e 'my $x =<>; print $x, length($x), "\n"'
        š1
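
The same byte-versus-character distinction can be reproduced inside a script, without -C, by decoding the input explicitly. This is only a minimal sketch, assuming the input bytes are UTF-8; the helper name `lengths` is hypothetical and does not come from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return (byte length, character length) for a
# string of raw bytes that is assumed to hold UTF-8 encoded text.
sub lengths {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);    # raw bytes -> Perl characters
    return (length($bytes), length($chars));
}

# "\xc5\xa1" is the UTF-8 encoding of š (U+0161).
my ($byte_len, $char_len) = lengths("\xc5\xa1");
print "bytes: $byte_len, characters: $char_len\n";    # bytes: 2, characters: 1
```

Without a decoding step (or -C), Perl treats the two input bytes as two separate characters, which is why the first one-liner reports a length of 2.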
      What do you mean by utf-ness?

      Please see the subthread starting at: Re^3: Help with pack error



Re: Standard handles inherited from a utf-8 enabled shell
by moritz (Cardinal) on Mar 21, 2012 at 17:49 UTC

    No. The "utf-8 enabled" is a property of the terminal, not of the shell. Perl isn't aware of it, so if you do something like

    perl -E 'say chr(255)'|hexdump -C
    00000000  ff 0a                                             |..|

    the output encoding is Latin-1 (even if the locale is something with UTF-8).

    Note that this changes for characters with codepoint > 255. Those can't be encoded in Latin-1, so UTF-8 is used for the whole string (and you get a "wide character" warning).
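
The behaviour described above can be checked without a terminal at all. The following is a rough sketch, assuming an in-memory filehandle behaves like STDOUT with no encoding layer; the helper name `bytes_written` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return, as hex, the raw bytes Perl writes for
# $string on a handle that has no encoding layer. The in-memory handle
# stands in for a default STDOUT (an assumption made for this sketch).
sub bytes_written {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    {
        # Silence "Wide character in print" for the demonstration only.
        no warnings 'utf8';
        print {$fh} $string;
    }
    close $fh;
    return join ' ', map { sprintf '%02x', ord } split //, $buffer;
}

print bytes_written(chr(255)), "\n";                  # ff
print bytes_written(chr(255) . chr(0x263A)), "\n";    # c3 bf e2 98 ba
```

In the second call the same chr(255) comes out as the two bytes c3 bf, because the string as a whole cannot be represented in Latin-1.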

      No. The "utf-8 enabled" is a property of the terminal, not of the shell.

      Hm. When I used the term "shell", I (perhaps) wasn't specific enough.

      Please see subthread Re^3: Help with pack error, and then consider how:

      a byte value greater than 127, output from a perl script that makes no attempt to enable utf-anything, and piped directly to a process (od) that makes no attempt to perform any conversions or transformations of its input, would be seen by that process as 2 bytes?

      Ie. Where is the utf-8'ness being applied?



        It shouldn't happen, and I don't know any Perl version where it happens.

        But the example there isn't minimal at all (why load LWP::something? that could do some dire magic) and not runnable for me (what does it read from STDIN?), so it's hard to tell.

        And I'm not sure what happens on Windows when you try to write binary data to a text file handle. (Linux doesn't have that distinction, and I don't use Windows for programming, so I don't know what the expected outcome is. I remember one unhappy foray into Windows programming where I spent several hours debugging a missing "b" in a call to open when reading files).

Re: Standard handles inherited from a utf-8 enabled shell
by tobyink (Canon) on Mar 21, 2012 at 16:39 UTC

    UTF-8 output will "work" but you'll get warnings about printing "wide characters" (even without the warnings pragma enabled).

    binmode(STDOUT, ':utf8') will prevent the warnings.

    I recommend utf8::all which takes care of the above, plus other utf8 gotchas.
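
The effect of adding a layer explicitly can be sketched as follows. Note the assumptions: an in-memory handle stands in for STDOUT, the helper name `utf8_bytes_written` is hypothetical, and `:encoding(UTF-8)` is used here as the stricter, validating variant of the `:utf8` layer mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: write $string through an explicit UTF-8 layer and
# return (hex dump of the bytes written, number of warnings raised).
sub utf8_bytes_written {
    my ($string) = @_;
    my $buffer = '';
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    binmode $fh, ':encoding(UTF-8)' or die "binmode failed: $!";
    print {$fh} $string;
    close $fh;

    my $hex = join ' ', map { sprintf '%02x', ord } split //, $buffer;
    return ($hex, scalar @warnings);
}

my ($hex, $warning_count) = utf8_bytes_written(chr(255) . chr(0x263A));
print "$hex (warnings: $warning_count)\n";    # c3 bf e2 98 ba (warnings: 0)
```

With the layer in place every character is encoded the same way regardless of what else is in the string, and no "Wide character" warning is raised.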

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Without the binmode, some instances of 128-255 will work and some instances won't.
Re: Standard handles inherited from a utf-8 enabled shell
by ikegami (Patriarch) on Mar 23, 2012 at 00:17 UTC
    Perl never automatically adds the :encoding layer appropriate for your terminal, even if that terminal uses chcp 65001.
      Perl never automatically adds the :encoding layer appropriate for your terminal, even if that terminal uses chcp 65001

      That was never in question. Windows was never in question. To the best of my knowledge the OP of the original problem was using some flavour of *nix.

      The question was whether (under *nix) the standard handles inherited by a perl process from a Unicode-enabled terminal (session) might have some influence upon how output from those standard handles was handled by the OS. Specifically a pipe in this case.

      The answer is no. But I didn't know, so I asked.



        I took a stab at an unclear question. Your clarification still talks about handles inherited from a session or terminal, but handles are inherited from a process.

        The poster of the question is most definitely a Windows user, but the answer I gave is not Windows specific. Perl never automatically adds the :encoding layer appropriate for your terminal.

        Pipes don't even have anything to do with terminals.
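
One way to see for yourself that no encoding layer is present unless you ask for one is to list a handle's layers with the built-in PerlIO::get_layers. This is a minimal sketch; the helper name `has_encoding_layer` is hypothetical, and the exact layer names printed for STDOUT depend on the platform and on settings such as -C or PERL_UNICODE.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the handle carries a :utf8 or :encoding layer.
sub has_encoding_layer {
    my ($fh) = @_;
    return scalar grep { /^(?:utf8|encoding)/ } PerlIO::get_layers($fh);
}

my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
print "before binmode: ", (has_encoding_layer($fh) ? "yes" : "no"), "\n";

# The layer only appears once it is requested explicitly.
binmode $fh, ':encoding(UTF-8)' or die "binmode failed: $!";
print "after binmode:  ", (has_encoding_layer($fh) ? "yes" : "no"), "\n";

# Layers on the inherited STDOUT, whatever it is connected to.
print "STDOUT layers: ", join(' ', PerlIO::get_layers(\*STDOUT)), "\n";
```

Running the script with its output sent to a terminal, a pipe, or a file should list the same STDOUT layers in each case, which is consistent with the answers above.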
