
Re^5: Standard handles inherited from a utf-8 enabled shell

by moritz (Cardinal)
on Mar 21, 2012 at 19:02 UTC (#960847)


in reply to Re^4: Standard handles inherited from a utf-8 enabled shell
in thread Standard handles inherited from a utf-8 enabled shell

I speculate that if you print bytes with the high bit set from a non-utf-enabled instance of perl, run from a utf-enabled shell, this situation can arise, regardless of the OS you happen to be running on.

I won't believe this until I've seen it, reproduced as a minimal example (disregarding things like shell aliases that add command line options, or the PERL5OPT, PERLIO or PERL_UNICODE environment variables).


Re^6: Standard handles inherited from a utf-8 enabled shell
by BrowserUk (Pope) on Mar 21, 2012 at 19:24 UTC
    I won't believe this until I've seen it, reproduced as a minimal example

    As I said: agreed. But can you think of anything else that might fit with the symptoms described and the apparent solution?

    I couldn't, and all my attempts to try and re-create the situation also failed:

    perl -CO -e" system q[ \perl64\bin\perl.exe -e\" print pack 'B8', '11111111'; \" | od -t x1 ]"
    0000000 ff
    0000001

    perl -CO -e" system q[\perl64\bin\perl.exe -CO -e\"print pack 'B8', '11111111'; \" | od -t x1 ]"
    0000000 c3 bf
    0000002

    I would have expected the first of those to produce the same od output as the second, had the second instance of perl inherited the stdout characteristics of its parent.

    But I'm on Windows, and disproving the possibility here doesn't disprove it for other platforms, hence my asking the question.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      That's not how I see it. I see the system-ed perl as an autonomous process (unknowing of its parent process) with its STDOUT filehandle set with different encodings.

      In both cases, we're printing out a string with one character at codepoint U+00FF.

      The second system-ed perl has its output encoding set to UTF-8 (via -CO). What octets do we send out into the cruel world for U+00FF character encoded in UTF-8? Ans: c3 bf.

      The first system-ed perl has its output "set" to byte/Latin-1 encoding (the default). What octets do we send out into the cruel world for U+00FF character encoded in Latin-1? Ans: ff.

      The first case did not print c3 bf, despite the parent perl's -CO, because the system-ed perl's print did not go through the parent's PerlIO layers.
        I see the system-ed perl as an autonomous process (unknowing of its parent process)

        As you are probably aware, system is equivalent to fork followed by exec.

        You are also probably aware that fork preserves open file descriptors. This is why, to create a daemon, it is necessary to fork twice: you fork once, close the standard handles in the child, and then fork a second time. Only then does the second child become dissociated from the terminal and a true daemon.

        What you may not be familiar with is that the various forms of exec are front ends for execve, and that execve() also preserves open file descriptors (except those marked close-on-exec).

        To quote the above man page: "By default, file descriptors remain open across an execve()."

        You can prove this to yourself. Run this one-liner (suitably adjusted):

        perl -e"system qq[ $^X -e\"\$n=123; print \$n\" ];"
        123

        And you'll see the output 123

        Now try this modified version:

        C:\test>perl -e"close STDOUT; system qq[ $^X -e\"\$n=123; print \$n\" ];"

        Where did the output disappear to?

        So bang goes the autonomous process theory.

        In both cases, we're printing out a string with one character at codepoint U+00FF.

        No. The return value from pack 'B8', ... is not a character, nor a codepoint, and has absolutely nothing to do with Unicode.

        It is a byte! An 8-bit unsigned bit pattern stored in an 8-bit unit of memory and nothing else.

        No interpretation of the meaning (nor even signedness) is placed (nor could be) upon that value until you do something with it!

        The second system-ed perl has its output set ...

        You're right that the interpretation applied to the 8-bit value is not preserved across the fork/exec pair, but not because of your reasoning.

        The important part is that the OS cannot preserve what it has no knowledge of. There is no concept of encoding attached to the file descriptors.

        It is also likely, though I haven't confirmed this, that Perl reopens the standard handles when it starts.

        The bottom line -- for this thread, rather than this subthread -- is that the OP must have omitted some details from his scenario.


