in reply to Re: How to sanely handle unicode in perl?
in thread How to sanely handle unicode in perl?

This does not solve my problem either. I want perl to respect the locale of the user calling the script.

If I use your "open" statement and run the script in an ISO-8859-1 terminal, I get the following:

karoshi:~>LC_CTYPE=de_DE.ISO-8859-1 ./u8demo.pl
I read a line, that is 1 chars long.
That line is: ö
That line in ascii is: o
which is clearly incorrect.

Re^3: How to sanely handle unicode in perl?
by Your Mother (Archbishop) on Mar 20, 2015 at 16:50 UTC

    See point 14 under “Assume Brokenness” in the link I gave: “Code that assumes Unicode gives a fig about POSIX locales is broken.”

      I do not assume Unicode. I just want to handle data correctly. perl is apparently unable to output data in the way its environment requires.

      The frustrating part is that perl looks like it is equipped to work. It is _able_ to do output conversion on the fly. It is just not able to do it correctly without user intervention.

        \xc3\xb6 is not the right byte sequence for an ö from a Latin-1 terminal; it is the UTF-8 encoding, so it can only come from a UTF-8 encoded source (and still mean ö). From a Latin-1 encoded source it would be \xf6. So what you are asking perl to do sanely strikes me as strange. To handle encodings properly you have to know what you are receiving and decode it as that, then know what your output layer expects and encode to that. It’s not easy, but it’s not magic either. Without the right steps at the right layers it’s literally guesswork and impossible to do robustly.
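
        The decode-then-encode step described above can be sketched with the core Encode module. This is only a minimal illustration, not the u8demo.pl script from the thread: the helper name transcode is hypothetical, and the two encoding names are hard-coded assumptions standing in for "what you are receiving" and "what your output layer is".

        ```perl
        use strict;
        use warnings;
        use Encode qw(decode encode);

        # Hypothetical helper: convert raw bytes from one known encoding to another.
        sub transcode {
            my ($bytes, $from, $to) = @_;
            my $chars = decode($from, $bytes);   # raw bytes -> Perl characters
            return encode($to, $chars);          # Perl characters -> raw bytes
        }

        # Assumption: the input is known to be UTF-8 and the output layer is Latin-1.
        my $utf8_bytes   = "\xc3\xb6";           # the two-byte UTF-8 form of ö
        my $latin1_bytes = transcode($utf8_bytes, 'UTF-8', 'ISO-8859-1');

        # Show the resulting bytes in hex rather than printing them raw.
        my $hex = join ' ', map { sprintf '%02x', ord } split //, $latin1_bytes;
        print "Latin-1 bytes: $hex\n";           # prints "Latin-1 bytes: f6"
        ```

        Both encoding names have to come from somewhere. For terminal output, one possible source is langinfo(CODESET) from the core I18N::Langinfo module, but whether that matches what the terminal really displays is still an assumption the script has to make.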

        “I do not assume Unicode.”
        I think you misparsed that sentence:
        “Code that assumes Unicode gives a fig about POSIX locales is broken.”
        This is not
        (Code that assumes Unicode) gives a fig about POSIX locales is broken.
        but
        Code that assumes (Unicode gives a fig about POSIX locales) is broken.
        Update: perhaps I should point out that we seem to share the same native language.