Re: How to handle encoding for STDERR

by andal (Hermit)
on Feb 11, 2013 at 08:01 UTC

in reply to How to handle encoding for STDERR

Well, your question is very hard to understand. What exactly is the problem? Have a look at perllocale. It states:

By default, Perl ignores the current locale. The "use locale" pragma tells Perl to use the current locale for some operations

So unless you say "use locale", your locale settings are ignored by the perl program. They are not ignored, however, by the shell that was used to run it. If the shell is configured to receive UTF-8 text from programs, then your perl program has to produce UTF-8; otherwise you see garbage.

Now, you get garbage. First you should figure out where the garbage comes from. Your code has 'warn "<UTF-8 string>"'. Do I assume correctly that in place of "<UTF-8 string>" you actually have some text with UTF-8 characters? Does this string show up correctly?

If only $! shows up as garbage, have you checked whether it contains octets or has the utf8 flag set? As far as I understand it, binmode configures the filehandle to convert all data from the internal encoding (marked by the presence of the utf8 flag) to a sequence of octets in the appropriate encoding. So if the data is already a sequence of octets, each octet is treated as a character of its own and encoded once more, and this additional conversion messes up the data.
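To make that check concrete, here is a minimal sketch. It uses utf8::is_utf8 to inspect the flag and simulates the second conversion with Encode::encode. The sample character (U+65E5) and the hand-set errno are only illustrative assumptions, not taken from the original script.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Errno qw(ENOENT);

# One character (U+65E5) as a perl string, and the same text as UTF-8 octets.
my $chars  = "\x{65E5}";
my $octets = encode('UTF-8', $chars);

printf "chars:  flag=%d length=%d\n", (utf8::is_utf8($chars)  ? 1 : 0), length $chars;
printf "octets: flag=%d length=%d\n", (utf8::is_utf8($octets) ? 1 : 0), length $octets;

# Encoding the octets again is roughly what a :utf8 layer does to data that
# was already encoded: every high octet becomes two octets.
my $double = encode('UTF-8', $octets);
printf "double-encoded length=%d\n", length $double;

# The same inspection for $! (errno set by hand here, for illustration only).
$! = ENOENT;
my $err = "$!";
printf "\$! text: %s\n", utf8::is_utf8($err) ? "flagged as characters" : "octets or plain ASCII";
```

If the $! text turns out to be unflagged octets with high bytes in it, that would match the double-encoding explanation above.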

Personally, I avoid using binmode to set up UTF-8 handling. I just follow a simple rule: output only octets in the appropriate encoding (normally UTF-8). I use Encode::decode to convert incoming octets to strings as perl understands them, and Encode::encode to convert perl strings back to octets for output.
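A small sketch of that rule, assuming the input arrives as UTF-8 octets (hard-coded here as the bytes for U+65E5 U+672C) and that the terminal expects UTF-8 on STDERR. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might come from a file, a socket or @ARGV.
my $input_octets = "\xE6\x97\xA5\xE6\x9C\xAC";

# Decode at the boundary: from here on perl sees 2 characters, not 6 octets.
my $text = decode('UTF-8', $input_octets);
printf STDERR "characters: %d\n", length $text;

# Work on characters inside the program (here: reverse their order).
my $reversed = reverse $text;

# Encode at the boundary again: the filehandle has no layer and gets octets only.
my $output_octets = encode('UTF-8', $reversed);
print STDERR $output_octets, "\n";
```

No binmode is involved, so nothing is converted a second time on the way out.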

If there's 'use utf8', then any string literals in the script are converted to the internal format understood by perl, so they have to be converted to octets before they are passed outside of the perl program.
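The effect of 'use utf8' on literals can be seen with a sketch like the following. It assumes the script file itself is saved as UTF-8; the Japanese literal is just an example.

```perl
use strict;
use warnings;
use Encode qw(encode);

my ($as_chars, $as_octets);
{
    use utf8;                  # literal is decoded into 3 characters
    $as_chars = "日本語";
}
{
    no utf8;                   # same literal stays as the 9 octets in the file
    $as_octets = "日本語";
}

printf "with use utf8:    length=%d\n", length $as_chars;
printf "without use utf8: length=%d\n", length $as_octets;

# Converting the character string to octets gives back what was in the file.
my $encoded = encode('UTF-8', $as_chars);
print "encode() matches the raw octets\n" if $encoded eq $as_octets;
```

So a literal written under 'use utf8' needs the encode step (or an encoding layer on the handle) before it leaves the program, while a literal written without it is already octets.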

I've never seen $! contain non-English text, because the systems I work with mostly have nothing but English installed, so I don't know in which form the text arrives there. But if it is just a sequence of octets, then you'll have a problem outputting it through a filehandle that expects a perl string rather than a sequence of octets.

Re^2: How to handle encoding for STDERR
by na (Novice) on Feb 12, 2013 at 13:52 UTC

    Sorry for the poor writing.

    Because a correctly encoded string in my locale may look like 'garbage' to most of you, I avoided cutting and pasting the exact script and output.

    As you guessed, "<UTF-8 string>" is a UTF-8 encoded string in Japanese, and it is output fine (because of the combination of "use utf8" and "binmode(STDERR, ':utf8')"). It may depend on the OS and locale, but with the "ja_JP.UTF-8" locale on Ubuntu, Perl generates language-specific error messages.

    Half of my question was answered by Anonymous Monk.

    I just want to know how to get the error string in the C locale even if the Perl process starts in a non-C locale.
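One possible approach to that last question, offered only as a sketch: switch the LC_MESSAGES category to "C" with POSIX::setlocale around the point where $! is turned into a string. The helper name c_locale_errstr is hypothetical, and the sketch assumes a platform where POSIX exports LC_MESSAGES (Linux does).

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_MESSAGES);
use Errno qw(ENOENT);

# Hypothetical helper: return the message for an errno value in the C locale.
sub c_locale_errstr {
    my ($errno) = @_;
    my $saved = setlocale(LC_MESSAGES);       # one-argument form queries the current setting
    setlocale(LC_MESSAGES, 'C');              # switch messages to the C locale
    local $! = $errno;
    my $msg = "$!";                           # stringify while "C" is in effect
    setlocale(LC_MESSAGES, $saved) if defined $saved;   # restore the previous setting
    return $msg;
}

my $msg = c_locale_errstr(ENOENT);
print STDERR "open failed: $msg\n";
```

Since the returned text is plain ASCII, it can go to STDERR unchanged whether or not the handle has a :utf8 layer. Starting the script with LC_MESSAGES=C in the environment should have a similar effect without any code changes.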
