http://www.perlmonks.org?node_id=1018091

na has asked for the wisdom of the Perl Monks concerning the following question:

I have code like this:
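#!/usr/bin/perl
use warnings;
use strict;
use utf8;                      # string literals in this file are UTF-8 characters
binmode( STDERR, ':utf8' );    # encode characters written to STDERR as UTF-8
$ENV{ LANG } = 'C';            # attempt to get English (C locale) error messages
warn "<UTF-8 string>";         # stands in for a Japanese UTF-8 string
open( my $in, '<', 'non-existing-file' ) || die $!;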
Because "$!" is just a byte string, output is terrible. My questions are:

1) Is there any way to force '$!' into the 'C' locale? "$ENV{ LANG } = 'C';" doesn't work.

2) Is there a good solution? Some of my poor workarounds are:

My perl version: v5.14.2

Re: How to handle encoding for STDERR
by andal (Hermit) on Feb 11, 2013 at 08:01 UTC

    Well, your question is very hard to understand. What exactly is your problem? Read perllocale. It states:

    By default, Perl ignores the current locale. The "use locale" pragma tells Perl to use the current locale for some operations

    So, unless you say "use locale", your locale settings are ignored by the Perl program. But they are not ignored by the shell that runs the program. So, if the shell is configured to receive UTF-8 text from programs, your Perl program should produce UTF-8; otherwise you will see garbage.

    Now, you get garbage. First, you should figure out where the garbage comes from. You have the code 'warn "<UTF-8 string>"'; am I right to assume that in place of "<UTF-8 string>" you actually have some text with UTF-8 characters? Does this string show up correctly?

    If only $! shows up as garbage, have you checked whether it contains octets or has the utf8 flag set? As far as I understand it, binmode configures the filehandle to convert all data from Perl's internal encoding (marked by the presence of the utf8 flag) into a sequence of octets in the appropriate encoding. So, if the data is already a sequence of octets, the additional conversion will mess it up.
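    A minimal sketch of that check, assuming Perl 5.14 or later on a POSIX system: it provokes the same failed open as in the question, reports whether the copied error text carries the utf8 flag, and dumps each element as hex, so octets and characters can be told apart without printing anything that might itself come out garbled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fail on purpose so that $! is set (same file name as in the question).
my $ok  = open( my $in, '<', 'non-existing-file' );
my $msg = $ok ? '' : "$!";    # copy the text at once, before another call resets $!

# Report whether Perl treats the message as characters (flag on) or octets (off).
my $flag = utf8::is_utf8($msg) ? 'on' : 'off';
print "utf8 flag: $flag\n";

# Dump each element as hex: values above 7f with the flag off are raw octets.
print 'codes: ', join( ' ', map { sprintf '%02x', ord } split //, $msg ), "\n";
```

    With the flag off and many values above 7f, the text is locale-encoded octets, which a ':utf8' layer will encode a second time.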

    Personally, I avoid using binmode to set up UTF-8 handling. I follow a simple rule: output only octets in the appropriate encoding (normally UTF-8). I use Encode::decode to convert incoming octets into strings as Perl understands them, and Encode::encode to convert Perl strings back into octets for output.

    If there's 'use utf8', then any string literals in the script are converted to Perl's internal character format, so they have to be converted back to octets before they are passed outside the Perl program.

    I've never seen $! contain non-English text, because the systems I work with are almost always English-only, so I don't know what form the text takes there. But if it is just a sequence of octets, you'll have a problem printing it through a filehandle that expects Perl character strings rather than octets.
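    A sketch of the decode/encode rule described above, assuming the core modules Encode and I18N::Langinfo are available and that the terminal expects UTF-8. It decodes $! from the locale's codeset only when Perl returned octets, builds the message as characters, and encodes it once before printing, so STDERR needs no ':utf8' layer. Falling back to UTF-8 when Encode does not recognise the codeset name is an assumption of this sketch, not part of the advice above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                                  # the literal below contains a non-ASCII arrow
use Encode qw(encode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Charset of the locale the process runs in, e.g. "UTF-8" under ja_JP.UTF-8.
# Assumption: fall back to UTF-8 if Encode does not know the reported name.
my $codeset = langinfo(CODESET);
my $enc     = find_encoding($codeset) || find_encoding('UTF-8');

my $ok  = open( my $in, '<', 'non-existing-file' );
my $raw = $ok ? '' : "$!";

# Decode only if Perl handed back octets; a flagged string is already characters.
my $err = utf8::is_utf8($raw) ? $raw : $enc->decode($raw);

# Work with characters inside the program, then encode once on the way out.
my $text = "open failed → $err\n";
my $out  = encode( 'UTF-8', $text );       # assumes the terminal expects UTF-8
print STDERR $out;                         # plain octets, no :utf8 layer involved
```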

      Sorry for my poor writing.

      Because a correctly encoded string in my locale may look like garbage to most of you, I avoided pasting the exact script and output.

      As you guessed, "<UTF-8 string>" is a UTF-8 encoded Japanese string, and it is output fine (because of the combination of "use utf8" and "binmode(STDERR, ':utf8')"). It may depend on the OS and locale, but with the "ja_JP.UTF-8" locale on Ubuntu, Perl generates language-specific (Japanese) error messages.

      Half of my question was answered by Anonymous Monk.

      I just want to know how to get the error string in the C locale even if the Perl process starts in a non-C locale.

Re: How to handle encoding for STDERR
by Anonymous Monk on Feb 11, 2013 at 05:05 UTC

    Because "$!" is just a byte string, output is terrible.

    What does that mean?

    I imagine you'd have to use POSIX and/or locale for $ENV{LANG}='C' to have an effect at runtime.

        Thank you for the information. I should have searched by myself first!
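    One way to read the POSIX suggestion, as a hedged sketch that assumes a glibc system like the OP's Ubuntu, where strerror() (the source of "$!") follows the LC_MESSAGES locale category: switch that category to "C" with POSIX::setlocale around the failing call, copy "$!", then restore the saved setting. Changing $ENV{LANG} alone presumably does not help because perl reads the locale from the environment once, at startup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_MESSAGES);

# Assigning $ENV{LANG} does not make the C library re-read the locale, so ask
# for the "C" message catalogue directly; strerror(), behind "$!", uses it.
my $old = setlocale(LC_MESSAGES);          # query and save the current setting
setlocale( LC_MESSAGES, 'C' )
    or die "could not switch LC_MESSAGES to C\n";

my $ok  = open( my $in, '<', 'non-existing-file' );
my $err = $ok ? '' : "$!";                 # stringify while the C locale is active

# Put the original message locale back for the rest of the program.
setlocale( LC_MESSAGES, $old ) if defined $old;

print "error: $err\n";
```

    Only LC_MESSAGES is touched, so character classification and number formatting keep following the user's locale.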
Re: How to handle encoding for STDERR
by ww (Archbishop) on Feb 11, 2013 at 13:51 UTC
    Whatever other problems may exist, your open (the last line of your code) comes up a little short of a dozen; a few degrees off plumb; or, more directly, wrong:

    Your open( my $in, '<', 'non-existing-file' ) || die $!; should be

    open( my $in, '<', 'path-to-file-name' ) or die "Can't open path-to-file-name, $!";

    (with a [now deprecated, at least by some] bareword and variant quote symbols)

    my $datafile = 'path-to-file-name';    # placeholder: set this to the output path
    open (OUT, ">", $datafile ) or die "Can't open $datafile for write, $!";
    # for a write
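    For comparison, a sketch of the write case with a lexical filehandle in place of the bareword OUT; the file name is hypothetical. Using `or` rather than `||` keeps the error check attached to open() even if the parentheses around its arguments are later dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical output path, chosen only for this example.
my $datafile = 'example-output.txt';

# Lexical filehandle instead of the bareword OUT; "or" binds more loosely
# than "||", so the check still applies to open() if the parentheses go.
open( my $out, '>', $datafile )
    or die "Can't open $datafile for write: $!";
print {$out} "hello\n";
close($out)
    or die "Can't close $datafile: $!";
```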
Re: How to handle encoding for STDERR
by Anonymous Monk on Feb 11, 2013 at 13:07 UTC
    I suspect a bug, or possibly bad data from wherever this stuff is coming from. Don't try to "code around it" ... find the true root cause of what's going on here. Red flags abound, demanding explanation.