
utf8 output with 5.8

by axelrose (Scribe)
on Oct 29, 2002 at 18:25 UTC

axelrose has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I just stumbled over a difference between 5.6 and 5.8.
My goal is to output utf8 characters, e.g. "ö".

with 5.6 I get:
perl -e '$c="f6"; $u=pack("U",hex($c)); print $u, "\x{f6}"' | od -tx1
0000000 c3 b6 c3 b6
0000004

With 5.8 the output is not the two-byte sequence "c3 b6" for each "ö" but the single byte "f6".

What's the equivalent in 5.8? (Sorry, I cannot test it myself at the moment.)

Thanks for your help, Axel.

Replies are listed 'Best First'.
Re: utf8 output with 5.8
by Thelonius (Priest) on Oct 29, 2002 at 19:37 UTC
    Have you tried:
    use utf8;
    or, on the command line,
    perl -Mutf8 -e '$c="f6"; $u=pack("U",hex($c)); print $u, "\x{f6}"' | od -tx1
Re: utf8 output with 5.8
by Arrowhead (Monk) on Oct 29, 2002 at 20:55 UTC

    In 5.8 you can (and should) specify which encoding perl should use when writing.
    By default perl5.8 will assume you want latin1 output (but warn and then output in utf8 those strings that do not fit in latin1).

    If you switch your environment to utf8 with something like

    env LC_CTYPE=en_US.UTF-8 perl -e ...
    then you should get the output you expected.

    A different way to do it (TIMTOWTDI) is to explicitly switch STDOUT to a different encoding using the newly repurposed binmode():

    binmode STDOUT, ':utf8';

    A third way is using the new use open pragma:

    use open ':utf8', ':std';

    So to get your one-liner to work, it becomes:

    perl -Mopen=:utf8,:std -e '$c="f6"; $u=pack("U",hex($c)); print $u, "\x{f6}"' | od -tx1

    I'm sure you'll be able to find your way to the documentation for these new perl5.8 features. A good start is the perluniintro manpage.
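To see the effect of the layer without piping through od, here is a minimal sketch. It assumes a perl 5.8 or later with in-memory filehandles; the helper name bytes_written is made up for this illustration. It prints the same character through a handle with and without the :utf8 layer and shows the bytes that arrive:

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory handle, optionally
# with an I/O layer applied, and return the resulting bytes as hex.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    binmode($fh, $layer) if defined $layer;   # same call as for STDOUT
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $buffer;
}

my $o_umlaut = "\x{f6}";
print bytes_written($o_umlaut), "\n";            # f6
print bytes_written($o_umlaut, ':utf8'), "\n";   # c3 b6
```

The binmode call on $fh is the same one you would apply to STDOUT in a real script.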

      Thanks for your good explanation!
      I'm developing mainly under MacPerl and need to ask others to check 5.8.

      One question still:
      How can I use those new options in a compatible way so 5.6 scripts don't choke?
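One possible approach, as a sketch only and not verified on 5.6: check the interpreter version in $] at run time and apply the layer only when it is 5.8 or newer. The helper name enable_utf8_output is hypothetical. It relies on the behaviour reported in the original post, where 5.6 already prints "c3 b6" for a pack("U") string:

```perl
use strict;

# Hypothetical helper: turn on UTF-8 output for a handle, but only on
# perls that have I/O layers (5.8 and later). Returns 1 if the layer
# was applied and 0 if it was skipped.
sub enable_utf8_output {
    my ($fh) = @_;
    return 0 if $] < 5.008;   # assumption: older perls need no layer here
    binmode($fh, ':utf8') or die "binmode failed: $!";
    return 1;
}

enable_utf8_output(\*STDOUT);
print pack('U', 0xf6), "\n";
```

The use open pragma is a compile-time construct, so a run-time check like this only suits the binmode form.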
Re: utf8 output with 5.8
by axelrose (Scribe) on Oct 30, 2002 at 22:21 UTC
    Within "perluniintro.pod" I found this:

    Note that \x.. (no {} and only two hexadecimal digits), \x{...}, and
    chr(...) for arguments less than 0x100 (decimal 256) generate an
    eight-bit character for backward compatibility with older Perls.  For
    arguments of 0x100 or more, Unicode characters are always produced. If
    you want to force the production of Unicode characters regardless of the
    numeric value, use pack("U", ...) instead of \x.., \x{...}, or chr().

    This explains why "\x{f6}" won't produce utf8 output.
    Nonetheless the pack( "U", hex( "f6" ) ) I used originally should spit out a two byte sequence, shouldn't it?

      To answer myself: it depends on the LANG environment variable.
      [rose@localhost rose]$ export LANG=C
      [rose@localhost rose]$ perl -e 'print "\x{f6}"' | od -tx1
      0000000 f6
      0000001
      [rose@localhost rose]$ perl -e 'print pack "U", 0xf6 ' | od -tx1
      0000000 f6
      0000001
      [rose@localhost rose]$ export LANG=de_DE.UTF-8
      [rose@localhost rose]$ perl -e 'print pack "U", 0xf6 ' | od -tx1
      0000000 c3 b6
      0000002
      [rose@localhost rose]$ perl -e 'print "\x{f6}"' | od -tx1
      0000000 c3 b6
      0000002
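If the bytes should not depend on LANG at all, one option is to encode the string yourself before printing. The following is a small sketch, assuming the Encode module that ships with 5.8; hex_bytes is a made-up helper that mimics the od -tx1 output. It also shows that "\x{f6}" and pack("U", 0xf6) compare equal as strings, so the difference seen above comes from the output side rather than from the string contents:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show a byte string as space-separated hex,
# similar to what od -tx1 prints.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $bytes;
}

my $from_escape = "\x{f6}";
my $from_pack   = pack('U', 0xf6);

# Both spellings give the same one-character string.
print "same string\n" if $from_escape eq $from_pack;

# Explicit encoding produces the UTF-8 bytes regardless of LANG.
print hex_bytes(encode('UTF-8', $from_escape)), "\n";   # c3 b6
print hex_bytes(encode('UTF-8', $from_pack)),   "\n";   # c3 b6
```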