http://www.perlmonks.org?node_id=801719

pcouderc has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl -w
use warnings;
use strict;
use Encode;
my $lineo ='azertyuiop';
my $line = encode("utf8",$lineo);
open(TEK,">:utf8","temp.tex") || die("Cannot Open File");
print TEK $line;
print TEK <<EOF;
123456<<<<<
EOF
close TEK;
gives me
0000000   a   z   e   r   t   y   u   i   o   p   1   2   3   4   5   6
         61  7a  65  72  74  79  75  69  6f  70  31  32  33  34  35  36
0000020   <   <   <   <   <  nl
         3c  3c  3c  3c  3c  0a
and so do all my other attempts... How do I get UTF-8 in the output? Thank you, o wise monks.

Re: lost in utf8
by almut (Canon) on Oct 17, 2009 at 07:57 UTC

    What do you expect? This is UTF-8. Any difference is hard to see, though, as you're only printing ASCII characters... Try printing some non-ASCII characters, and you'll see those characters being encoded as multi-byte sequences.

    (See UTF-8 and note that the range U+0000–U+007F (ASCII) encodes as single bytes.)
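To make the difference visible, here is a minimal sketch that encodes one ASCII-only string and one string containing non-ASCII characters, then prints the resulting bytes in hex. The second sample string and the helper name `hex_bytes` are illustrative assumptions, not part of the original post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs,
# similar to what od prints.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

my $ascii     = 'azertyuiop';        # the string from the question
my $non_ascii = "\x{e9}t\x{e9}";     # assumed sample: U+00E9, 't', U+00E9

for my $text ($ascii, $non_ascii) {
    my $bytes = encode('UTF-8', $text);
    printf "%d chars -> %d bytes: %s\n",
        length($text), length($bytes), hex_bytes($bytes);
}
```

The ASCII string comes out byte for byte as it went in, while each U+00E9 becomes the two-byte sequence c3 a9.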

      Mmm, yes, thank you, my misunderstanding of UTF-8...
Re: lost in utf8
by ikegami (Patriarch) on Oct 20, 2009 at 01:05 UTC
    By the way, you're double encoding (bad!)

        my $line = encode("utf8",$lineo);
        open(TEK,">:utf8","temp.tex");
        print TEK $line;

    should be

        my $line = encode("utf8",$lineo);
        open(TEK,">","temp.tex");
        print TEK $line;

    or

        my $line = $lineo;
        open(TEK,">:utf8","temp.tex");
        print TEK $line;
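The effect of double encoding only shows up with non-ASCII data, so here is a rough sketch that runs the three variants above on a single U+00E9 and prints the bytes each one writes. The sample character, the helper name `written_bytes`, and the use of an in-memory file in place of temp.tex are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: write $data through a handle opened with $mode and
# return the bytes that were produced, as space-separated hex pairs.
# An in-memory file stands in for temp.tex.
sub written_bytes {
    my ($mode, $data) = @_;
    my $buf = '';
    open(my $fh, $mode, \$buf) or die "Cannot open in-memory file: $!";
    print {$fh} $data;
    close($fh) or die "Cannot close in-memory file: $!";
    return join ' ', map { sprintf '%02x', ord } split //, $buf;
}

my $text = "\x{e9}";    # assumed sample: U+00E9 (e with acute accent)

# encode() plus the :utf8 layer: the bytes get encoded a second time.
print "encode + :utf8 : ", written_bytes('>:utf8', encode('utf8', $text)), "\n";
# encode() alone, no encoding layer on the handle.
print "encode only    : ", written_bytes('>',      encode('utf8', $text)), "\n";
# :utf8 layer alone, the string is passed through unencoded.
print ":utf8 only     : ", written_bytes('>:utf8', $text), "\n";
```

The first variant writes four bytes (c3 83 c2 a9) where the other two write the expected two (c3 a9).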
      Doesn't the second block of code require a binmode() just for compatibility?

        binmode basically disables :crlf and :encoding.

        Since neither was used in the second snippet, we're talking about the default layers from the OS, from $ENV{PERLIO} or from use of the open pragma.

        While using binmode to remove any default :encoding may be smart, it may accidentally disable the :crlf layer. There's no indication that we want the :crlf status to be different, so it's better to use the last snippet if you're dealing with a text file.
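One way to see which layers are in effect is to ask Perl directly with PerlIO::get_layers. The sketch below is only illustrative: the helper name `layer_list` and the in-memory file are assumptions, and the exact layer names printed depend on the platform and on how the handle was opened.

```perl
use strict;
use warnings;

# Hypothetical helper: list the PerlIO layers currently on a handle.
sub layer_list {
    my ($fh) = @_;
    return join ' ', PerlIO::get_layers($fh);
}

my $buf = '';
open(my $fh, '>:utf8', \$buf) or die "Cannot open in-memory file: $!";
print "after open:    ", layer_list($fh), "\n";

# binmode with no layer argument turns off :utf8 here; on a handle where
# :crlf is in effect it would turn that off as well.
binmode($fh);
print "after binmode: ", layer_list($fh), "\n";

close($fh) or die "Cannot close in-memory file: $!";
```

The list after open should include utf8, and the list after binmode should not.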

        Now, if you're dealing with a binary file, you'd use binmode, and you'd use encode for the text bits. Any LF<->CRLF conversion will have to be handled manually.
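As a rough sketch of the binary-file case: the handle is put in binmode and only the text portion goes through encode. The record layout (a 2-byte big-endian length followed by the UTF-8 bytes), the helper name `make_record`, and the in-memory file are all hypothetical choices for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical record format: 2-byte big-endian length, then the UTF-8
# bytes of the text. Only the text part is passed through encode().
sub make_record {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    return pack('n', length $bytes) . $bytes;
}

my $buf = '';
open(my $fh, '>', \$buf) or die "Cannot open in-memory file: $!";
binmode($fh);    # binary data: no :crlf, no :encoding

# The "\n" is written as a bare LF; if CRLF were wanted it would have to
# be put into the text explicitly before encoding.
print {$fh} make_record("\x{e9}\n");
close($fh) or die "Cannot close in-memory file: $!";

print join(' ', map { sprintf '%02x', ord } split //, $buf), "\n";
```

For the sample text this prints 00 03 c3 a9 0a: the length prefix, the two bytes of U+00E9, and an unconverted LF.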