http://www.perlmonks.org?node_id=1112742


in reply to Re^2: Lost in encoding in Twig
in thread Lost in encoding in Twig

Thank you both.
I do not think I need utf8 as my perl is v5.18 (debian jessie).

Hexdump on my code.pl is correct c3 a9.
Hexdump on my out.xml is incorrect and corresponds to what cat shows: c3 83 c2 a9 (Ã©).
Sorry, I do not know how to check whether my "command shell is set to interpret output as ISO Latin", but my locale is LANG=en_US.UTF-8.
I can use "keep-encoded" to make it work (in this case), but I would like to understand...

Re^4: Lost in encoding in Twig
by hippo (Bishop) on Jan 09, 2015 at 13:58 UTC
    I do not think I need utf8 as my perl is v5.18 (debian jessie).

    That's fine (assuming you are correct). If your code relies on features that are new in a particular release, it is probably a good idea to specify that minimum version of perl at the top, so that anyone (including you) who runs it on an older perl gets a clear failure instead of confusing errors. E.g.:

    use 5.018;

    I can use "keep-encoded" to make it work (in this case), but I would like to understand...

    In a nutshell, to get unicode (and more specifically utf8) to work properly in perl you have to decode your inputs and encode your outputs. In the script here, you have no inputs to worry about because the data is hardcoded in your source. The output however does matter because you are printing this data to a file and so have to encode it first. You can do that manually, but it's generally considered a better idea to open the file with the :utf8 layer via binmode (or directly with open) so that it all gets encoded as it passes out through the I/O system. See also PerlIO::encoding.
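As a rough illustration of the decode-in, encode-out rule, here is a minimal sketch using only the core Encode module. The variable names and the hex_of helper are made up for this example, and it does not involve XML::Twig at all. It only shows how the bytes c3 a9 turn into c3 83 c2 a9 when already-encoded data is encoded a second time, and how decoding first avoids that.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The two bytes that make up "é" when it is stored as UTF-8.
my $bytes = "\xc3\xa9";

# Mistake: encode the raw bytes as if they were characters.
# Perl sees two characters (U+00C3 and U+00A9) and encodes each of them.
my $double = encode('UTF-8', $bytes);

# Fix: decode on the way in, encode on the way out.
my $chars  = decode('UTF-8', $bytes);   # one character, U+00E9
my $single = encode('UTF-8', $chars);   # back to the original two bytes

# Hypothetical helper: render a byte string the way hexdump would.
sub hex_of {
    my ($string) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $string;
}

printf "double-encoded: %s\n", hex_of($double);   # c3 83 c2 a9
printf "round trip:     %s\n", hex_of($single);   # c3 a9
```

The double-encoded line matches the hexdump of out.xml quoted earlier in the thread, which suggests that somewhere along the way a string of UTF-8 bytes was treated as characters and encoded again.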

      Thank you,
      1- Are you telling me that utf8 is a "new" feature? In 2014? Sorry, in 2015 ;)

      2- Are you saying that for each operation in perl, and particularly for I/O, I MUST specify utf8?
      Is there any encoding other than utf8 in use today?

      3- Isn't it possible to declare that everything is utf8 (barring exceptions) at the computer level? Or at least at the perl level, or at least at the script level?

        1. No, quite the opposite in fact.
        2. No, you can do so more generally (see point 3). Yes, there are many other encodings in use today besides utf8. Thank your lucky stars you are not forced to deal with MSWin32 encodings. :) There is also a plethora of data from the past half century, much of which uses encodings other than utf8.
        3. use open qw(:utf8); or export PERL_UNICODE=S for example
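To make the script-level option concrete, here is a small sketch of what use open qw(:utf8) does. The temporary file and the variable names are assumptions for the demo only. The pragma sets the default layer for open() calls in its lexical scope, so a character written through a plain open is encoded on the way out. Reading the file back with an explicit :raw layer shows the bytes that actually reached the disk.

```perl
use strict;
use warnings;
use open qw(:utf8);             # default layer for open() in this lexical scope
use File::Temp qw(tempfile);

my $text = "\x{e9}";            # one character, U+00E9 (é)

# Only the file name is needed here; the file is reopened below.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# No layer is given, so this open picks up the :utf8 default.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out;

# An explicit layer overrides the default: read the raw bytes back.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $written = do { local $/; <$in> };
close $in;

printf "%s\n", join ' ', map { sprintf('%02x', ord($_)) } split //, $written;
# prints: c3 a9
```

Note that the two suggestions cover different handles. use open qw(:utf8) on its own affects files opened in that scope but not STDIN, STDOUT or STDERR (adding :std to the list extends it to those), while PERL_UNICODE=S applies to the three standard handles only.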