PerlMonks  

Re: Lost in encoding in Twig

by hippo (Chancellor)
on Jan 09, 2015 at 12:17 UTC (#1112729)


in reply to Lost in encoding in Twig

Ah, Unicode. Lucky you!

If you have not done so already, I recommend reading perlunitut as an introduction to the subject; it covers a lot. If you are in a rush to solve this particular problem, try use utf8 (because your source code contains non-ASCII literals) and binmode (for your file-based output).

Re^2: Lost in encoding in Twig
by Anonymous Monk on Jan 09, 2015 at 13:12 UTC

    The recommendations by pcouderc are the usual ones, and I would actually code the script that way. But the fact is that when I run your script as originally posted and cat the output I get '<myxml fille="clémence"/>'. Have you looked at xml.out with 'hexdump -C'? If you see 'clémence' encoded as '63 6c c3 a9 6d 65 6e 63 65', it means your Perl output is correct. But if your command shell is set to interpret output as ISO Latin-1, you would see 'clÃ©mence' even if your script is doing the right thing.
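To make the byte-level check above concrete, here is a minimal sketch using only the core Encode module. It is an illustration, not the original poster's script: the variable names are made up, and it assumes the sketch itself is saved as UTF-8. It prints the UTF-8 bytes of 'clémence' in hex, then shows what those same bytes look like when they are misread as ISO Latin-1.

```perl
use strict;
use warnings;
use utf8;                        # assumption: this sketch is saved as UTF-8
use Encode qw(encode decode);

my $word  = 'clémence';                  # character string, 8 characters
my $bytes = encode('UTF-8', $word);      # byte string, 9 bytes

# Hex view of the bytes, comparable to the left-hand columns of `hexdump -C`.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $bytes;

# The same bytes interpreted as ISO Latin-1, as a mis-set terminal would do.
my $misread = decode('ISO-8859-1', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "$hex\n";        # 63 6c c3 a9 6d 65 6e 63 65
print "$misread\n";    # clÃ©mence
```

If the hexdump of the output file shows c3 a9 for the é, the file is fine and only the display is off. Anything else means the bytes in the file are themselves wrong.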

      Thank you both.
      I do not think I need utf8 as my perl is v5.18 (debian jessie).

      A hexdump of my code.pl is correct: c3 a9.
      A hexdump of my out.xml is incorrect and matches what cat shows: c3 83 c2 a9 (Ã©).
      Sorry, I do not know how to check whether my "command shell is set to interpret output as ISO Latin-1". But my locale is LANG=en_US.UTF-8.
      I can use "keep-encoded" to make it work (in this case), but I would like to understand...
        I do not think I need utf8 as my perl is v5.18 (debian jessie).

        That's fine (assuming you are correct). If you are relying on new features such as this in your code, it is probably a good idea to specify that minimum version of perl at the top, so that it traps any otherwise confusing errors should you (or someone else) try to run it on an older perl. E.g.:

        use 5.018;

        I can use "keep-encoded" to make it work (in this case), but I would like to understand...

        In a nutshell, to get Unicode (and more specifically UTF-8) to work properly in perl you have to decode your inputs and encode your outputs. In the script here, you have no inputs to worry about because the data is hardcoded in your source. The output does matter, however, because you are printing this data to a file and so have to encode it first. You can do that manually, but it is generally considered better to apply the :utf8 layer to the filehandle, either via binmode or directly in open, so that everything is encoded as it passes out through the I/O system. See also PerlIO::encoding.
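A minimal sketch of the "encode your outputs" advice, under a few assumptions: the XML string is hardcoded as in the thread, the scratch file comes from File::Temp and is purely hypothetical, and the stricter :encoding(UTF-8) spelling of the layer is used in place of :utf8. The last lines show that encoding an already encoded string a second time produces the byte sequence c3 83 c2 a9 reported earlier in the thread. That is one common way to end up with those bytes, not a diagnosis of the original script.

```perl
use strict;
use warnings;
use utf8;                        # assumption: this sketch is saved as UTF-8
use Encode qw(encode);
use File::Temp qw(tempfile);

my $xml = qq{<myxml fille="clémence"/>\n};    # character string

# Encode on output: the layer turns characters into UTF-8 bytes on the way out.
my ($fh, $path) = tempfile(UNLINK => 1);      # hypothetical scratch file
binmode $fh, ':encoding(UTF-8)';
print {$fh} $xml;
close $fh or die "close failed: $!";

# Read the file back as raw bytes to see what actually landed on disk.
open my $in, '<:raw', $path or die "cannot read $path: $!";
my $on_disk = do { local $/; <$in> };
close $in;

my $expected = encode('UTF-8', $xml);         # é encoded once: c3 a9
print $on_disk eq $expected ? "encoded once\n" : "unexpected bytes\n";

# Encoding twice treats the bytes c3 a9 as two characters and encodes each.
my $once  = encode('UTF-8', 'é');
my $twice = encode('UTF-8', $once);
print unpack('H*', $once),  "\n";             # c3a9
print unpack('H*', $twice), "\n";             # c383c2a9
```

The point of the layer is that the program works with character strings throughout and the conversion to bytes happens exactly once, at the filehandle.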
