<?xml version="1.0" encoding="windows-1252"?>
<node id="961202" title="Re: &quot;ISO-8859-1 0x80-0xFF&quot; and chr()" created="2012-03-23 08:45:10" updated="2012-03-23 08:45:10">
<type id="11">
note</type>
<author id="616540">
moritz</author>
<data>
<field name="doctext">
&lt;blockquote&gt;1. chr() returns characters, not bytes. (silly me)&lt;/blockquote&gt;

&lt;p&gt;While "bytes" and "characters" is a useful mental image, it's not always correct. The operation defines the context. For example [doc://uc] interprets a string as text no matter what, whereas [doc://print] interprets a string as bytes (if it can)

&lt;p&gt;The real problem is that the byte 0xe9 cannot be decoded as UTF-8, because it isn't UTF-8. Either do nothing with it (which works on sufficiently modern perls), or decode it as Latin-1, because Latin-1 (aka ISO-8859-1) maps each byte exactly to the same codepoint number.&lt;/p&gt;
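
&lt;p&gt;A minimal sketch of the Latin-1 route, assuming the byte arrives as a one-byte string; the variable names are made up for illustration. It decodes the byte as ISO-8859-1 and then encodes the resulting character as UTF-8:&lt;/p&gt;

&lt;code&gt;
use strict;
use warnings;
use Encode qw(decode encode);

my $byte = "\xE9";                        # one raw byte, e.g. read from a Latin-1 source
my $char = decode('ISO-8859-1', $byte);   # character string holding codepoint U+00E9
my $utf8 = encode('UTF-8', $char);        # byte string holding 0xC3 0xA9

printf "codepoint:   U+%04X\n", ord $char;         # U+00E9
printf "UTF-8 bytes: %s\n", unpack 'H*', $utf8;    # c3a9
&lt;/code&gt;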

&lt;p&gt;Note that instead of calling &lt;c&gt;encode()&lt;/c&gt; on each output string, you can also set an I/O layer that does it automatically:&lt;/p&gt;

&lt;code&gt;
binmode STDOUT, ':encoding(UTF-8)';
&lt;/code&gt;

&lt;p&gt;Or on the command line, you can set that up with the [doc://perlrun|-C] option:

&lt;code&gt;
$ perl -CS -wE 'say chr hex "E9"'
é
&lt;/code&gt;

&lt;!-- Node text goes above. Div tags should contain sig only --&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-616540"&gt;
[http://perl6.org/|Perl 6 - second systems done right]
&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
961193</field>
<field name="parent_node">
961193</field>
</data>
</node>
