"ISO-8859-1 0x80-0xFF" and chr()

remiah has asked for the wisdom of the Perl Monks concerning the following question:

I have lived my life decoding input bytes into characters, and encoding them back into bytes for printing. My life was as below.

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/encode decode/;

my ($byte,$decoded);
#get bytes
$byte=`perl -MEncode -e "print encode('UTF-8',chr(hex('00E9')))"`;

#decode bytes to char
$decoded=decode('UTF-8', $byte);

#encode char to byte for print
print encode('UTF-8', $decoded)

This prints "é". Yesterday I stumbled over an "ISO-8859-1 0x80-0xFF" problem. The code below prints "�" (the replacement character). This confused me.

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/encode decode/;

my ($byte,$decoded);
#$byte=chr(hex('0041'));
$byte=chr(hex('00E9'));

#decode bytes to char
$decoded=decode('UTF-8', $byte);

#encode char to byte for print
print encode('UTF-8', $decoded);
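
A minimal sketch of what seems to happen in the script above, assuming Encode's default behaviour (no CHECK argument): the single byte 0xE9 is not a complete UTF-8 sequence, so decode() substitutes U+FFFD, and encoding that gives the three bytes the terminal shows as "�". The variable names $lone and $out are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/encode decode/;

# chr(0xE9) is a one-character string; seen as bytes it is the lone byte 0xE9
my $lone = chr(hex('00E9'));

# 0xE9 alone is not valid UTF-8, so decode() falls back to U+FFFD
my $decoded = decode('UTF-8', $lone);
printf "decoded code point: U+%04X\n", ord($decoded);   # U+FFFD

# encoding U+FFFD yields the bytes EF BF BD
my $out = encode('UTF-8', $decoded);
print "output bytes: ",
    join(' ', map { sprintf('%02X', ord($_)) } split(//, $out)), "\n";   # EF BF BD
```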

There were two things that I didn't understand and that confused me.
1. chr() returns a character, not bytes. (silly me)
2. Characters in the "ISO-8859-1 0x80-0xFF" range need some care.

It seems I have to "upgrade" the result of chr() when the character is in the "ISO-8859-1 0x80-0xFF" range. So, if I want to go back to my ordinary life of decode and encode, I have to do it like this.

#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/encode decode/;

my ($chr,$decoded);
$chr=chr(hex('00E9'));

#it is already not bytes but character.
#use utf8::upgrade from "native encoding" to "UTF-8 encoding"(perl internal)
utf8::upgrade($chr);

#now you can encode char to byte for print
print encode('UTF-8', $chr);
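
A small check, assuming a stock perl run without -C switches or "use encoding", suggests that utf8::upgrade only changes how the string is stored internally. encode() appears to return the same two bytes with or without it, so the thing that mattered may simply be that decode() is no longer applied to the result of chr(). The names $plain and $upgraded are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/encode/;

my $plain    = chr(hex('00E9'));   # stored internally as a single byte
my $upgraded = chr(hex('00E9'));
utf8::upgrade($upgraded);          # same character, stored internally as UTF-8

# the internal storage differs ...
print "plain is_utf8:    ", (utf8::is_utf8($plain)    ? 1 : 0), "\n";
print "upgraded is_utf8: ", (utf8::is_utf8($upgraded) ? 1 : 0), "\n";

# ... but both hold the same one-character string
print "same string\n" if $plain eq $upgraded;

# and encode() returns the same bytes (C3 A9) for both
my $bytes_plain    = encode('UTF-8', $plain);
my $bytes_upgraded = encode('UTF-8', $upgraded);
print "same bytes\n" if $bytes_plain eq $bytes_upgraded;
```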

This prints "é". I have read http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8, and that page told me "dbd drivers must be cleverer than me". It may be true, if I am clever enough to ask them properly for decoding...

I shell out to perl to get the bytes of "é", but I guess there must be a more elegant way to get bytes. Tomorrow I should look into pack(). Good night.
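
One possible way to get those bytes without starting a second perl, sketched here on the assumption that the UTF-8 bytes of "é" (C3 A9) are known in advance: write them as an escape, or build them with pack(). The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/decode/;

# three ways to write the two UTF-8 bytes of "é" directly
my $from_escape = "\xC3\xA9";
my $from_pack_c = pack('C*', 0xC3, 0xA9);   # from a list of byte values
my $from_pack_h = pack('H*', 'c3a9');       # from a hex string

# decode bytes to a character, as in the first script
my $char = decode('UTF-8', $from_pack_h);
printf "U+%04X\n", ord($char);   # prints U+00E9
```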
