http://www.perlmonks.org?node_id=742791


in reply to Re^3: Character encoding of microns
in thread Character encoding of microns

am i also correct in assuming that perl automatically writes data as ISO-8859-1?

Not really. Perl outputs using whatever encoding you specify (via use open, binmode or some other means).

If you don't specify, it outputs the internal representation of the string, which is either arbitrary bytes of unknown encoding (UTF8 flag off) or a lax variant of UTF-8 called utf8 (UTF8 flag on). In the latter case, if the string contains a character above 255, you also get a "Wide character" warning.

If you happen to pass iso-latin-1 bytes to Perl and you print these out unchanged, Perl will output iso-latin-1. But the same goes for any encoding: the bytes pass through untouched.

    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    # Second perl outputs iso-8859-1
    $ perl -e'use open ":std", ":encoding(iso-8859-1)"; print chr(0x00E9)' | perl -e"print <>" | od -t x1
    0000000 e9
    0000001

    # U+0449 CYRILLIC SMALL LETTER SHCHA
    # Second perl outputs iso-8859-5
    $ perl -e'use open ":std", ":encoding(iso-8859-5)"; print chr(0x0449)' | perl -e"print <>" | od -t x1
    0000000 e9
    0000001

However, many aspects of Perl will presume that the arbitrary bytes of unknown encoding are iso-latin-1. This includes uc, regexp character classes such as \w, explicit upgrades to utf8 (utf8::upgrade($_)), and implicit upgrades to utf8 (chop( $_ .= chr(0x2660) )).