Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options

Re^4: Character encoding of microns

by ikegami (Patriarch)
on Feb 10, 2009 at 15:37 UTC ( [id://742791]=note: print w/replies, xml ) Need Help??

in reply to Re^3: Character encoding of microns
in thread Character encoding of microns

am i also correct in assuming that perl automatically writes data as ISO-8859-1?

Not really. Perl outputs using whatever encoding you specify (via use open, binmode or some other means).

If you don't specify, it outputs the internal representation of the string which is either arbitrary bytes of unknown encoding (UTF8 flag off) or a lax variant of UTF-8 called utf8 (UTF8 flag on). If the UTF8 flag is on, you might also get a warning.

If you happen to pass iso-latin-1 characters to Perl and you print these out, Perl will output iso-latin-1. But the same goes for any encoding.

# U+00E9 LATIN SMALL LETTER E WITH ACUTE # Second perl outputs iso-8859-1 $ perl -e'use open ":std", ":encoding(iso-8859-1)"; print chr(0x00E9)' + | perl -e"print <>" | od -t x1 0000000 e9 0000001 # U+0449 CYRILLIC SMALL LETTER SHCHA # Second perl outputs iso-8859-5 $ perl -e'use open ":std", ":encoding(iso-8859-5)"; print chr(0x0449)' + | perl -e"print <>" | od -t x1 0000000 e9 0000001

However, many aspects of Perl will presume the arbitrary bytes of unknown encoding are iso-latin-1. This includes uc, regexp character classes such as \w, explicit upgrades to utf8 (utf8::upgrade($_)), and implicit upgrades to utf8 (chop( $_ . chr(0x2660) )).

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://742791]
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (3)
As of 2024-05-24 11:33 GMT
Find Nodes?
    Voting Booth?

    No recent polls found