http://www.perlmonks.org?node_id=871057


in reply to text encodings and perl

Thanks for sharing your thoughts. I have a small nit, however:

If the string is in "internal form" then perl attempts to find "characters" in it. Otherwise, perl simply works with "octets".

This is true for the length function, but for most operations it is not. For functions like uc and print, it is the operation that sets the context.

Just as concatenation imposes string context and multiplication imposes numeric context, print imposes "binary" context (encoding the string, and warning, if necessary), and uc imposes "character" context (decoding as Latin-1 if the string holds undecoded octets).
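A small sketch of what I mean, under a few assumptions of my own: Perl 5.12 or later with the unicode_strings feature enabled (so that uc treats high octets as Latin-1 consistently), an in-memory filehandle standing in for an output handle with no encoding layer, and variable names invented purely for illustration.

```perl
use strict;
use warnings;
use feature 'unicode_strings';  # assumption: Perl 5.12 or later
use Encode qw(decode);

my $octets = "caf\xc3\xa9";             # UTF-8 encoded octets for "café"
my $chars  = decode('UTF-8', $octets);  # the same text as a character string

# length imposes nothing; it counts whatever the string holds.
my $len_octets = length $octets;   # 5
my $len_chars  = length $chars;    # 4

# uc imposes character context: each undecoded octet is read as Latin-1.
my $uc_latin1 = uc "\xe9";    # "\xc9": the octet is taken as Latin-1 "é"
my $uc_octets = uc $octets;   # "CAF\xc3\xa9": the two octets of "é" are not seen as one letter
my $uc_chars  = uc $chars;    # "CAF\x{c9}", i.e. "CAFÉ"

# print imposes binary context: on a handle with no encoding layer,
# a character above 0xFF is written as UTF-8 and triggers a warning.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };
my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
print {$fh} "\x{263A}";
close $fh;
```

Only length gives different answers depending on what the string holds. uc and print do the same thing either way; they just reinterpret the string to suit the operation, which is why uc on the undecoded octets quietly produces the "wrong" result rather than an error.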

(self promotion: I've written a similar document on encodings and Unicode in Perl, though a bit longer. I hope you find it useful).