PerlMonks
Re: "ISO-8859-1 0x80-0xFF" and chr()
by moritz (Cardinal) on Mar 23, 2012 at 12:45 UTC (#961202)
"1. chr() returns character not bytes. (silly me)"

While "bytes" and "characters" is a useful mental image, it's not always correct. The operation defines the context: for example, uc interprets a string as text no matter what, whereas print interprets a string as bytes (if it can).

The real problem is that the byte 0xe9 cannot be decoded as UTF-8, because it isn't valid UTF-8. Either do nothing with it (which works on sufficiently modern perls), or decode it as Latin-1, because Latin-1 (aka ISO-8859-1) maps each byte to the codepoint with the same number.

Note that instead of calling encode() on each output string, you can also set an IO layer which does it automatically:
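A minimal sketch of both approaches, assuming the output is meant to be UTF-8; the variable names are only illustrative:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A single byte 0xE9, e.g. as read from a Latin-1 encoded source.
my $bytes = "\xE9";

# Decoding as Latin-1 maps byte 0xE9 to codepoint U+00E9.
my $text = decode('ISO-8859-1', $bytes);

# Option 1: encode each output string explicitly.
my $utf8 = encode('UTF-8', $text);    # two bytes: 0xC3 0xA9

# Option 2: set an IO layer once; every later print to STDOUT
# is then encoded to UTF-8 automatically.
binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n";
```

With the layer in place, print is given decoded text and no further encode() calls are needed for that filehandle.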
Or on the command line, you can set that up with the -C option:
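For instance, -CS marks STDIN, STDOUT and STDERR as UTF-8. A small sketch, assuming a hypothetical script saved as latin1.pl and run as `perl -CS latin1.pl`:

```perl
# Run as:  perl -CS latin1.pl
# -CS puts a UTF-8 layer on STDIN, STDOUT and STDERR,
# so no binmode call is needed inside the script.
use strict;
use warnings;

# chr(0xE9) is the single character U+00E9.
my $char = chr(0xE9);

# Under -CS this is written out as the two UTF-8 bytes 0xC3 0xA9.
print $char, "\n";
```

The switch has to be given on the command line itself; recent perls reject -C when it appears only on the #! line.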