Re: i18n/utf8 problem, 'utf8 "\xF8" does not map to Unicode'
by Juerd (Abbot) on Feb 25, 2008 at 02:06 UTC
How are you READING the UTF-8 data? Outputting is hard to do wrong. Indeed you just set an :encoding or :utf8 layer on the output handle.
However, if you use :utf8 for input, you're in for trouble (malfunction and security bugs). Always use :encoding for text input.
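As a minimal sketch of the above (the sample string and the temporary file are assumptions made up for illustration, not part of the original question), this writes characters through an :encoding layer and reads them back through one:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text: "blåbærgrød" as a Perl character string.
my $text = "bl\x{E5}b\x{E6}rgr\x{F8}d";

# A throwaway file, so the sketch does not depend on any real path.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Output: set the layer on the handle, then just print characters.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} "$text\n";
close $out or die "Cannot close $path: $!";

# Input: use :encoding(...), not :utf8, so the bytes are decoded and checked.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
my $line = <$in>;
close $in;
chomp $line;

my $roundtrip_ok = $line eq $text ? 1 : 0;   # same characters came back
my $char_count   = length $line;             # characters, not bytes
my $byte_count   = -s $path;                 # bytes on disk, including "\n"

print "roundtrip ok: $roundtrip_ok, chars: $char_count, bytes: $byte_count\n";
```

The string has 10 characters but the file is larger, because each of å, æ and ø takes two bytes in UTF-8.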
The error message about 0xF8 (which is the Danish ø character, not æ, which is indeed 0xE6) suggests to me that the input is NOT UTF-8 but ISO-8859-1 or ISO-8859-15, and that an :encoding(utf8) layer was used to read it. (Update: I originally wrote ":utf8" here, but meant :encoding(utf8). ":utf8" should of course not be used for input.)
If the input is ISO-8859 and the input layer is :utf8, you get lots of errors, and you should be happy if any part of your program works correctly. That is probably not the case here.
If the input is ISO-8859, and the input layer is :encoding(utf8), you get substitution characters for practically all non-ASCII characters.
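To check this diagnosis without touching any file, a small sketch with the core Encode module can decode the same bytes both ways. The byte string below is an assumed sample ("grød" in ISO-8859-1), and FB_CROAK is used here only to turn the decoding problem into an exception that can be inspected; it is not what the I/O layer does by default:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the ISO-8859-1 bytes for "grød" (ø is the single byte 0xF8).
my $bytes = "gr\xF8d";

# Try to decode as UTF-8. Work on a copy, because decode() may modify
# its input when a CHECK argument is given.
my $copy = $bytes;
my $as_utf8 = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
my $utf8_error = $@;   # non-empty: 0xF8 is not valid UTF-8 here

# Decode the same bytes as ISO-8859-1, which maps every byte to a character.
my $as_latin1 = decode('ISO-8859-1', $bytes);

print "UTF-8 decoding failed\n" if $utf8_error;
printf "ISO-8859-1 gives U+%04X at position 2\n", ord substr($as_latin1, 2, 1);
```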
The only correct way to read an ISO-8859-15 text file or stream is to use :encoding(ISO-8859-15). This can be done automatically based on the locale with "use open"; see its documentation. Note that relying on the locale is likely to introduce problems for other users, especially those who don't have any locale set but do have a UTF-8 capable terminal. This, however, is not a Perl problem.
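A sketch of the explicit form, assuming a made-up ISO-8859-15 file written to a temporary path so the example is self-contained. The byte 0xA4 is included because it is the euro sign in ISO-8859-15 but a different character in ISO-8859-1, so the two are not interchangeable:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a hypothetical ISO-8859-15 file: the bytes for "øl 5 €".
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "\xF8l 5 \xA4\n";
close $tmp or die "Cannot close $path: $!";

# Read it with the layer that matches the file's real encoding.
open my $in, '<:encoding(ISO-8859-15)', $path or die "Cannot read $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# Inspect the decoded characters as code points.
my @codepoints = map { sprintf 'U+%04X', ord } split //, $line;
print join(' ', @codepoints), "\n";
```

The first and last code points come out as U+00F8 (ø) and U+20AC (€). From here on the program works with characters, and any output handle can carry its own :encoding layer.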
If you haven't already done so, please forget everything you've ever read and learned about Perl's Unicode support, and read perlunitut.