in reply to i18n/utf8 problem, 'utf8 "\xF8" does not map to Unicode'
How are you READING the UTF-8 data? Output is hard to get wrong: you just set an :encoding or :utf8 layer on the output handle.
However, if you use :utf8 for input, you're in for trouble (malfunctions and security bugs), because :utf8 marks the incoming data as UTF-8 without validating it. Always use :encoding for text input.
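A minimal sketch of both sides, assuming a throwaway temporary file (via File::Temp) filled with a hypothetical sample string. It uses `:encoding(UTF-8)`, the strict spelling in Encode; the lowercase `utf8` name used elsewhere in this thread is Encode's more lax variant.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small UTF-8 file (hypothetical sample data) so the example is self-contained.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                                # write raw bytes, no layer
print {$fh} "r\xC3\xB8d gr\xC3\xB8d\n";     # "rød grød" encoded as UTF-8
close $fh;

# Input: :encoding(UTF-8) decodes AND validates the bytes.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# $line now holds characters, not bytes: 8 characters, although the file had 10 bytes.
binmode STDOUT, ':encoding(UTF-8)';         # output: just set a layer on the handle
print length($line), " characters: $line\n";
```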
The error message about 0xF8 (which is the Danish ø character in ISO-8859-1, not æ, which is 0xE6) suggests to me that the input is NOT UTF-8 but ISO-8859-1 or ISO-8859-15, and that :utf8 was used. Update: I meant :encoding(utf8) here; :utf8 should of course not be used for input.
If the input is ISO-8859 and the input layer is :utf8, you get lots of errors, and you should be happy if any part of your program works correctly. That is probably not the case here.
If the input is ISO-8859 and the input layer is :encoding(utf8), you get substitution characters for practically all non-ASCII characters.
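The effect can be reproduced without a file by calling Encode's `decode` directly. This is a minimal sketch; the sample word is hypothetical. With the default CHECK, decoding as UTF-8 substitutes U+FFFD REPLACEMENT CHARACTER for the invalid byte, while decoding with the encoding the data was really written in yields ø (U+00F8). Note that the `:encoding` layer has its own fallback setting (`$PerlIO::encoding::fallback`), so reading through the layer may give `\xF8`-style escapes plus the warning quoted in the thread title rather than U+FFFD.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "blød" as ISO-8859-1/ISO-8859-15 bytes (hypothetical sample).
# The lone byte 0xF8 is "ø" in those encodings but is not valid UTF-8.
my $bytes = "bl\xF8d";

# Wrong guess: decoding as UTF-8 with the default CHECK puts U+FFFD
# where the malformed byte was.
my $wrong = decode('UTF-8', $bytes);

# Right guess: decoding with the encoding the data was written in gives U+00F8.
my $right = decode('ISO-8859-15', $bytes);

# %vX prints the code point of every character, joined with dots.
printf "as UTF-8:       %vX\n", $wrong;
printf "as ISO-8859-15: %vX\n", $right;
```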
The only correct way to read an ISO-8859-15 text file or stream is to use :encoding(ISO-8859-15). This can also be done automatically, based on the locale, with the open pragma (use open ':locale'); see its documentation. Note that doing so is likely to cause problems for other users, especially those who have no locale set but do have a UTF-8 capable terminal. This, however, is not a Perl problem.
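Reading an ISO-8859-15 file explicitly might look like the sketch below; the temporary file and its contents are hypothetical sample data. The euro sign is included because byte 0xA4 is one of the places where ISO-8859-15 differs from ISO-8859-1 (in ISO-8859-1, 0xA4 is the generic currency sign ¤).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical ISO-8859-15 sample: "ø" (0xF8), "æ" (0xE6) and the euro sign (0xA4).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                    # write the raw Latin-9 bytes
print {$fh} "\xF8 \xE6 \xA4\n";
close $fh;

# Read with the encoding the file was actually written in.
open my $in, '<:encoding(ISO-8859-15)', $path or die "Cannot open $path: $!";
my $text = <$in>;
close $in;
chomp $text;

# The characters are now Unicode code points: U+00F8, U+00E6 and U+20AC.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
printf "%vX\n", $text;
```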
If you haven't already done so, please forget everything you've ever read and learned about Perl Unicode support, and read perlunitut.