in reply to How to reverse a (Unicode) string
print scalar reverse "\noäu";
If you entered this using a UTF-8 editor, you forgot to "use utf8;" to notify Perl of this fact.
You may be dealing with the string "\no\x{C3}\x{A4}u" instead of the intended "\no\x{e4}u"!
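To make the difference concrete, here is a small sketch that spells out both strings with escapes, so it does not depend on how the file itself is saved. The variable names are only for illustration.

```perl
use strict;
use warnings;

# What Perl sees when a UTF-8 encoded a-umlaut sits in a string literal
# and "use utf8;" is missing: the two bytes C3 A4 become two characters.
my $without = "\no\x{C3}\x{A4}u";

# What was intended: a single character U+00E4.
my $with = "\no\x{E4}u";

printf "%d %d\n", length $without, length $with;   # prints "5 4"
```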
reverse Works on bytes
reverse works on characters. If you have a byte string, every character represents the equivalent byte. If you have a Unicode text string, reverse properly reverses based on Unicode code points.
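A minimal sketch of that, using the core Encode module. The helper hex_chars is made up for this example and only prints each character's ordinal, so no wide characters are sent to the terminal.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: show every character of a string as a hex ordinal.
sub hex_chars { join ' ', map { sprintf '%02X', ord } split //, shift }

my $bytes = "o\x{C3}\x{A4}u";          # byte string: UTF-8 encoding of o, a-umlaut, u
my $text  = decode('UTF-8', $bytes);   # text string: o, U+00E4, u

my $rev_bytes = scalar reverse $bytes; # reverses 4 characters (one per byte)
my $rev_text  = scalar reverse $text;  # reverses 3 characters (code points)

print hex_chars($rev_bytes), "\n";     # prints "75 A4 C3 6F" (no longer valid UTF-8)
print hex_chars($rev_text), "\n";      # prints "75 E4 6F"
```

In both cases reverse did the same thing: it reversed characters. The only difference is what the characters in the string stand for.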
You can solve this problem by decoding the text strings
This suggests that decoding is a workaround. It is not; it is something you should always do when dealing with text data!
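One common way to do this is to let an I/O layer decode on input and encode on output. The following is only a sketch: the in-memory filehandles stand in for real files, and $input and $output are names invented for the example.

```perl
use strict;
use warnings;

# Bytes as they would arrive from a UTF-8 encoded file.
my $input = "o\x{C3}\x{A4}u\n";

# Decode while reading: the :encoding layer turns bytes into text.
open my $in, '<:encoding(UTF-8)', \$input or die "open for reading: $!";
my $line = <$in>;
close $in;
chomp $line;                            # now 3 characters: o, U+00E4, u

my $reversed = scalar reverse $line;    # operates on the text string

# Encode while writing: the :encoding layer turns text back into bytes.
my $output = '';
open my $out, '>:encoding(UTF-8)', \$output or die "open for writing: $!";
print {$out} "$reversed\n";
close $out or die "close: $!";

# $output now holds the bytes 75 C3 A4 6F 0A, which is valid UTF-8.
```

With real files the pattern is the same: put the layer on the handle once, and everything in between deals with text strings only.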
The use utf8; takes care that every string literal in the script is treated as a text string
Perl has no idea, and cannot be told, what kind of strings you have: binary or text. Without "use utf8" you don't necessarily have byte strings, but if your literals are meant as text strings, they're interpreted as iso-8859-1 rather than utf-8. Note that iso-8859-1 can be seen as a Unicode encoding: each byte maps directly to one of the first 256 code points. It just doesn't support the rest of the characters.
The rest of your post is accurate, but I wanted to respond so that newbies don't get a negative impression of Perl's Unicode support from your post. Perl's Unicode support is great, but the programmer MUST learn the difference between Unicode and UTF-8, and the difference between text data and binary data.
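The relationship between iso-8859-1 and Unicode can be checked with Encode. This is just an illustration with made-up variable names: a latin1 byte decodes to the code point with the same number, while a character above U+00FF cannot be encoded as iso-8859-1 at all.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Byte 0xE4 in iso-8859-1 is the character U+00E4.
my $char = decode('ISO-8859-1', "\x{E4}");
printf "U+%04X\n", ord $char;           # prints "U+00E4"

# U+20AC (euro sign) has no iso-8859-1 representation.
# FB_CROAK makes encode die instead of substituting a replacement.
my $copy = "\x{20AC}";                  # copy, since a CHECK value may modify the source
my $encoded_ok = eval { encode('ISO-8859-1', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
print $encoded_ok ? "encoded\n" : "U+20AC does not fit in iso-8859-1\n";
```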
Re^2: How to reverse a (Unicode) string
by moritz (Cardinal) on Jan 09, 2008 at 22:27 UTC
by Juerd (Abbot) on Jan 10, 2008 at 00:53 UTC
by moritz (Cardinal) on Jan 10, 2008 at 08:33 UTC
by Juerd (Abbot) on Jan 10, 2008 at 21:42 UTC
by ikegami (Patriarch) on Jan 09, 2011 at 23:52 UTC
by JavaFan (Canon) on Jan 10, 2011 at 09:17 UTC
by ikegami (Patriarch) on Jan 10, 2011 at 15:50 UTC
by ikegami (Patriarch) on Jan 10, 2011 at 16:15 UTC