PerlMonks
IO::Handle Unicode and ungetc()
by coolmichael (Deacon) on Jan 06, 2013 at 05:35 UTC (#1011831=perlquestion)
coolmichael has asked for the wisdom of the Perl Monks concerning the following question:
I think I've run into a problem with Unicode and IO::Handle. It's very likely I'm doing something wrong. I want to get and unget individual Unicode characters (not bytes) from an IO::Handle, but I'm getting a surprising error.
The error message from the ungetc() line is:
Malformed UTF-8 character (unexpected end of string) in say at unicode.pl line 21.
"\x{00c5}" does not map to utf8 at unicode.pl line 21.
But that's the correct hex for the character, and it should map to the character. I used a hex editor to make sure that the bytes for A-RING are correct for UTF-8. This seems to be a problem for any two-byte character. The final say outputs '\xC5' (literally four characters: backslash, x, C, 5).
I've also tested this by reading from files instead of scalar variables. The result is the same.
This is perl 5, version 16, subversion 2 (v5.16.2) built for darwin-2level.
Edited to add: The script is saved in UTF-8. That was the first thing I checked.
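A minimal sketch of the kind of get/unget round trip described above. It is not the original script: the variable names, the in-memory handle, and the sample input are all assumptions made for illustration. It reads one decoded character through an `:encoding(UTF-8)` layer, pushes it back with `ungetc()` (which IO::Handle documents as taking an ordinal value), and reads again. Whether the second read returns the same character may depend on the Perl and IO module versions, so the sketch prints both code points rather than assuming a result.

```perl
use strict;
use warnings;
use IO::Handle;
use Encode qw(encode);

# Assumption: input is the UTF-8 encoding of U+00C5 (A-RING) followed by "b".
my $bytes = encode('UTF-8', "\x{C5}b");   # bytes C3 85 62

# Assumption: an in-memory handle with a decoding layer stands in for the real input.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open failed: $!";

my $first = $fh->getc;        # one decoded character, not one byte
$fh->ungetc(ord $first);      # ungetc() takes an ordinal value, not a string
my $again = $fh->getc;        # may or may not match $first, depending on the version

# Print code points instead of raw characters to avoid output-encoding issues.
printf "first: U+%04X\n", ord $first;
printf "again: %s\n", defined $again ? sprintf('U+%04X', ord $again) : 'undef';
```

Comparing the two printed code points shows whether the pushed-back character survived the round trip on a given installation.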