Unicode Korean problem
by Anonymous Monk
on Jul 28, 2005 at 03:01 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Basic problem: Perl 5.8 seems to refuse to decode Korean UTF-8 correctly.
I have an e-mail sending program that reads UTF-8 Korean (and Japanese) from a database and then formats it into an e-mail. I already have this routine working well for ISO-8859.
I thought all I would have to do was change the MIME tags to UTF-8 and print out the raw UTF-8 characters, but Perl 5.8.* complains about a "Wide character" in the function call when I call encode_qp (which I use to convert the subject line to quoted-printable format per RFC 2047), and the program then dies. Following the recommendations in 'man perlunicode', I converted the database strings to UTF-8-flagged strings using:
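A minimal sketch of the byte/character distinction behind this kind of error, assuming the database returns raw UTF-8 bytes (here the three bytes for U+C81C). encode_qp from MIME::QuotedPrint works on bytes, so a string produced by Encode's decode has to be encoded back to UTF-8 before it is passed in. The helper rfc2047_subject is hypothetical: it hand-builds an RFC 2047 "Q" encoded-word, because plain quoted-printable output does not map spaces to underscores or escape "?" the way encoded-words require. The set of characters it leaves unescaped is a conservative choice, not the only valid one.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::QuotedPrint qw(encode_qp);

# Hypothetical helper: build an RFC 2047 "Q" encoded-word from a
# character string (the result of decode), e.g. for a Subject: header.
sub rfc2047_subject {
    my ($chars) = @_;
    # Back to UTF-8 bytes first, so every element is 0x00..0xFF.
    my $bytes = encode('UTF-8', $chars);
    # Assumed conservative safe set: letters, digits, ! * + - / and space.
    # Everything else (including "=", "?" and "_") becomes =XX.
    (my $q = $bytes) =~ s/([^A-Za-z0-9!*+\-\/ ])/sprintf('=%02X', ord $1)/ge;
    $q =~ tr/ /_/;    # Q encoding writes a space as "_"
    return "=?UTF-8?Q?$q?=";
}

# Raw bytes as they might come from the database: U+C81C in UTF-8.
my $raw   = "\xEC\xA0\x9C";
my $chars = decode('UTF-8', $raw);    # one character, length 1

# encode_qp($chars) would die here, because $chars holds a character
# above 0xFF; encoding back to UTF-8 bytes first avoids that.
# The empty end-of-line argument suppresses the trailing soft line break.
my $qp = encode_qp(encode('UTF-8', $chars), '');

print "$qp\n";                        # =EC=A0=9C
print rfc2047_subject($chars), "\n";  # =?UTF-8?Q?=EC=A0=9C?=
```

In this sketch the text is decoded once, kept as characters while the program works on it, and turned back into bytes only where a byte-oriented function (encode_qp, or printing to a handle without an :encoding(UTF-8) layer) needs it. Encode's MIME-Header, MIME-B and MIME-Q encodings are an alternative to the hand-rolled helper, though their exact output may differ between Encode versions.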
This resulted in an empty string. When I changed it to use:
I also took the extra step of verifying that the first three bytes of the subject line were a valid sequence. The UTF-8 bytes were "EC A0 9C", which decode to U+C81C, a valid code point.
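The byte check described above can be sketched in code. A three-byte UTF-8 sequence has the bit pattern 1110xxxx 10xxxxxx 10xxxxxx, and the code point is the concatenation of the x bits: 0xEC contributes 1100, 0xA0 contributes 100000, and 0x9C contributes 011100, which together give 0xC81C. The helper decode_utf8_3byte is hypothetical and deliberately narrow (it does not reject overlong forms or surrogates); the last lines cross-check it against Encode's decode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode exactly one three-byte UTF-8 sequence
# (1110xxxx 10xxxxxx 10xxxxxx). No overlong or surrogate checks.
sub decode_utf8_3byte {
    my @b = @_;
    die "not a 3-byte lead byte\n" unless ($b[0] & 0xF0) == 0xE0;
    die "bad continuation byte\n"
        if grep { ($_ & 0xC0) != 0x80 } @b[1, 2];
    return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
}

# Hex dump of the raw bytes, e.g. to compare with what the database holds.
my $hex = join ' ', map { sprintf '%02X', ord } split //, "\xEC\xA0\x9C";
print "$hex\n";                                   # EC A0 9C

my $cp = decode_utf8_3byte(0xEC, 0xA0, 0x9C);
printf "U+%04X\n", $cp;                           # U+C81C

# Cross-check against Encode: the same bytes give one character.
my $str = decode('UTF-8', "\xEC\xA0\x9C");
printf "U+%04X (length %d)\n", ord $str, length $str;   # U+C81C (length 1)
```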
I read further in the 'README.perl' in the lib/perl5/5.8.*/unicore directory, which mentions downloading a couple of large files (Unihan.txt and NormalizeTesting.txt). I did that and ran the one step it describes, 'perl mktables -makelist'. The build seemed to work, but Perl still complains about the invalid translations.
Is there anything more I need to do to get a successful UTF-8 decode?
Thanks much in advance.