http://www.perlmonks.org?node_id=478800

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Basic problem: perl 5.8 seems to refuse to decode Korean UTF-8 correctly.

I have an e-mail sending program that reads UTF-8 Korean (and Japanese) text from a database and then formats it into an e-mail. I already have this routine working well for ISO-8859.

I thought all I would have to do was change the MIME tags to utf-8 and have it print out the raw utf-8 characters, but perl 5.8.* complains that I have a Wide character as part of a function call when I call encode_qp (which converts the subject line to quoted-printable format according to RFC 2047). The program then dies. I tried to follow the recommendations of 'man perlunicode' and converted the database strings to utf-8 flagged status using:

$subjecttxt = Encode::decode_utf8($subjecttxt);
$encodedsubject = encode_qp($subjecttxt);

This resulted in a blank string. When I changed it to use:
encode("utf8",$subjecttxt,Encode::FB_CROAK)
it told me it couldn't convert the utf8, apparently thinking it was invalid. I verified that it was valid and was even able to view it correctly in Linux (with the LANG=en_US.utf-8 setting).
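
To make the two directions concrete, here is a minimal sketch of the byte/character round trip around encode_qp. It rests on assumptions: the database hands back raw UTF-8 bytes (not already-decoded characters), $subject_bytes is a hypothetical stand-in for the real subject, and the installed MIME::QuotedPrint accepts the optional second ($eol) argument. I have not checked this against perl 5.8.* specifically.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::QuotedPrint qw(encode_qp);

# Hypothetical stand-in for the database value: the raw UTF-8 bytes EC A0 9C.
my $subject_bytes = "\xEC\xA0\x9C";

# decode: bytes -> characters. The result holds a single wide character.
my $subject_chars = decode('UTF-8', $subject_bytes);

# encode: characters -> bytes. Quoted-printable is defined on bytes,
# so this byte string is what gets handed to encode_qp.
my $utf8_bytes = encode('UTF-8', $subject_chars);

# An empty $eol suppresses the soft line break encode_qp would otherwise append.
my $qp = encode_qp($utf8_bytes, '');

print "$qp\n";    # prints =EC=A0=9C
```

Under these assumptions decode() goes from bytes to characters and encode() goes the other way, so calling encode() on a string that is still raw bytes is a different operation from the decode_utf8() call above.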

I also went to the extra step of verifying that the first 3 bytes of the subject line form a valid code. The UTF-8 sequence was "EC A0 9C", which converts to U+C81C in Unicode, a valid code point.
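
The byte check above can be repeated in a few lines of Perl. This is only a sketch: it assumes the three bytes quoted above are exactly what the database returns, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The first three bytes of the subject line, as quoted above.
my $bytes = "\xEC\xA0\x9C";

# Strict decode: with FB_CROAK, decode() dies on malformed UTF-8 instead of
# substituting U+FFFD. Note that a CHECK argument may modify $bytes in place.
my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK);

# ord() of the decoded character gives its Unicode code point.
my $codepoint = sprintf 'U+%04X', ord $chars;

printf "%d character(s), first is %s\n", length $chars, $codepoint;
# prints: 1 character(s), first is U+C81C
```

If the input were not well-formed UTF-8, the decode() call would die rather than reach the printf.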

I read further in a 'README.perl' in the lib/perl5/5.8.*/unicore area that mentioned downloading a couple of large files (Unihan.txt and NormalizeTesting.txt), which I did, and I followed the one step of 'perl mktables -makelist'. This build process seemed to work, but perl still complains about the invalid translations.

Is there more that I need to do to get a successful utf8 decode?
Is there a workaround that would let me pass the raw utf8 directly to the encode_qp() function without it complaining?

Thanks much in advance.