Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Basic problem: perl 5.8 seems to refuse to decode Korean UTF-8 correctly.

I have an e-mail sending program that reads UTF-8 Korean (and Japanese) from a database and then formats it to an e-mail. I already have this routine working well for iso-8859.

I thought all I would have to do is change the MIME tags to utf-8 and have it print out the raw utf-8 characters, but perl 5.8.* is complaining I have a Wide character as part of a function call when I call encode_qp (for converting the subject line to quoted printed format according to RFC2047 standards).. The program then dies. I tried to follow the recommendations of 'man perlunicode' and converted the database strings to utf-8 flagged status using:

$subjecttxt = Encode::decode_utf8($subjecttxt); $encodedsubject = encode_qp($subjecttxt);

This resulted in a blank string.. When I changed it to use:
encode("utf8",$subjecttxt,Encode::FB_CROAK)
and it told me it couldn't convert the utf8.. thinking it was invalid.. I verified it was valid and was even able to view it correctly in Linux (with LANG=en_US.utf-8 setting).

I also went to extra step of verifying the first 3 bytes of the subject line was a valid code.. The UTF-8 sequence was "EC A0 9C" which converts to C81C in Unicode, which is a valid codepoint.

I read further into a 'README.perl' in the lib/perl5/5.8.*/unicore area that mentioned downloading a couple of large files (Unihan.txt and NormalizeTesting.txt), which I did, and followed the one step of 'perl mktables -makelist'... This build process seemed to work but it still complains about the invalid translations..

Is there more that I need to do to get a successful utf8 decode?
Is there a workaround way I could pass the raw utf8 directly to encode_qp() function without it complaining?

Thanks much in advance.


In reply to Unicode Korean problem by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others musing on the Monastery: (4)
    As of 2021-01-24 06:24 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      Notices?