PerlMonks  

Re: Quoted Printable to Unicode or something

by aitap (Curate)
on Oct 25, 2013 at 18:36 UTC ( [id://1059730]=note )


in reply to Quoted Printable to Unicode or something

Does the message specify a charset for its text? For example, Content-Type: text/plain; charset="utf-8". The problem is that decoding quoted-printable only gives you arbitrary bytes, while what you really need is Unicode characters, not bytes.

If the charset is specified, first decode the quoted-printable encoding to get bytes, then use decode to turn those bytes into characters. If not, you'll have to guess it somehow (I tried utf-8, utf-16(le|be) and shift-jis on your sample and could not make any sense of the resulting characters).
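
The two steps can be sketched with core modules only. This is a minimal illustration, not the only way to do it: the helper name qp_to_characters is hypothetical, and MIME::QuotedPrint is used here simply because it ships with Perl. The charset is assumed to be the one named in the Content-Type header.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);   # core module: quoted-printable -> bytes
use Encode qw(decode);                 # core module: bytes -> characters

# Hypothetical helper: undo the transfer encoding, then the charset encoding.
sub qp_to_characters {
    my ($qp_text, $charset) = @_;

    # Step 1: quoted-printable text becomes raw bytes.
    my $bytes = decode_qp($qp_text);

    # Step 2: raw bytes become characters; FB_CROAK makes malformed input die
    # instead of being silently replaced.
    return decode($charset, $bytes, Encode::FB_CROAK);
}

my $characters = qp_to_characters('=F0=9F=98=B3', 'UTF-8');

# Four bytes on the wire, one character after both decoding steps.
printf "%d character(s), code point U+%X\n", length($characters), ord($characters);
```

Because of FB_CROAK, trying a wrong charset tends to fail loudly, which helps when the charset has to be guessed.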

Re^2: Quoted Printable to Unicode or something
by rethaew (Sexton) on Oct 25, 2013 at 19:41 UTC
    Yes, Content-Type: text/plain; charset=utf-8 is specified in the message header. Also, I apologize: I made a typo in the original post. The code I meant to quote was:
    =F0=9F=98=B3
    Which would be part of the message body, e.g.
    So I saw Kevin today and he is sooo cute =F0=9F=98=B3
    In the original message, this would be a smiley face emoji. I am a little unclear on how to use decode. Are you saying to decode just the '=F0=9F=98=B3', or the entire message? Can you give an example?

      I tried decoding a MIME-encoded message with MIME::Tools: the body was decoded from quoted-printable to bytes and made accessible via MIME::Body methods. To get Unicode characters, I needed one more step: decoding the message body from bytes to characters using the Encode module.

      Approaching your example,

      use MIME::Decoder;
      use Encode 'decode';

      # only for this particular case I will decode QP manually
      my $d = MIME::Decoder->new('quoted-printable');
      # the usual way of obtaining bytes decoded from
      # QP/Base64/7bit/other content-transfer-encodings
      # is to use MIME::Body methods

      # encode unicode characters to UTF-8 on printing
      binmode STDOUT, ":utf8";

      # open an in-memory filehandle
      # since MIME::Decoder only supports filehandles
      open my $fh, ">", \(my $bytes);

      # decode the quoted-printable
      $d->decode(\*DATA, $fh);

      # decode the bytes
      my $characters = decode 'utf-8' => $bytes;

      # prove having 1 character, not 4 bytes
      while ($characters =~ /(.)/g) {
          printf "%s is unicode character %x\n", $1, (unpack "W", $1);
      }
      __DATA__
      =F0=9F=98=B3
      � is unicode character 1f633
      (My terminal font doesn't have emoji, so it showed � instead.)
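
      As for whether to decode just the '=F0=9F=98=B3' or the entire message: the whole body goes through both steps, since plain ASCII text passes through unchanged. Here is a rough sketch using the header and body line quoted in this thread. The regex that pulls out the charset is a simplification I am assuming for illustration; a real MIME parser handles more header forms.

      ```perl
      use strict;
      use warnings;
      use MIME::QuotedPrint qw(decode_qp);   # core module
      use Encode qw(decode);                 # core module

      # Header value and body line as quoted in this thread.
      my $content_type = 'text/plain; charset=utf-8';
      my $qp_body      = "So I saw Kevin today and he is sooo cute =F0=9F=98=B3\n";

      # Simplified charset extraction (assumption: at most one charset parameter,
      # optionally wrapped in double quotes).
      my ($charset) = $content_type =~ /charset="?([^";\s]+)"?/i;

      # Decode the entire body: quoted-printable -> bytes -> characters.
      my $bytes = decode_qp($qp_body);
      my $text  = decode($charset, $bytes);

      # Encode characters back to UTF-8 when printing.
      binmode STDOUT, ':encoding(UTF-8)';
      print $text;

      # The last character before the newline is a single code point.
      printf "last character: U+%X\n", ord(substr($text, -2, 1));
      ```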

      More info at perlopen, Encode, perlunitut, perluniintro, perlunifaq.
