http://www.perlmonks.org?node_id=1013825


in reply to MIME::Parser parse_data

I think the content encoding after base64 decoding is in the Content-Encoding header. Or at least, it should be, if the sending MUA adds it. Otherwise, you have to ass-u-me some default encoding.

Re^2: MIME::Parser parse_data
by boosth (Initiate) on Jan 17, 2013 at 18:19 UTC
    Here is an example that works with option A. Even though the raw data is clearly base64 encoded, it comes back as a human-readable string from:
    $tmp_part->bodyhandle->as_string
    There is no "Content-Encoding" header in the raw mail.

    Here's the relevant part of the header:
    X-MimeOLE: Produced By Microsoft Exchange V6.5
    Content-class: urn:content-classes:message
    MIME-Version: 1.0
    Content-Type: multipart/related; type="multipart/alternative"; boundary="----_=NextPart001_01CDF2E2.B6A090C6"

    This is a multi-part message in MIME format.

    ------_=NextPart001_01CDF2E2.B6A090C6
    Content-Type: multipart/alternative; boundary="----_=NextPart002_01CDF2E2.B6A090C6"

    ------_=NextPart002_01CDF2E2.B6A090C6
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64

    SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg
    dG8gSXRhbH ...
      Content-Type: text/plain; charset="utf-8"

      That's a good hint for that part...
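To make the two layers explicit: the Content-Transfer-Encoding (base64 here) and the charset (utf-8 here) are undone in separate steps. A minimal sketch using only core modules and the first base64 line quoted above; the variable names are mine, and the charset is hard-coded for illustration rather than read from the header:

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# First base64 line of the text/plain part quoted above.
my $payload = 'SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg';

# Assumption: taken from charset="utf-8" in that part's Content-Type header.
my $charset = 'utf-8';

# Step 1: undo the transfer encoding (base64 -> raw octets).
my $octets = decode_base64($payload);

# Step 2: undo the character encoding (octets -> Perl characters).
my $text = decode($charset, $octets);

binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n";
```

If a body is already readable after the parser has handled it, step 1 has been done for you and only step 2 may still be needed.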

        The problem is that I have other emails with the same format where I have to use this instead:
        $MessageBody = " " . decode('UTF-8', decode_base64($tmp_part->bodyhandle->as_string));
        ------_=NextPart002_01CDF2E2.B6A090C6
        Content-Type: text/plain; charset="utf-8"
        Content-Transfer-Encoding: base64

        SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg
        dG8gSXRhbH
        The issue appears to be that this call:
        my $tmpMessage = $parser->parse_data($body);
        is returning decoded strings for some emails but not for others. Some emails require the output of this call:
        $tmp_part->bodyhandle->as_string;
        to be decoded manually, and others do not. I don't understand why this call
        $tmp_part->bodyhandle->as_string;
        returns a human-readable decoded string for some base64-encoded emails but not for all of them. This is a headache for me, because if I change the code to just output the string, it breaks on the emails that need manual decoding. All I am doing is calling "parse_data" and then "bodyhandle->as_string", and I'm not sure where the decoding happens. The original data is definitely base64 encoded; I can see that by looking at the raw email data.
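For what it's worth, MIME::Parser decodes the Content-Transfer-Encoding while parsing by default (see its decode_bodies option), so bodyhandle->as_string normally hands back octets with the base64 already removed, but without the charset applied. One way to narrow this down is to print, per part, which transfer encoding the parser recognised and whether the body still looks like base64 afterwards. The sketch below does that on a cut-down single-part message built from the sample above. The "still looks like base64" regex is only a heuristic of mine, and my guess that the problem mails are double-encoded or have a header the parser did not recognise is an assumption, not something the thread confirms:

```perl
use strict;
use warnings;
use MIME::Parser;            # from the MIME-tools distribution (not core)
use Encode qw(decode);

# Cut-down test message: one text/plain part, base64, utf-8.
my $raw = join "\n",
    'MIME-Version: 1.0',
    'Content-Type: text/plain; charset="utf-8"',
    'Content-Transfer-Encoding: base64',
    '',
    'SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);  # keep bodies in memory, no temp files

my $entity = $parser->parse_data($raw);

my @report;
for my $part ($entity->parts_DFS) {
    my $body = $part->bodyhandle or next;   # multipart containers have no body

    # What the parser believed about this part.
    my $cte     = $part->head->mime_encoding;
    my $charset = $part->head->mime_attr('content-type.charset') || 'US-ASCII';

    # Octets after the parser removed the transfer encoding.
    my $octets = $body->as_string;

    # Heuristic only: nothing but base64 alphabet and whitespace left?
    my $still_b64 = $octets =~ m{\A[A-Za-z0-9+/=\s]+\z} ? 1 : 0;

    push @report, {
        cte       => $cte,
        charset   => $charset,
        still_b64 => $still_b64,
        text      => decode($charset, $octets),
    };
}

printf "cte=%s charset=%s still_b64=%d\n", @{$_}{qw(cte charset still_b64)}
    for @report;
```

Running the same loop over one mail that works and one that needs the manual decode_base64 should show whether the two really differ in what the parser sees (for example a cte other than base64, or still_b64 set to 1).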