
Re^2: MIME::Parser parse_data

by boosth (Initiate)
on Jan 17, 2013 at 18:19 UTC (#1013850)

in reply to Re: MIME::Parser parse_data
in thread MIME::Parser parse_data

Here is an example that works with option A. Even though the raw data is clearly base64 encoded, it is parsed into a human-readable string.
There is no "Content-Encoding" header in the raw mail.

Here's the relevant part of the header:
    X-MimeOLE: Produced By Microsoft Exchange V6.5
    Content-class: urn:content-classes:message
    MIME-Version: 1.0
    Content-Type: multipart/related;
        type="multipart/alternative";
        boundary="----_=NextPart001_01CDF2E2.B6A090C6"

    This is a multi-part message in MIME format.

    ------_=NextPart001_01CDF2E2.B6A090C6
    Content-Type: multipart/alternative;
        boundary="----_=NextPart002_01CDF2E2.B6A090C6"

    ------_=NextPart002_01CDF2E2.B6A090C6
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64

    SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg
    dG8gSXRhbH ...

Replies are listed 'Best First'.
Re^3: MIME::Parser parse_data
by Corion (Pope) on Jan 17, 2013 at 18:25 UTC
    Content-Type: text/plain; charset="utf-8"

    That's a good hint for that part...

      The problem is that I have other emails with the same format where I have to use this instead:
          $MessageBody = " ". decode('UTF-8',decode_base64($tmp_part->bodyhandle->as_string));
      The relevant part of such a mail looks the same:
          ------_=NextPart002_01CDF2E2.B6A090C6
          Content-Type: text/plain; charset="utf-8"
          Content-Transfer-Encoding: base64

          SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg
          dG8gSXRhbH

        Then I would guess that you have an encoding problem in the larger sense.

        Somewhere along the long chain from mail sender via your program to the output of your program, things are getting decoded in the wrong way.

        I know of no other way than to check at every conversion step that you decode the byte sequence from the right encoding.

        As both Content-Type header lines are identical, you should handle both mails in the same way. This could mean that the sending mail program already encodes the mail incorrectly. That would be outside your control.

        I still recommend checking the remaining parts of your program as to whether you mix up different string encodings when converting or printing output.
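To make "check at every conversion step" concrete, here is a minimal sketch that uses only the core modules MIME::Base64 and Encode. The sample string and variable names are made up for illustration; it does not touch MIME::Parser, it only shows the three layers (base64 text, raw bytes, Perl characters) that can get mixed up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(decode_base64 encode_base64);

# Hypothetical sample: build a base64 body from a known string so each
# step can be compared against what went in.
my $original = "Gr\x{fc}\x{df}e aus Z\x{fc}rich";          # 16 characters
my $wire     = encode_base64(encode('UTF-8', $original));  # as it would appear in a raw mail

# Step 1: undo the Content-Transfer-Encoding (base64 text -> raw bytes).
my $bytes = decode_base64($wire);

# Step 2: undo the charset (UTF-8 bytes -> Perl characters).
# FB_CROAK dies on malformed input; LEAVE_SRC keeps $bytes intact.
my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());

# The lengths differ because three characters need two bytes each in UTF-8.
printf "bytes: %d, characters: %d\n", length $bytes, length $chars;
# prints "bytes: 19, characters: 16"

# Step 3: encode again only at the output boundary.
binmode STDOUT, ':encoding(UTF-8)';
print "$chars\n";
```

Printing the lengths (or a hex dump) after each step is a cheap way to see at which point a string has been decoded once too often or not at all.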

      The issue appears to be that this call:
      my $tmpMessage = $parser->parse_data($body);
      returns decoded strings for some emails but not for others. For some emails the output of "bodyhandle->as_string" has to be decoded manually; for others it does not.
      I don't understand why the same calls return a human-readable, decoded string for some base64-encoded emails but not for all of them. This is a headache because if I change the code to output the string as-is, it breaks on the emails that need manual decoding. All I am doing is calling "parse_data" and then "bodyhandle->as_string", and I'm not sure where the decoding happens. The original data is definitely base64 encoded, as I can see in the raw email data.
        For reference, I used this method:
            # requires: use Encode qw(decode); use MIME::Base64 qw(decode_base64);
            if ($tmp_part->bodyhandle->as_string =~ m/^
                    (?: [A-Za-z0-9+\/]{4} )*
                    (?: [A-Za-z0-9+\/]{2} [AEIMQUYcgkosw048] =
                      | [A-Za-z0-9+\/] [AQgw] == )?
                \z/x) {
                # The body still looks like base64: decode it manually.
                $MessageBody = " - " . decode('UTF-8', decode_base64($tmp_part->bodyhandle->as_string));
            }
            else {
                # The body was already decoded by the parser.
                $MessageBody = " - " . $tmp_part->bodyhandle->as_string;
            }
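The same check can be wrapped in a small helper so it is easier to test in isolation. This is only a sketch: looks_like_base64 and body_text are hypothetical names, and stripping CR/LF before matching is my addition (wrapped base64 bodies contain line breaks, which the pattern itself does not allow).

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: true if $s consists only of well-formed base64 groups.
sub looks_like_base64 {
    my ($s) = @_;
    (my $compact = $s) =~ s/[\r\n]+//g;   # assumption: ignore line wrapping
    return 0 unless length $compact;
    return $compact =~ m{
        \A
        (?: [A-Za-z0-9+/]{4} )*
        (?: [A-Za-z0-9+/]{2} [AEIMQUYcgkosw048] =
          | [A-Za-z0-9+/]    [AQgw] ==
        )?
        \z
    }x ? 1 : 0;
}

# Hypothetical helper: decode the body only if it still looks like base64.
sub body_text {
    my ($raw) = @_;
    return looks_like_base64($raw)
        ? decode('UTF-8', decode_base64($raw))
        : $raw;
}

# The first 12 characters of the sample body above decode to "Hi Sue ,\n".
print body_text("SGkgU3VlICwK");
print body_text("Hi Sue ,\n");    # already readable, returned unchanged
```

Keep in mind that this is a heuristic. A short plain-text body such as "test" also satisfies the pattern and would be "decoded" into garbage, so it works around the symptom rather than explaining why some bodies arrive undecoded.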
