http://www.perlmonks.org?node_id=1013851


in reply to Re^2: MIME::Parser parse_data
in thread MIME::Parser parse_data

Content-Type: text/plain; charset="utf-8"

That's a good hint for that part...

Re^4: MIME::Parser parse_data
by boosth (Initiate) on Jan 17, 2013 at 20:42 UTC
    The issue appears to be that this call:
    my $tmpMessage = $parser->parse_data($body);
    is returning decoded strings for some emails but not for others. For some emails, the output of this call:
    $tmp_part->bodyhandle->as_string;
    has to be decoded manually; for others it does not. I don't understand why
    $tmp_part->bodyhandle->as_string;
    returns a human-readable, decoded string for some base64-encoded emails but not for all of them. This is a headache: if I change the code to just output the string, it breaks on the emails that still need manual decoding. All I am doing is calling "parse_data" and then "bodyhandle->as_string", and I'm not sure where the decoding happens. The original data is definitely base64 encoded, which I can see in the raw email data.
      For reference, I used this method:
      if ($tmp_part->bodyhandle->as_string =~ m/^
              (?: [A-Za-z0-9+\/]{4} )*
              (?: [A-Za-z0-9+\/]{2} [AEIMQUYcgkosw048] =
                | [A-Za-z0-9+\/] [AQgw] ==
              )?
          \z/x) {
          $MessageBody = " - " . decode('UTF-8', decode_base64($tmp_part->bodyhandle->as_string));
      }
      else {
          $MessageBody = " - " . $tmp_part->bodyhandle->as_string;
      }
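The same heuristic can be wrapped in a small function. This is only a sketch, and it assumes the body is meant to be UTF-8 text. The name body_to_text is hypothetical and not part of MIME::Parser. It uses only the core modules MIME::Base64 and Encode. It strips whitespace before the regex test, since base64 bodies in mail are usually wrapped across lines, and it accepts the base64 result only if it is strictly valid UTF-8.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode FB_CROAK);

# Hypothetical helper: turn a body that may or may not still be base64
# encoded into a Perl character string. Assumes the text is UTF-8.
sub body_to_text {
    my ($bytes) = @_;

    # Base64 in mail is line-wrapped, so remove all whitespace before testing.
    (my $compact = $bytes) =~ s/\s+//g;

    my $looks_base64 = length($compact) > 0 && $compact =~ m{
        \A
        (?: [A-Za-z0-9+/]{4} )*
        (?: [A-Za-z0-9+/]{2} [AEIMQUYcgkosw048] =
          | [A-Za-z0-9+/] [AQgw] ==
        )?
        \z
    }x;

    if ($looks_base64) {
        my $raw = decode_base64($compact);
        # Keep the decoded form only if it is strictly valid UTF-8.
        my $text = eval { decode('UTF-8', $raw, FB_CROAK) };
        return $text if defined $text;
    }

    # Otherwise treat the input as UTF-8 bytes (invalid bytes become U+FFFD).
    return decode('UTF-8', $bytes);
}
```

With this in place, the branch above could shrink to $MessageBody = " - " . body_to_text($tmp_part->bodyhandle->as_string);. It remains a guess: a short plain word made only of base64 characters can still pass both checks, so this does not replace finding out why some parts arrive undecoded.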
Re^4: MIME::Parser parse_data
by boosth (Initiate) on Jan 17, 2013 at 18:57 UTC
    The problem is that I have other emails with the same format where I have to use this instead:
    $MessageBody = " ". decode('UTF-8',decode_base64($tmp_part->bodyhandle->as_string));

    ------_=NextPart002_01CDF2E2.B6A090C6
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: base64

    SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg
    dG8gSXRhbH

      Then I would guess that you have an encoding problem in the larger sense.

      Somewhere along the chain from the mail sender, through your program, to its output, something is being decoded incorrectly.

      I know of no other way than to check, at every conversion step, that you decode the byte sequence using the right encoding.

      As both Content-Type header lines are identical, you should handle both parts the same way. If that does not work, the sending mail program may already be encoding the mail incorrectly, which would be outside your control.

      I still recommend checking the rest of your program for places where you mix up string encodings when converting or printing output.
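To make "check at every conversion step" concrete, here is a minimal sketch using only the core modules MIME::Base64 and Encode. The sample string and all variable names are made up for illustration; they stand in for one text/plain; charset="utf-8" part with base64 transfer encoding. The point is to keep octets and characters apart and to compare their lengths at each step.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);
use Encode qw(decode encode);

# Made-up sample: 5 characters, two of them non-ASCII.
my $chars_in = "Gr\x{fc}\x{df}e";

# What the raw mail would carry: UTF-8 octets, then base64.
my $wire = encode_base64(encode('UTF-8', $chars_in));

# Step 1: undo the Content-Transfer-Encoding -> octets in the declared charset.
my $octets = decode_base64($wire);

# Step 2: undo the charset -> Perl character string.
my $chars = decode('UTF-8', $octets);

# Step 3: encode again, once, only at the output boundary.
my $out = encode('UTF-8', $chars);

printf "octets=%d chars=%d out=%d\n",
    length($octets), length($chars), length($out);
# prints "octets=7 chars=5 out=7"
```

If the lengths at some step do not match what the raw mail suggests (for example, the "octets" still look like base64 text), that step is where a decode was skipped or applied twice.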