PerlMonks  

MIME::Parser parse_data

by boosth (Initiate)
on Jan 17, 2013 at 16:59 UTC ( #1013820=perlquestion )
boosth has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm stumped on how to resolve this issue. I have an email parser which uses MIME::Parser. It works most of the time, but with some emails I get behaviour that seems unintended.
    my $parser = MIME::Parser->new();
    $parser->ignore_errors(1);
    $parser->extract_uuencode(1);
    $parser->extract_nested_messages(1);
    $parser->output_to_core(1);    # don't write attachments to disk

    my $tmpMessage    = $parser->parse_data($body);
    my $tmp_num_parts = $tmpMessage->parts;
    for (my $ii = 0; $ii < $tmp_num_parts; $ii++) {
        my $tmp_part         = $tmpMessage->parts($ii);
        my $tmp_content_type = $tmp_part->mime_type;
        my $tmp_body         = $tmp_part->as_string;
Sometimes this works: A)
    if ($tmp_body =~ /Content-Transfer-Encoding: base64/i) {
        $MessageBody = " " . $tmp_part->bodyhandle->as_string;
    }
and sometimes this works: B)
    if ($tmp_body =~ /Content-Transfer-Encoding: base64/i) {
        $MessageBody = " " . decode('UTF-8', decode_base64($tmp_part->bodyhandle->as_string));
    }
What I can't figure out is how to handle both cases gracefully. AFAICT it is one or the other, short of some hideous check for whether the text is readable by a human.

Re: MIME::Parser parse_data
by Corion (Pope) on Jan 17, 2013 at 17:20 UTC

    I think the content encoding after base64 decoding is in the Content-Encoding header. Or at least, it should be, if the sending MUA adds it. Otherwise, you have to ass-u-me some default encoding.

      Here is an example that works with option A. Even though the raw data is clearly base64-encoded, it already comes back as a human-readable string from:
      $tmp_part->bodyhandle->as_string
      There is no "Content-Encoding" header in the raw mail.

      Here's the relevant part of the header:
      X-MimeOLE: Produced By Microsoft Exchange V6.5
      Content-class: urn:content-classes:message
      MIME-Version: 1.0
      Content-Type: multipart/related;
          type="multipart/alternative";
          boundary="----_=NextPart001_01CDF2E2.B6A090C6"

      This is a multi-part message in MIME format.

      ------_=NextPart001_01CDF2E2.B6A090C6
      Content-Type: multipart/alternative;
          boundary="----_=NextPart002_01CDF2E2.B6A090C6"

      ------_=NextPart002_01CDF2E2.B6A090C6
      Content-Type: text/plain; charset="utf-8"
      Content-Transfer-Encoding: base64

      SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg
      dG8gSXRhbH ...
        Content-Type: text/plain; charset="utf-8"

        That's a good hint for that part...
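
Following that hint, here is a minimal sketch of one way to handle the text parts uniformly. It assumes MIME::Parser's default behaviour of undoing the Content-Transfer-Encoding while parsing, so that bodyhandle->as_string already returns decoded octets and only the charset from Content-Type remains to be applied. The helper name text_parts, the fallbacks to us-ascii and iso-8859-1, and the tiny sample message are illustrative assumptions, not something from the original post. The sketch also recurses into nested multiparts, since the sample mail above wraps its text/plain part inside multipart/related and multipart/alternative.

```perl
use strict;
use warnings;
use MIME::Parser;
use Encode qw(decode find_encoding);

# Hypothetical helper: return the character-decoded text of every text/*
# leaf part, descending through nested multipart containers.
sub text_parts {
    my ($entity) = @_;

    # Containers have sub-parts and no body of their own: recurse.
    my @parts = $entity->parts;
    return map { text_parts($_) } @parts if @parts;

    my $bh = $entity->bodyhandle;
    return () unless $bh && $entity->mime_type =~ m{^text/}i;

    # Octets with the transfer encoding (base64, quoted-printable) already
    # undone by the parser, so no decode_base64 call here.
    my $octets = $bh->as_string;

    # The charset comes from the Content-Type header of this part.
    # Assumption: default to us-ascii when it is missing, and to
    # iso-8859-1 when Encode does not recognise the declared name.
    my $charset = $entity->head->mime_attr('content-type.charset') || 'us-ascii';
    $charset = 'iso-8859-1' unless find_encoding($charset);

    return decode($charset, $octets);
}

# Tiny made-up message reusing the first bytes of the base64 body above.
my $raw = join "\n",
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative; boundary="b1"',
    '',
    '--b1',
    'Content-Type: text/plain; charset="utf-8"',
    'Content-Transfer-Encoding: base64',
    '',
    'SGkgU3VlICwK',
    '--b1--',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep bodies in memory, as in the question

my @texts = text_parts($parser->parse_data($raw));

binmode STDOUT, ':encoding(UTF-8)';
print for @texts;
```

If a message still shows base64 after this, the body was probably encoded twice by the sender, and that would need to be handled as a special case for that sender rather than in the general path.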
