PerlMonks  

MIME::Parser parse_data

by boosth (Initiate)
on Jan 17, 2013 at 16:59 UTC (#1013820=perlquestion)
boosth has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm stumped on how to resolve this issue. I have an email parser which uses MIME::Parser. It works most of the time, but with some emails I get behaviour that seems unintended.

    use MIME::Parser;
    use MIME::Base64 qw(decode_base64);   # needed for option B
    use Encode qw(decode);                # needed for option B

    my $parser = MIME::Parser->new();
    $parser->ignore_errors(1);
    $parser->extract_uuencode(1);
    $parser->extract_nested_messages(1);
    $parser->output_to_core(1);           # don't write attachments to disk

    my $tmpMessage    = $parser->parse_data($body);
    my $tmp_num_parts = $tmpMessage->parts;
    for (my $ii = 0; $ii < $tmp_num_parts; $ii++) {
        my $tmp_part         = $tmpMessage->parts($ii);
        my $tmp_content_type = $tmp_part->mime_type;
        my $tmp_body         = $tmp_part->as_string;
Sometimes this works (A):

        if ($tmp_body =~ /Content-Transfer-Encoding: base64/i) {
            $MessageBody = " " . $tmp_part->bodyhandle->as_string;
        }

and sometimes this works (B):

        if ($tmp_body =~ /Content-Transfer-Encoding: base64/i) {
            $MessageBody = " " . decode('UTF-8', decode_base64($tmp_part->bodyhandle->as_string));
        }
What I find difficult to figure out is how to handle both cases gracefully. AFAICT it is one or the other, short of some hideous check for whether the text is readable by a human.

Re: MIME::Parser parse_data
by Corion (Pope) on Jan 17, 2013 at 17:20 UTC

    I think the content encoding after base64 decoding is in the Content-Encoding header. Or at least, it should be, if the sending MUA adds it. Otherwise, you have to ass-u-me some default encoding.

      I have an example that works with option A. Even though the raw data is clearly base64-encoded, this call already returns a human-readable string:
      $tmp_part->bodyhandle->as_string
      There is no "Content-Encoding" header in the raw mail.

      Here's the relevant part of the header:
      X-MimeOLE: Produced By Microsoft Exchange V6.5
      Content-class: urn:content-classes:message
      MIME-Version: 1.0
      Content-Type: multipart/related;
          type="multipart/alternative";
          boundary="----_=NextPart001_01CDF2E2.B6A090C6"

      This is a multi-part message in MIME format.

      ------_=NextPart001_01CDF2E2.B6A090C6
      Content-Type: multipart/alternative;
          boundary="----_=NextPart002_01CDF2E2.B6A090C6"

      ------_=NextPart002_01CDF2E2.B6A090C6
      Content-Type: text/plain;
          charset="utf-8"
      Content-Transfer-Encoding: base64

      SGkgU3VlICwKCldpbGwgYmUgaW4gdG91Y2ggaSBhbSB0aGlua2luZyBvZiBkb2luZyBhIHRyaXAg
      dG8gSXRhbH ...
        Content-Type: text/plain; charset="utf-8"

        That's a good hint for that part...
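
Following that hint, here is a minimal sketch of one way to handle every text part the same way. It assumes the body returned by bodyhandle has already had its transfer encoding (base64 or quoted-printable) removed by MIME::Parser, so only the charset from the part's Content-Type header still needs decoding. The sample message, the boundary "XYZ", the helper name decoded_text and the US-ASCII fallback are all made up for illustration, and this does not explain why option B was ever needed for some mails.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(encode_base64);
use MIME::Parser;

# Hypothetical sample message, built here only so the sketch is self-contained.
my $text = "Hi Sue,\ncaf\x{e9} in Italy\n";
my $raw  = join "\n",
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative; boundary="XYZ"',
    '',
    '--XYZ',
    'Content-Type: text/plain; charset="utf-8"',
    'Content-Transfer-Encoding: base64',
    '',
    encode_base64(encode('UTF-8', $text)) . '--XYZ--',
    '';

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory
my $entity = $parser->parse_data($raw);

# Return the part's text as Perl characters, or undef for container parts.
sub decoded_text {
    my ($part) = @_;
    my $bh = $part->bodyhandle or return;    # multipart entities have no body
    # Assumption: fall back to US-ASCII when the part declares no charset.
    my $charset = $part->head->mime_attr('content-type.charset') || 'US-ASCII';
    # The body handle holds transfer-decoded bytes, so only the
    # charset decoding is left to do.
    return decode($charset, $bh->as_string);
}

my @bodies;
for my $part ($entity->parts_DFS) {          # walks nested multiparts too
    next unless $part->effective_type =~ m{^text/}i;
    my $body = decoded_text($part);
    push @bodies, $body if defined $body;
}

binmode STDOUT, ':encoding(UTF-8)';
print $_ for @bodies;
```

Note that Encode's decode will die on a charset name it does not recognise, so real mail would probably want an eval around that call. If a body still looks like base64 after this, the sender may have encoded it twice, which would have to be detected separately.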
