Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re^6: How to parse outlook type attachment from inbox

by perlCrazy (Monk)
on Sep 23, 2006 at 08:47 UTC ( [id://574495]=note: print w/replies, xml ) Need Help??


in reply to Re^5: How to parse outlook type attachment from inbox
in thread How to parse outlook type attachment from inbox

Why are you trying to parse header information from attachments?

There are 50k emails are lyin in inbox all emails has come from one email address(ex: staff@lastminute.com) and each email
is coming from end customer as attachment that is
containing sender address that will be hotmail address. Now all attachmnet it self is a mail that is why i need to p
arse header so i can get those email address.

Why are you using proprietary headers rather than standard ones (X-MS-Has-Attach for example)?


I opened the outlook options and found that all attachment information is like this only,
that is why i checked that condition. and it is storing all attachment OK. But i need to get the header info from attachment
  • Comment on Re^6: How to parse outlook type attachment from inbox

Replies are listed 'Best First'.
Re^7: How to parse outlook type attachment from inbox
by perlCrazy (Monk) on Sep 25, 2006 at 11:06 UTC
    #!/usr/bin/perl use strict; use Mail::POP3Client; use MIME::Parser; use MIME::Head; use Mail::Header; #usage() unless scalar @ARGV == 3; my $pop = new Mail::POP3Client( HOST => 'server', USER => 'my', PASSWORD => 'pass' ); my $tmp_directory = "/tmp/attach"; my $parser = new MIME::Parser; $parser->output_dir($tmp_directory); $parser->output_prefix("attachment"); $parser->output_to_core(); open (FH,">>/tmp/msgtxt1"); for (my $i = 1; $i <= $pop->Count(); $i++){ my $head = $pop->Head($i); if ($head =~ /X-MS-Has-Attach: yes/i){ foreach ( $pop->Head( $i ) ) { if( /^(From|Subject):\s+/i ) { if ( /Vincent/) { my $msg = $pop->HeadAndBody($i); ### Automatically attempt to RFC-1522- +decode the MIME headers? $parser->decode_headers(1); + ### default is false ### Parse contained "message/rfc822" o +bjects as nested MIME streams? $parser->extract_nested_messages(0); + ### default is true ### Look for uuencode in "text" messag +es, and extract it? $parser->extract_uuencode(1); + ### default is false ### Should we forgive normally-fatal e +rrors? $parser->ignore_errors(0); + ### default is true my $entity = $parser->parse_data($msg) +; } } } # print "\n"; } } close(FH); $pop->Close(); sub usage { print "Usage: $0 <mail_server> <username> <password>\n"; exit; }

    Above code does save all attachment. Now I need to parse those attachment
    One of the attachment will be .msg file that contains header einfo etc.
    I need to extarct those header info and find the sender address.
    Can somebody help me in this.

      I think you're overcomplicating this problem. So here's my final attempt to explain it to you.

      MIME messages are hierarchical in nature. That is to say, a MIME message can contain other MIME messages which can, in turn, also contain other MIME messages and so on. So whatever you use to parse a MIME message, can also be used to parse its children.

      In fact, MIME::Parser is cleverer than that, if you set the extract_nested_messages flag to 1 (which is the default value, I'm not sure why you changed it in your code) then it will produce a tree structure containing all of the MIME messages contained within your original message. You can see the structure of this tree with the dump_skeleton method and you can get the children of any given MIME message by using the parts method.

      So, you've gone as far as getting the top level message in $entity. If you change the nested messages flag to 1 then your entity will contain sub-entities which you can access using $entity->parts. Each of these contained messages will be a MIME::Entity object and you can extract various parts of the message (like the headers) using the methods described in the documentation.

      Is that clearer?

      --
      <http://dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

        Thanks for reply and your precious time.
        I think you got fed up that is why you written "So here's my final attempt to explain it to you."
        Any way thanks again, i got the solution, it is like this:
        Store all attachment in directories then read all files one by one then do the regexp and find the hotmail addresses.
        All attachment is .msg file so it becomes easy for me to identify.
        I just want your opinion on this, is this solution good?
        In future also there will be lot of emails will be keep coming and i need to parse those
        attachment and then find the sender address. I am posting my code here.Regarding setting the flag 0 to below method extract_nested_messages, due to storing all attachment.If I set this to 1, then it doesn't stores my .msg attachment that is the main one and I need to extract header info from that only. If I set to 1 and then try to get entity->parts, it just says that mime type, whn i try to get the header info doesn't print anything. May be because i am doing something wrong or not calling the method properly. If u have time then pls. look into my code then advice. Thanks in advance.
        #!/usr/bin/perl use strict; use Mail::POP3Client; use MIME::Parser; use MIME::Head; use Mail::Header; #usage() unless scalar @ARGV == 3; my $pop = new Mail::POP3Client( HOST => 'server', USER => 'my', PASSWORD => 'pass' ); my $tmp_directory = "/tmp/attach"; my $parser = new MIME::Parser; $parser->output_dir($tmp_directory); $parser->output_prefix("attachment"); $parser->output_to_core(); open (FH,">>/tmp/msgtxt1"); for (my $i = 1; $i <= $pop->Count(); $i++){ my $head = $pop->Head($i); if ($head =~ /X-MS-Has-Attach: yes/i){ foreach ( $pop->Head( $i ) ) { if( /^(Subject):\s+/i ) { if ( /complaint about message from/) { my $msg = $pop->HeadAndBody($i); ### Automatically attempt to RFC-1522- +decode the MIME headers? $parser->decode_headers(1); + ### default is false ### Parse contained "message/rfc822" o +bjects as nested MIME streams? $parser->extract_nested_messages(1); + ### default is true ### Look for uuencode in "text" messag +es, and extract it? $parser->extract_uuencode(1); + ### default is false ### Should we forgive normally-fatal e +rrors? $parser->ignore_errors(0); + ### default is true my $entity = $parser->parse_data($msg) +; #$entity->print_header() my $head = $entity->head; my $msg_summary = " Message Summar +y for message \n" . " From : " . $head->get('From') +. " To : " . $head->get('To') . " Subject: " . $head->get('Subject +') . " Date : " . $head->get('Date') +. "\n"; print " -- $msg_summary \n"; my @parts = $entity->parts; my $body; my @attachments; # If this is a single part message, we + should just append it # (To be handled later) unless( $entity->parts ){ # Single part. Just add it as a comment $body = $entity->stringify_bod +y; #print " -- $body \n"; } else { # Multi-part. # Assume the first part is the + comment body and any additional # parts are the attachments @attachments = $entity->parts; $body = $attachments[0]->strin +gify_body; } for my $attachment ( @attachments ) { my $headers = $attachment->hea +d; print " - $attachment->bodyhan +dle->as_string()\n"; #print " -- $attachment --- +\n"; } #$entity->dump_skeleton; ### Get the head, a MIME::Head: } } } # print "\n"; } } close(FH); $pop->Close(); my $dirtoget="/tmp/attach/"; opendir(IMD, $dirtoget) || die("Cannot open directory"); my @thefiles= readdir(IMD); foreach my $fl (@thefiles) { if ($fl =~ m/msg/){ &readFile($fl); } } sub readFile { my ($filename) = @_; $filename = $dirtoget.$filename; open(MSG,"$filename") || die ("Cannot Open File for Reading\n" +); while (<MSG>) { my $ln = $_; if ($ln =~ m/^To: /){ print $ln; } } close(MSG); }

        Thanks in advance

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://574495]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (3)
As of 2024-04-19 22:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found