PerlMonks  

emailpull.pl

by RyuMaou (Deacon)
on Jan 10, 2007 at 19:41 UTC

Category: E-Mail Programs
Author/Contact Info: J K Hoffman
ryumaou at sbcglobal dot net
http://www.ryumaou.com/hoffman/netgeek
Description: Just another quick and dirty script I used to extract e-mail addresses from an mbox mail file. It prints the addresses to the screen, so I redirect the output to a text file and then use the related script I posted, emailverify.pl, to make sure they are valid.
Again, not the prettiest script, but it got the job done!
#!/usr/bin/perl -w

# a quick and dirty way to pull e-mail addresses out of a mail file
# not overly pretty and there's probably a better way to do it, but...
# written by J K Hoffman on 1-7-07

use strict;
use Email::Find;
use IO::File;

# The mailbox path is the first command-line argument
my $MAILBOX = shift @ARGV
  or die "usage: $0 mailbox_file\n";

# What time are we going to start this mess?
my $st = localtime;
print "Process started at $st\n";

# Read the mailbox
my $fh = IO::File->new($MAILBOX)
  or die "unable to read mailbox '$MAILBOX': $!";
my $mail;
{
  local $/;
  $mail = <$fh>;
}

# Find addresses
my %addy;
my $finder = Email::Find->new(
  sub {
    my($email, $orig_email) = @_;

    $addy{ $email->address }++;
    return $orig_email;
  }
);
$finder->find(\$mail);

foreach my $address (keys %addy) {
  print "$address $addy{ $address }\n";
}

# What time did it end?

my $et = localtime;
print "Process ended at $et\n";

Re: emailpull.pl
by jdporter (Canon) on Jan 10, 2007 at 20:59 UTC

    my $fh = IO::File->new($MAILBOX)
      or die "unable to read mailbox '$MAILBOX': $!";
    my $mail;
    {
      local $/;
      $mail = <$fh>;
    }

    is probably better written as

    my $mail = do {
      my $fh = IO::File->new($MAILBOX)
        or die "unable to read mailbox '$MAILBOX': $!";
      local $/;
      <$fh>
    };

    That way, $fh gets automatically closed as soon as you're done reading it, and you declare and assign $mail in one step.

    Also, if it was me, I'd probably sort the email addresses when printing them.
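
A minimal sketch of what sorted output could look like, using only core Perl. The `%addy` contents here are hypothetical sample data standing in for the hash the script builds, and the two orderings are just illustrations, not something the original script does.

```perl
use strict;
use warnings;

# Hypothetical sample counts standing in for the %addy hash built by the script.
my %addy = (
    'carol@example.org' => 1,
    'alice@example.com' => 3,
    'bob@example.net'   => 3,
);

# Alphabetical by address.
my @by_name = sort keys %addy;

# Most frequent first; ties broken alphabetically.
my @by_count = sort { $addy{$b} <=> $addy{$a} or $a cmp $b } keys %addy;

print "$_ $addy{$_}\n" for @by_count;
```

In the script itself, the smallest change would be to write `sort keys %addy` in the `foreach` line.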

    A word spoken in Mind will reach its own level, in the objective world, by its own weight
      Yeah, I thought about combining the two scripts, but I could see how having emailverify.pl as a standalone script might be useful. Also, though it's not an excuse, I pulled e-mail addresses from data files totaling close to a gigabyte, possibly more, of raw data. It was only after I collected the data that I saw how much extra sorting I was going to need to do.

      Thanks for the cleanup, though. I don't hardly get to do enough PERL these days, so my code is even sloppier than it used to be!
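
With input on the order of a gigabyte, slurping each file whole keeps all of it in memory at once. One possible alternative is to read a line at a time, sketched below with core Perl only. Note the assumptions: `count_addresses` is a hypothetical helper, the sample text is made up, and the simple regex is a stand-in for Email::Find rather than its actual matching, so it will miss unusual but valid addresses.

```perl
use strict;
use warnings;

# Count addresses from a filehandle one line at a time, so memory use
# does not grow with the size of the mailbox (only with the number of
# distinct addresses found).
# NOTE: this pattern is an illustrative stand-in, not Email::Find's matching.
sub count_addresses {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        while ($line =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/g) {
            $seen{$1}++;
        }
    }
    return \%seen;
}

# Hypothetical sample data read through an in-memory filehandle.
my $sample = "From: Alice <alice\@example.com>\n"
           . "To: bob\@example.net\n"
           . "Cc: alice\@example.com\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $counts = count_addresses($fh);
close $fh;

print "$_ $counts->{$_}\n" for sort keys %$counts;
```

This relies on an address never being split across lines, which is an assumption about the data rather than a guarantee.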
        ...get to do enough PERL these days...

        You mean Perl.
