http://www.perlmonks.org?node_id=593995
Category: E-Mail Programs
Author/Contact Info: J K Hoffman
ryumaou at sbcglobal dot net
http://www.ryumaou.com/hoffman/netgeek
Description: Just another quick and dirty script I used to extract e-mail addresses from an mbox mail file. It prints them to the screen, so I redirect the output to a text file and then use a related script I posted, emailverify.pl, to make sure the addresses are valid.
Again, not the prettiest script, but it got the job done!
#!/usr/bin/perl -w

# a quick and dirty way to pull e-mail addresses out of a mail file
# not overly pretty and there's probably a better way to do it, but...
# written by J K Hoffman on 1-7-07

use strict;
use Email::Find;
use IO::File;

# The mbox file to scan is the first command-line argument
our $MAILBOX = shift @ARGV
  or die "usage: $0 mailbox-file\n";

# What time are we going to start this mess?
my $st = localtime;
print "Process started at $st\n";

# Read the mailbox
my $fh = IO::File->new($MAILBOX)
  or die "unable to read mailbox '$MAILBOX': $!";
my $mail;
{
  local $/;
  $mail = <$fh>;
}

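# Find addresses: Email::Find calls this sub once per match, passing a
# Mail::Address object and the original text that matched
my %addy;
my $finder = Email::Find->new(
  sub {
    my($email, $orig_email) = @_;

    # Count each address; returning the original text leaves $mail unchanged
    $addy{ $email->address }++;
    return $orig_email;
  }
);
$finder->find(\$mail);

# Print each address followed by the number of times it was seen
foreach my $address (keys %addy) {
  print "$address $addy{ $address }\n";
}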

# What time did it end?

my $et = localtime;
print "Process ended at $et\n";
Re: emailpull.pl
by jdporter (Canon) on Jan 10, 2007 at 20:59 UTC

    my $fh = IO::File->new($MAILBOX)
      or die "unable to read mailbox '$MAILBOX': $!";
    my $mail;
    {
      local $/;
      $mail = <$fh>;
    }

    is probably better written as

    my $mail = do {
      my $fh = IO::File->new($MAILBOX)
        or die "unable to read mailbox '$MAILBOX': $!";
      local $/;
      <$fh>
    };

    That way, $fh gets automatically closed as soon as you're done reading it, and you declare and assign $mail in one step.
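    For comparison, here is a minimal sketch of the same do-block idiom wrapped in a reusable sub. It uses Perl's built-in three-argument open instead of IO::File (a substitution, not what the thread used), and the sub name slurp and the temporary-file demo are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the entire contents of $path as one string.
# The lexical $fh goes out of scope when the do block ends, so the file
# is closed automatically; local $/ (undef) turns off line splitting.
sub slurp {
  my ($path) = @_;
  my $content = do {
    open my $fh, '<', $path
      or die "unable to read '$path': $!";
    local $/;
    <$fh>;
  };
  return $content;
}

# Demo: write a small file, then read it back in one go.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "From: a\@example.com\nTo: b\@example.com\n";
close $tmp or die "close failed: $!";

my $mail = slurp($tmpname);
print length($mail), " bytes read\n";
```

    Note that this still holds the whole file in memory, just like the original script.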

    Also, if it were me, I'd probably sort the email addresses when printing them.
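    A minimal sketch of that sorting, assuming the %addy hash of address => count that the script builds; the sample addresses and counts below are made up for illustration. Sorting with lc makes the order case-insensitive, and ordering by descending count is shown as one possible alternative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative data in the same shape as the script's %addy:
# address => number of times it was seen in the mailbox.
my %addy = (
  'zed@example.com'   => 2,
  'Alice@example.org' => 5,
  'bob@example.net'   => 1,
);

# Alphabetical by address, ignoring case
my @by_address = sort { lc($a) cmp lc($b) } keys %addy;
foreach my $address (@by_address) {
  print "$address $addy{ $address }\n";
}

# Alternative: most frequent first, ties broken alphabetically
my @by_count = sort {
  $addy{$b} <=> $addy{$a} or lc($a) cmp lc($b)
} keys %addy;
print "$_ $addy{$_}\n" for @by_count;
```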

      Yeah, I thought about combining the two scripts, but I could see how having emailverify.pl as a standalone script might be useful. Also, though it's not an excuse, I pulled e-mail addresses from data files totalling close to a gigabyte of raw data, possibly more. It was only after I had collected the data that I saw how much extra sorting I was going to need to do.

      Thanks for the cleanup, though. I hardly get to do enough PERL these days, so my code is even sloppier than it used to be!
        ...get to do enough PERL these days...

        You mean Perl.