Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Valid Email Filter

by neilwatson (Curate)
on Mar 25, 2002 at 14:10 UTC ( #154093=sourcecode: print w/ replies, xml ) Need Help??

Category: E-Mail Programs
Author/Contact Info Neil Watson perlmonk@watson-wilson.ca
Description: The marketing department wants to send emails to people who sign up on the company website. But who actually gives their real address? You try to explain this to them but, you're outranked. At least save yourself many of the annoying bounces and deferred messages that can clog your mail queue. This script will check a list of email address (one per line) and see if they are valid by RFC compliance and by performing an MX DNS query. This scirpt is NOT perfect. No email validation script can be. For most purposes it will suffice.
#!/usr/bin/perl

#checks for valid email address
#usage validemail <file containing email addresses>

use warnings;
use strict;
use Mail::CheckUser;
use Parallel::ForkManager;
use Fcntl qw/:flock :seek/;
use FileHandle;

my $pm=new Parallel::ForkManager(20);
my $addrfile = $ARGV[0] || die
"Usage validemail <file containing email addresses>
Will return two files: goodmails.csv and badmails.csv.
If these files exits already they will be deleted.";
my ($is_valid, $host, $x, @mx, $add, @adds);

#custom words that make emails invalid to you
my @custom = qw/
        postmaster
        webmaster
/;

my $regex = join "|", @custom;
$regex = qr/$regex/;

open (EMAILS, "$addrfile");

#remove troublesome windows /r characters
#and leading whitespace
 while (<EMAILS>){
        $_ =~ s/\015//;
    $_ =~ s/^\s*//;
        chomp $_;
    push @adds, $_;
}
close (EMAILS);

#warning, I will delete existing files
open (BADADDR, ">badmails.csv") || die;
open (GOODADDR, ">goodmails.csv") || die;

#to prevent strange writing problems
#when running threads we must set the
#write buffers see
#to prevent strange writing problems
#when running threads we must set the
#write buffers see
#http://www.perlmonks.org/index.pl?node_id=348465
autoflush BADADDR  1;
autoflush GOODADDR 1;

#remove custom regexes
$x = 0;
while ($x <= $#adds){
    if ($adds[$x] =~ m/$regex/){
        splice @adds, $x, 1;
    }else{
        $x++;
    }
}
#when using Mail::UserCheck set
#these variables

#timeout on DNS and SMTP network checks
$Mail::CheckUser::Timeout = 10;

foreach $add (@adds){ 
    
    $pm->start and next;

    if (Mail::CheckUser::check_email($add)){
        writeaddr(*GOODADDR, $add); #address is good
        $pm->finish;
    }else{
        writeaddr(*BADADDR, $add); #address is bad
        $pm->finish;
    }
}

$pm->wait_all_children;

close (BADADDR);
close (GOODADDR);

sub writeaddr{
    my $FH = $_[0];
    my $address = $_[1];
    flock $FH, LOCK_EX or die "Flock failed: $!\n";
    seek  $FH, 0, 2 or die "Seek failed: $!\n";
    print $FH "$address\n";
    flock $FH, LOCK_UN or die "unFlock failed: $!\n";
}

Comment on Valid Email Filter
Download Code
Re: Valid Email Filter
by miyagawa (Chaplain) on Mar 25, 2002 at 17:17 UTC
    Wow, glad to see people use Email::Valid::Loose :) Email::Valid::Loose is a subclass of Email::Valid which allows .(dot) before @ (at mark).

    --
    Tatsuhiko Miyagawa
    miyagawa@cpan.org

      One suggestion for the code .... change open (EMAILS, "$addrfile"); to open (EMAILS, $addrfile) or die "Can't open $addrfile: $!\n"; ALWAYS check to see if 'open' worked - print some descriptive message if it doesn't.
Re: Valid Email Filter
by gav^ (Curate) on Mar 26, 2002 at 04:48 UTC
    I find something like this tends to help. It seems AOL users just can't get it right:
    for ($email) { s/@\./@/; s/@aol\.con/aol.com/; s/aol$/aol.com/; }

    gav^

Back to Code Catacombs

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://154093]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (7)
As of 2014-12-26 23:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (176 votes), past polls