http://www.perlmonks.org?node_id=80693

There are probably better ways of doing this, but when pressed for an answer one day this is what I came up with. The idea was to parse a text field from a CGI form that supposedly contained a valid E-Mail address. Not only did we want to see if the address was formed correctly, but we also wanted to make sure that it was deliverable. If you have suggestions on how to improve on it I'd be very happy to hear from you.
sub emailValid {
    my $email=shift;   # This is the addr to test
    my $t=$email;      # We're gonna chop up $t mercilessly
    local *PIPE;
    my $nslookup;

    # A very simple stupid way to find nslookup. It's gotta be here
    # somewhere....
    if ( -x "/usr/sbin/nslookup" ) {
        $nslookup = "/usr/sbin/nslookup";
    } else {
        $nslookup = "/usr/bin/nslookup";
    }

    #
    # Tokenize the mail address into user@domain syntax
    $t=~m:^(.*)\@(.*)$:;
    my($user,$domain)=($1,$2);
    return 1 if (! $user ) or ( $user eq '' );     # No user!
    return 2 if (! $domain ) or ( $domain eq '' ); # No domain!

    my $IFS=$/;   # Save that please...
    $/='';        # We're gonna make one big input...
    open(PIPE,"$nslookup -type=any $domain 2>&1 |")
        or carp "Cannot run $nslookup ! " . $! ;
    my @check=<PIPE>; #slurp!
    close PIPE;       # gulp!
    $/=$IFS;          # put that back.
    return 3 if grep /Non-existent/,@check;
    return 0; # Checks out ok...
}

Re: valid email addresses
by merlyn (Sage) on May 15, 2001 at 23:52 UTC
    Replace:
    my $IFS=$/;   # Save that please...
    $/='';        # We're gonna make one big input...
    open(PIPE,"$nslookup -type=any $domain 2>&1 |")
        or carp "Cannot run $nslookup ! " . $! ;
    my @check=<PIPE>; #slurp!
    close PIPE;       # gulp!
    $/=$IFS;          # put that back.
    return 3 if grep /Non-existent/,@check;
    with
    `$nslookup -type=any $domain 2>&1` =~ /Non-existent/ and return 3;
    I think you'll find that much simpler.

    -- Randal L. Schwartz, Perl hacker

Re: valid email addresses
by merlyn (Sage) on May 16, 2001 at 00:01 UTC
      It just goes to show, if you're trying to reinvent the wheel all you really need is the following code:
      #!/usr/bin/perl
      open (MOUTH) && insert ('foot');

        See my reply to merlyn. Sometimes ya gotta do what ya gotta do...

        
        ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        Peter L. Berghold --- Peter@Berghold.Net
        "Those who fail to learn from history are condemned to repeat it."
        

      No doubt this would have saved some work. However for political reasons (unreasonable system manager) I was forced to create my own solution.

      On my own web site I use that module, however.

      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Peter L. Berghold --- Peter@Berghold.Net
      "Those who fail to learn from history are condemned to repeat it."
      
Re: valid email addresses
by tachyon (Chancellor) on May 16, 2001 at 17:26 UTC
    You seem to be focused on a technical solution when in fact you face a largely psychological problem.

    There are a number of reasons a user might enter an invalid address:

    Desire to avoid 'spam'
    Typos
    Hacking your system

    Of these, the hacking issue can be dealt with by checking the email address for such things as shell characters and excessive length (buffer overflows). You have been referred to appropriate sources of information for this.

    Typos are impossible to differentiate from fake addresses unless you insist that users do the old password repeat. But even if a valid email is entered what does that mean? I have a garbage email address that is certainly valid but is never read and automatically cleared. You can have that but, valid or not, it is little use to you. I was forced to do this after my automated website submission script generated 400 reply emails in the first 20 minutes and over 1000 for the first day. My original address still receives over 30 spam messages a day from this one judgement error!

    So to the psychological bit. Let's be frank. You generally want valid email addresses for marketing purposes. One man's marketing is another man's spam. Nonetheless you can be confident that if a user is willing to take the time to type in their name and email in return for some sort of enticement they will *probably* accept some form of validation.

    To put it in marketing speak you must get them while they are hot. The best way to validate an email address is thus to immediately send an email to that user via auto responder. To get their widget, read the secret files or whatever they have to respond to this email in some way. If they respond you *have* validated them, at least for that single moment in time - they may killfile your address or remove that username, redirect ... but you can't win them all.

    There are many ways to get the user to respond. These can be quite subtle and unobtrusive if you put your mind to it.

    The blank email reply
    The subscribe me in the subject or body reply
    The link to the secret pages
    The link to a cgi
    The password
    The cookie via html/javascript or perl or whatever

    Just a few comments on your script.

    A huge security hole seems to be that you perform absolutely *no* character checking on $domain. Any user input, especially input that may be passed to a shell needs to be validated and have shell chars removed. See validating an email address, perlsec and taint or -T
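One way to close that hole might look like the sketch below. It is only an illustration: `clean_domain` and `nslookup_command` are hypothetical helper names, not part of the original script, and the check assumes that plain DNS hostnames (letters, digits, hyphens and dots) are all you need to accept. The second helper builds an argument list for the list form of a pipe `open`, which runs the program directly instead of passing a string through the shell.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the domain if it looks like a plain DNS
# hostname, or undef otherwise. Only letters, digits, hyphens and dots are
# allowed, so no shell metacharacters can get through.
sub clean_domain {
    my ($domain) = @_;
    return undef unless defined $domain && length $domain <= 253;
    # Each label: 1-63 chars, alphanumeric at both ends, hyphens inside.
    my $label = qr/[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?/;
    return $domain =~ /\A($label(?:\.$label)+)\z/ ? $1 : undef;
}

# Hypothetical helper: builds the argument list for a list-form pipe open.
# Returns an empty list when the domain does not pass clean_domain().
sub nslookup_command {
    my ($nslookup, $domain) = @_;
    my $safe = clean_domain($domain);
    return defined $safe ? ($nslookup, '-type=any', $safe) : ();
}

# Usage (not run here, since it needs nslookup and a network):
#   my @cmd = nslookup_command('/usr/bin/nslookup', $domain) or return 2;
#   open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
```

Because the list form bypasses the shell, the `2>&1` redirection from the original command is not available this way; stderr would have to be handled separately. Running the script under -T as well would be a sensible second line of defence.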

    The second issue is that the existence of a domain proves.....that the domain exists. As I read it your script will validate mickey_mouse&donald_duck@hotmail.com. Whilst this may indeed be a valid email address it is certainly not my valid email address.
    return 1 if (! $user ) or ( $user eq '' );     # No user!
    return 2 if (! $domain ) or ( $domain eq '' ); # No domain!

    # The second part of these lines never changes the result. If $user eq ''
    # then !$user is already true and the sub returns without evaluating the
    # bit past the or. You don't need the parens either.
    return 1 if ! $user ;   # No user!
    return 2 if ! $domain;  # No domain!

    # The if ! (if-not) syntax is what unless was invented for
    return 1 unless $user;   # No user!
    return 2 unless $domain; # No domain!
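One caveat with any of these truthiness tests: Perl treats the string '0' as false, so an address whose local part is just 0 would be reported as having no user. A minimal sketch of a length-based test follows; `is_blank` is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the captured piece is missing or empty,
# judged by length rather than by truthiness, so '0' counts as present.
sub is_blank {
    my ($piece) = @_;
    return !( defined $piece && length $piece ) ? 1 : 0;
}

# In the original sub this would read something like:
#   return 1 if is_blank($user);    # No user!
#   return 2 if is_blank($domain);  # No domain!
```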
    Hope this helps.

    tachyon

      I won't even begin to argue about the desire to plug in a false email address to prevent spamming and its validity. That wasn't the point in the eyes of my client at the time. The client gets what he wants.

      As far as the mickey_mouse@hotmail.com goes, the program spec I was working against was to validate the domain names as being real and thankfully not the actual account. I think that validating the account would be harder...

      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Peter L. Berghold --- Peter@Berghold.Net
      "Those who fail to learn from history are condemned to repeat it."
      
Re: valid email addresses
by koolade (Pilgrim) on May 16, 2001 at 07:13 UTC

    Read this in the perlfaq for a little bit more information on parsing email addresses against RFC 822.

      Actually at some point I had. I was putting something KISS together...

      If I were truly serious about verifying an EMAIL address I would do something like what was mentioned in the perlfaq, only with a link to a mod_perl-handled URL with a random string as part of the URL. Deciphering the random URL would map back to their email address through a query of a session table.

      Complicated, but effective.
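A rough sketch of that random-URL idea, under several assumptions: the hash `%pending` stands in for the session table, the helper names are made up for this example, and the base URL is a placeholder. `rand` is used only to keep the sketch free of dependencies; it is not cryptographically secure, so a real site would want a stronger source of randomness.

```perl
use strict;
use warnings;

# Stand-in for the session table; a real site would use a database.
my %pending;

# Hypothetical helper: makes a 32-character hex token for an address and
# remembers which address it belongs to.
sub issue_token {
    my ($email) = @_;
    my $token;
    do {
        $token = join '', map { sprintf '%02x', int rand 256 } 1 .. 16;
    } while exists $pending{$token};
    $pending{$token} = $email;
    return $token;
}

# Hypothetical helper: returns the address for a token and forgets the
# token, so each link works once. Returns undef for unknown tokens.
sub confirm_token {
    my ($token) = @_;
    return delete $pending{$token};
}

# Builds the link to mail out. The base URL is a placeholder.
sub confirmation_url {
    my ($email) = @_;
    return 'http://www.example.com/confirm/' . issue_token($email);
}
```

The handler behind the URL would call `confirm_token` with the last path segment; a defined return value is the address that has just been proven deliverable.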

      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Peter L. Berghold --- Peter@Berghold.Net
      "Those who fail to learn from history are condemned to repeat it."
      
Re: valid email addresses
by olly (Scribe) on May 16, 2001 at 12:39 UTC
    It is very hard to make sure you don't reject a valid email address. So by far the best way to do it is to make some kind of activation system with a URL in the email. This way you know the email address entered has to be real and deliverable.

    Imagination is more important than knowledge -Einstein-

Re: valid email addresses
by raven67 (Initiate) on Nov 27, 2001 at 20:40 UTC
    I don't know if this might help, but my regex for email validation on all forms is:

    m/^([\_\-\w\.]+)\@([\-\_\.\w]+\.\w{2,4})$/

    Seems to be the best I've come up with; it even makes sure the top level is 2 to 4 characters. DAMN THE .INFO!

    --
    ravnx - EFNet
    ravnx@netpimpz.com

      This regexp is broken. It doesn't accept legitimate email addresses. Reread this thread and read the RFC. Stop reinventing broken wheels.
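To make the objection concrete, here is a small check of the posted pattern against a few sample addresses. The addresses and the helper name `posted_regex_accepts` are chosen for illustration only. A plus sign in the local part and a top-level domain longer than four letters are both legitimate, yet the pattern turns them away, while a domain with an empty label slips through.

```perl
use strict;
use warnings;

# The pattern exactly as posted above.
my $posted = qr/^([\_\-\w\.]+)\@([\-\_\.\w]+\.\w{2,4})$/;

# Hypothetical helper: 1 if the posted pattern matches, 0 otherwise.
sub posted_regex_accepts {
    my ($address) = @_;
    return $address =~ $posted ? 1 : 0;
}

# Print a verdict for each sample address.
for my $address ('ravnx@netpimpz.com', 'user+tag@example.com',
                 'curator@example.museum', 'a@b..cc') {
    printf "%-24s %s\n", $address,
        posted_regex_accepts($address) ? 'accepted' : 'rejected';
}
```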