mr2 has asked for the wisdom of the Perl Monks concerning the following question:

What algorithm should I use for checking an email address?
(without using a module)

Replies are listed 'Best First'.
Re: email check
by myocom (Deacon) on Jul 18, 2002 at 20:28 UTC

    You should use the one that uses a module. Namely, RFC::RFC822::Address.

    But seriously, why the "no modules" constraint? If you *really* can't use a module, for some bizarre reason, I suggest downloading the module and putting its code in your own source file (take care to heed the copyright notice in the module, though).

    "One word of warning: if you meet a bunch of Perl programmers on the bus or something, don't look them in the eye. They've been known to try to convert the young into Perl monks." - Frank Willison
Re: email check
by FoxtrotUniform (Prior) on Jul 18, 2002 at 20:31 UTC

    Get yourself a copy of RFC 2822. (Don't forget to allow for obsolete addresses; not everyone has read this RFC yet.) If you have access to the O'Reilly book Mastering Regular Expressions, there's a regex in the back of the book that does an incomplete job of checking email addresses, but is about as close as you can get with a plain regex. (Be warned, it's 6598 characters.)

    Why don't you want to use a module?

    Update: If you want to do email verification by hand, the best advice I can give you is to study RFC 2822 thoroughly. You'll also find Text::Balanced useful for pulling apart the address.
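
    To give an idea of why Text::Balanced helps here: RFC 2822 comments are parenthesised and may nest, which a plain regex handles poorly. The following is only a minimal sketch, assuming the input begins with a comment; the sample string and variable names are made up for illustration, and this is nowhere near a full address parser.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input: a nested comment followed by an address.
my $input = '(work (main) mailbox) someone@example.com';

# In list context extract_bracketed returns the balanced chunk first,
# then whatever text is left over after it.
my ($comment, $rest) = extract_bracketed($input, '()');

if (defined $comment && length $comment) {
    $rest =~ s/^\s+//;              # drop the whitespace after the comment
    print "comment: $comment\n";    # comment: (work (main) mailbox)
    print "rest:    $rest\n";       # rest:    someone@example.com
}
```

    The rest of the address still has to be checked against the RFC grammar; this only peels off one leading comment.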

    The hell with paco, vote for Erudil!

      "Why don't you want to use a module?"

      Well, I like to understand things, not just use a module
      someone else has written. I'm not trying to write some very
      short script; it's very practical for me to write something
      hard (interesting) myself, because it's good: it helps
      improve my skills.

        What's the difference between using a module and copying code someone here gives you then? If you want to understand, just read the module. Reading code also helps improve your skills -- especially if you're going to be doing any serious programming in the future.

        While I personally don't mind recreating the occasional wheel, I think this might be a bit much. You'll need to read the RFC and fully understand all of it to determine what you'll need to do. There's a million halfassed ways of doing it (I've been guilty of this myself) but you probably would be better served just using the module and picking a more useful wheel to recreate that will give you the opportunity to learn more than the intimate details of parsing email addresses.


        "To be civilized is to deny one's nature."
Re: email check
by insensate (Hermit) on Jul 18, 2002 at 20:29 UTC
    This is not a task for the faint of heart... Check out chapter 7 of Friedl's book; a good 8 pages are dedicated to the topic. Why the no-module requirement? A module is going to be your best bet here... check out this thread.
Re: email check
by bronto (Priest) on Jul 19, 2002 at 09:39 UTC

    As others have said, you should check the RFC, the module, or the module's code. Anyway, if you are reinventing the wheel to learn something new, I'll tell you how I'd do it in the simplest case:

    First of all, there should be just one "@", and if you want to avoid source routing you should also be sure there are no "%" on the right of the @.

    You should check that there actually are characters before the @.

    The top-level domain should be two, three or four alphabetic characters (or you should get a list of TLDs and check against it; guess how. Hint: use a hash).

    You should really have something more than only a top-level domain on the right of the @: a dot must be there, and there should be at least one alphanumeric character (or "-") between the @ and the dot.

    Last but not least: no spaces are allowed

    So, a regexp could be:

    use strict;
    use warnings;

    my @addresses = qw( me@here me@here@there@everywhere why@ );

    foreach (@addresses) {
        print "$_ is ";
        print /^\S+\@([a-z-]+\.)+[a-z]{2,4}$/ ? "GOOD!!!" : "bad";
        print "\n";
    }

    This yields:

     is GOOD!!!
    me@here is bad
    me@here@there@everywhere is bad
    why@ is bad
     is bad

    Please remember this is only the simplest case! Mail address syntax is far more complicated than one would expect from reading common addresses. The best way to understand it is to try to write a spam filter for personal use :-)
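
    For what it's worth, the checks listed earlier in this reply can also be applied one at a time instead of in a single regex, which makes the "use a hash" hint for the TLD list concrete. This is only a rough sketch of the simplest case: the sub name looks_like_address and the short %known_tld list are made up for illustration, and it makes no attempt to follow the RFC.

```perl
use strict;
use warnings;

# Illustrative subset only; a real TLD list is much longer.
my %known_tld = map { $_ => 1 } qw(com org net edu it uk info);

# Returns 1 if the address passes the simple checks, 0 otherwise.
sub looks_like_address {
    my ($addr) = @_;

    return 0 if $addr =~ /\s/;              # no spaces allowed

    my @parts = split /\@/, $addr, -1;      # -1 keeps trailing empty fields
    return 0 unless @parts == 2;            # exactly one "@"

    my ($local, $domain) = @parts;
    return 0 unless length $local;          # something before the "@"
    return 0 if $domain =~ /%/;             # no "%" to the right of the "@"

    my @labels = split /\./, lc $domain, -1;
    return 0 unless @labels >= 2;           # at least one dot in the domain

    my $tld = pop @labels;
    return 0 unless $known_tld{$tld};       # TLD lookup via the hash

    for my $label (@labels) {
        return 0 unless $label =~ /^[a-z0-9-]+$/;   # alphanumerics and "-"
    }
    return 1;
}

for my $addr (qw(someone@example.com me@here me@here@there@everywhere why@)) {
    printf "%s is %s\n", $addr, looks_like_address($addr) ? 'GOOD!!!' : 'bad';
}
```

    Splitting on the "@" first makes the "just one @" rule explicit, and swapping in a complete TLD list only means filling the hash from a file.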

    Have fun!


    # Another Perl edition of a song:
    # The End, by The Beatles
    END {
      $you->take($love) eq $you->made($love) ;