PerlMonks  

On Validating Email Addresses

by dws (Chancellor)
on Jan 04, 2005 at 02:01 UTC

This holiday season my wife, who has a .name email address, encountered several sites that wouldn't accept her email address. They didn't get her business.

It's reassuring to know that nobody here would ever code up a validation routine that would reject a four-character TLD in an email address. Right? Not even me. Right?

Oops.
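To make the pitfall concrete, here is a minimal Perl sketch that contrasts a hypothetical over-strict check with a looser one. The strict check assumes every TLD is two or three letters long. The looser one accepts any alphabetic final label of two or more letters. Both subroutine names are made up for illustration and are not taken from any real module.

```perl
use strict;
use warnings;

# Hypothetical over-strict check: assumes TLDs are 2 or 3 letters long,
# so it wrongly rejects .name, .info, .museum and friends.
sub too_strict_ok {
    my ($addr) = @_;
    return $addr =~ /\@[\w.-]+\.[a-z]{2,3}\z/i ? 1 : 0;
}

# Looser hypothetical check: any final label of two or more letters
# is accepted, so four-character (and longer) TLDs pass.
sub looser_ok {
    my ($addr) = @_;
    return $addr =~ /\@[\w.-]+\.[a-z]{2,}\z/i ? 1 : 0;
}

for my $addr ('wife@example.name', 'someone@example.com') {
    printf "%-20s strict=%d looser=%d\n",
        $addr, too_strict_ok($addr), looser_ok($addr);
}
```

Even the looser pattern still rejects addresses at a bare TLD and restricts the domain to `\w`, `.` and `-`. Treat it as an illustration of the .name problem, not as a recommended validator.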

Re: On Validating Email Addresses
by Thilosophy (Curate) on Jan 04, 2005 at 02:51 UTC
    Validating email addresses with some regular expression is over-rated. You can only check if the email address is well-formed (not if it exists), but as your .name example shows, if you are too zealous here, you get into trouble by rejecting too much.

    If someone wants to not give you his real email address, he can just type mickey.mouse@microsoft.com which would be fine for your validator routine. If someone mistypes his email address by accident, the chance that your validator can catch that is very slim as well.

    If you need to validate an email address, the only way to do that is to send an email to that address and wait for a reply. So for form validations, it does not make sense to check more than that the string contains an @ and at least one dot after that.

    if ( /\@.+\./ ) {    # email looks good
    }
      > the string contains an @ and at least one dot after that
      Even that's not quite right (see how easy this is to get wrong!).

      One of the country-code registrars (I forget which now) has addresses at the top-level domain! Like "foo@to" for the ".to" registrar.

      So please, don't look for a dot. Stick with the Email::Valid-style validators.

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.
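To illustrate merlyn's point, here is a minimal Perl sketch comparing the parent post's "an @ and then a dot" check with a looser "something on both sides of a single @" check. The subroutine names are hypothetical, and neither check is what Email::Valid actually does.

```perl
use strict;
use warnings;

# The parent post's check: an @ followed later by at least one dot.
sub has_at_and_dot { return $_[0] =~ /\@.+\./ ? 1 : 0 }

# Looser alternative (an assumption, not Email::Valid's logic): require
# a non-empty local part and a non-empty domain around a single @.
# It still rejects quoted local parts that contain an @ of their own.
sub has_at_only { return $_[0] =~ /\A[^@]+\@[^@]+\z/ ? 1 : 0 }

for my $addr ('foo@to', 'user@example.com') {
    printf "%-16s at+dot=%d at-only=%d\n",
        $addr, has_at_and_dot($addr), has_at_only($addr);
}
```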

        > One of the country-code registrars (I forget which now) has addresses at the top-level domain! Like "foo@to" for the ".to" registrar.

        Wow. I am shocked ;-) Have to rewrite some code now...

        if (/\@/) {    # email looks valid...
        }

      merlyn already noted the TLD problem. But really, you're now being too generous. The real solution is to use Email::Valid, which contains a very large and complex regex, plus a few other validation routines.

      As complex as that regex is, it still won't match embedded comments in the address, but that's usually not a problem.

      "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

        > But really, you're now being too generous. The real solution is to use Email::Valid, which contains a very large and complex regex, plus a few other validation routines.

        Well, my point was that you cannot validate the email with a regular expression anyway. You are very unlikely to even catch typos. If my email is bill@microsoft.com and I mistype it as bikk@microsoft.com how is Email::Valid going to help you? So why bother at all?

        Concession: Email::Valid can also check if an MX entry exists for the domain. That might make sense in some situations (but it still does not check the user name -- is there a way to do this, too?)

        What?! Email::Valid fails on embedded comments? That's an astonishingly common feature of actual email addresses in the wild. I managed a number of public inboxes for a global corporation for a few years and I had to take special care in my own email address parsing code (in a VB dialect) to handle comments.

        I mean, of the form (Fname Lname) <addr@example.com> and <addr@example.com> (Fname Lname). I never saw addr@example( ... ).com. Of those three forms, which are supported? Anything good will handle the first two and I don't think the third matters. I'm speaking only from what I saw in actual usage.
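For the display-name and comment forms mentioned above, a rough Perl sketch that reduces a header value to a bare address before any validation might look like this. `bare_address` is a hypothetical helper. It does not handle nested comments, quoted strings, or comments inside the address itself (the addr@example( ... ).com form).

```perl
use strict;
use warnings;

# Hypothetical helper: reduce "(Name) <addr>", "<addr> (Name)" or
# "addr (Name)" to the bare address.  Simplified: no nested parentheses
# and no quoted strings containing <, >, ( or ).
sub bare_address {
    my ($field) = @_;
    $field =~ s/\([^()]*\)//g;              # drop (comments)
    return $1 if $field =~ /<([^<>]+)>/;    # prefer an <angle-addr>
    $field =~ s/\A\s+|\s+\z//g;             # otherwise just trim
    return $field;
}

for my $field ('(Fname Lname) <addr@example.com>',
               '<addr@example.com> (Fname Lname)',
               'addr@example.com (Fname Lname)') {
    printf "%s\n", bare_address($field);
}
```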

Re: On Validating Email Addresses
by BrowserUk (Pope) on Jan 04, 2005 at 03:50 UTC

    This begs the question: "Why are you attempting to validate an email address"?

    You wish to ensure that the user has typed his email address correctly because:

    1. As a service to your users--if they mistyped it, they will not receive the information they have asked you for.

      If they really want that information, they will type it correctly.

    2. As a service to yourself--you wish to ensure that you can send them emails, before you allow them to progress to using your service.

      You don't wish to allow them to use your "service" (or have access to your information) without your being able to spa^H^H^H send them your very important and useful information.

      Again, if they want to receive whatever information you want to send them, they will type their email correctly.

      If they do not want to receive it, then typing some spurious addy, like a@b.com, will satisfy most simplistic checks. I don't know which poor blighter has the email addy a@b.com, but they must receive a sh^H^H lot of junk they never asked for.

      If it is really commercially necessary to restrict your info/service to only those people that you can spam, the only(?) way is to provide access only once you receive a confirmation in reply to an email sent to the address supplied.

      Even this is easily bypassed by those that don't wish to receive "further correspondence", once they have satisfied your requirements.

    3. Other?
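The confirmation approach in point 2 is usually implemented by mailing a link that carries a token and granting access only when that link is followed. Here is a minimal Perl sketch, assuming an HMAC token built with the core Digest::SHA module. The secret, the subroutine names and the token scheme are illustrative assumptions, not any particular framework's API.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Assumed server-side secret; a real site would load a long random
# value from its configuration instead of hard-coding it.
my $SECRET = 'replace-with-a-long-random-secret';

# Token to embed in the confirmation link mailed to $addr.  The address
# is lowercased first, which is an assumption: strictly, local parts
# are case-sensitive.
sub confirmation_token {
    my ($addr) = @_;
    return hmac_sha256_hex(lc $addr, $SECRET);
}

# Called when the link comes back: the address counts as confirmed only
# if the token matches the one we would generate for it.
sub is_confirmed {
    my ($addr, $token) = @_;
    return confirmation_token($addr) eq $token ? 1 : 0;
}

my $token = confirmation_token('user@example.com');
printf "%s\n", is_confirmed('user@example.com', $token) ? 'confirmed' : 'rejected';
```

A production version would also want an expiry time inside the token and a constant-time comparison. Both are left out to keep the sketch short.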

    I first encountered this concept so loved by marketing people at Tandy/RadioShack. I was asked for my (land) address "before the till would accept my payment". As it happened, I was 3,500 miles from home at the time--which I explained.

    "But the till won't accept payment without an address", I was told. "Where are you staying?".

    "I don't know the address. I know how to get there, it was the first motel I encountered leaving the airport, but what the address is I have no idea. Hell, I'm not even sure what the name of the place is!".

    "I have to have an address before I can complete the transaction".

    "Okay, if any address will do, put your address in".

    "Oh! I can't do that...".

    "Okay, please wait a minute while I check the address."... {I disappeared out of the shop for a couple of minutes} ... "Okay. It's 7125".

    "7125"

    "1st Street"

    "First Street".

    "Utah"

    "And the zip code?"

    "Sorry, I don't know the zip code".

    "Okay, it should be able to look it up... Hang on. That's the address of this shop!?".

    "Did the till accept it?"

    "Erm...Yes".

    "Great! We're done then".


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      > If they really want that information, they will type it correctly.

      Since you cannot check if an email exists anyway, the most useful type of form data validation for emails is the one that catches typos, which is to have them enter the email twice and validate that they match (same as when you ask for password confirmation).
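A minimal Perl sketch of that double-entry comparison follows. It trims stray whitespace, compares the domain case-insensitively (domain names are case-insensitive), and compares the local part exactly, since RFC 2821 says the local part must be treated as case-sensitive. `same_address` is a hypothetical name. As the reply below notes, copy-and-paste defeats this check entirely.

```perl
use strict;
use warnings;

# Hypothetical double-entry check: trim whitespace, then compare the
# local parts exactly and the domains case-insensitively.
sub same_address {
    my ($first, $second) = @_;
    my @parts;
    for my $entry ($first, $second) {
        (my $clean = $entry) =~ s/\A\s+|\s+\z//g;
        my $at = rindex($clean, '@');         # last @ separates the domain
        return 0 if $at < 0;                  # no @ at all
        push @parts, [ substr($clean, 0, $at), lc substr($clean, $at + 1) ];
    }
    return ($parts[0][0] eq $parts[1][0]
         && $parts[0][1] eq $parts[1][1]) ? 1 : 0;
}

for my $pair (['Bill@Example.COM ', 'Bill@example.com'],
              ['bill@example.com', 'bikk@example.com']) {
    printf "%s vs %s: %s\n", @$pair, same_address(@$pair) ? 'match' : 'mismatch';
}
```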

        I wonder at those quite frequently. And quite likely, if I typed it wrong the first time, then the second will be equally incorrect, since shift-tab, home, shift-end, ctrl-c, tab, ctrl-v is quicker to type than my address, which isn't one of the short ones. :)

        C.

      > If they really want that information, they will type it correctly.

      I wish that were true. It's certainly not true for me. I mistyped my own email address the last time I submitted a bug and patch to CPAN, and didn't notice it until I re-read the submitted bug report. And I want to know if the patch gets accepted.

      > This begs the question: "Why are you attempting to validate an email address"?

      Yeesh, that is one of my pet peeves. It may raise or suggest the question, but it certainly does not beg the question.

        > Yeesh, that is one of my pet peeves. It may raise or suggest the question, but it certainly does not beg the question.

        Of the 323,000 references to this phrase turned up by Google, about 2 or 3 percent are people who have either unilaterally decided or have accepted the wisdom of some other, petitio principii-aware, usage nazi, that the only acceptable usage of this phrase is the classical rhetorical fallacy usage:

        To beg the question means 'to assume the truth of the very point being raised in a question'.

        The other 90%+, found in many highly respectable sources, including The New York Times, The Wall Street Journal, The Economist, The Times Literary Supplement, and even hard-core academic journals, are usages similar to mine above, where the verb 'begs' is used as a substitute for the word 'entreat' or the phrase 'ask earnestly for or of'.

        1. Language is a live, mutating entity and 'new' forms of usage are being adopted all the time.

          The only 'static' languages are dead languages--like Latin.

        2. The classical usage is itself suspect.

          Let's try a little substitution--'beg' for 'assume':

          'to beg the truth of the very point being raised in a question'

          Does that make equivalent sense to the classical definition above? I think not.

          Or 'beg the point in a dispute' as meaning 'To take for granted without proof'?

          However, try:

          'That entreats the question...', or 'That implores the question "..." be asked', or 'That craves the question...'.

          I think those do!?

          Do you think it is possible that some ancient scholar made an error when translating from Latin or Greek to English or French at some point in history, and as a result, that nonsensical, idiomatic phrase has become enshrined in classical rhetorical teaching?

        3. Your suggested alternatives--"It may raise or suggest the question,..."--do not capture the essence of this usage.

          The implication of the phrase in the usage is not that the original text raised the question.

          It is that the original text didn't ask the question, when it probably should have asked.

          Whilst that is absolutely different from the classical usage, it does coincide with various other usages of the word 'beg' as a substitute for the word 'ask'.

          As in, 'I beg your forgiveness', or 'I beg to differ', or 'They begged the court's indulgence'.


        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.

      > If they do not want to receive it, then typing some spurious addy, like a@b.com, will satisfy most simplistic checks. I don't know which poor blighter has the email addy a@b.com, but they must receive a sh^H^H lot of junk they never asked for.

      I get spam at no@tnx.nl, which doesn't exist. But I get even more spam at aoeu%aoeu.nl (s/%/@/), which does exist. I feel sorry for the person who has asdf@asdf.com.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      > If they do not want to receive it, then typing some spurious addy, like a@b.com, will satisfy most simplistic checks. I don't know which poor blighter has the email addy a@b.com, but they must receive a sh^H^H lot of junk they never asked for.

      A lot of people don't seem to know about m!^example\.(?:com|org|net)$!. These domains are reserved (by RFC 2606) specifically for documentation and such, and are therefore perfect for fake email addresses (foobar@example.com, for example). There are no MX records for these domains, nor is there anything listening on port 25. So as long as you're giving this value to a program that doesn't check for an MX record, these domains are perfect: they are fake, and at the same time you're not risking giving out someone's real email address and getting them spammed to death.
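As a small illustration, here is a Perl sketch that flags addresses at the second-level domains RFC 2606 reserves for documentation (example.com, example.net, example.org, plus their subdomains) and at the reserved TLDs .example, .invalid, .localhost and .test. The helper name is hypothetical. Note the `\z` anchor: without it, a pattern for `example\.com` would also match `example.community`.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the address's domain is one reserved by
# RFC 2606, so mail to it can never reach a real person.
sub is_reserved_domain {
    my ($addr) = @_;
    my ($domain) = $addr =~ /\@([^@]+)\z/ or return 0;
    $domain = lc $domain;
    return 1 if $domain =~ /(?:\A|\.)example\.(?:com|net|org)\z/;
    return 1 if $domain =~ /(?:\A|\.)(?:example|invalid|localhost|test)\z/;
    return 0;
}

for my $addr ('foobar@example.com', 'foo@example.community', 'a@mail.test') {
    printf "%-24s %s\n", $addr,
        is_reserved_domain($addr) ? 'reserved' : 'not reserved';
}
```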

        > There are no MX records for these domains,

        I use me@privacy.net instead. See this for more info.

        The other approach, especially for one time registrations, is to use something like the Mailinator (unfortunately it seems to be down right now)

Re: On Validating Email Addresses
by Zaxo (Archbishop) on Jan 04, 2005 at 05:07 UTC

    I really like to try bang paths on those things.

    After Compline,
    Zaxo

Re: On Validating Email Addresses
by Juerd (Abbot) on Jan 04, 2005 at 07:50 UTC

    Way too many sites reject the perfectly valid *@juerd.nl. On the other hand, I can safely use it here, because spam bots don't recognise it anyway :).

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      I notice that a lot of sites have problems with a perfectly valid "user+something@domain.tld" address too...

      --
      b10m

      All code is usually tested, but rarely trusted.
Re: On Validating Email Addresses
by Callum (Chaplain) on Jan 04, 2005 at 09:56 UTC
    My (vanity) address is c@llum.... and it has been rejected by a variety of sites (including those of some major universities and companies) because they require the local part of the address to be at least two characters long (or, if memory serves, three in one case).

    C
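The three complaints above (`*@juerd.nl`, `user+something@domain.tld`, and a one-character local part) all concern the local part. As a rough Perl sketch, here is a check of the unquoted "dot-atom" local-part syntax, using the atext character set from RFC 2822 section 3.2.4. The subroutine name is hypothetical, and quoted local parts are deliberately not handled.

```perl
use strict;
use warnings;

# atext from RFC 2822 section 3.2.4: letters, digits and these symbols.
my $atext = qr/[A-Za-z0-9!#\$%&'*+\/=?^_`{|}~-]/;

# Hypothetical check for an unquoted (dot-atom) local part: runs of
# atext separated by single dots.  One character is enough; there is
# no minimum length beyond that.
sub local_part_ok {
    my ($addr) = @_;
    my $at = rindex($addr, '@');
    return 0 if $at < 1;                  # no @, or nothing before it
    my $local = substr($addr, 0, $at);
    return $local =~ /\A$atext+(?:\.$atext+)*\z/ ? 1 : 0;
}

for my $addr ('*@juerd.nl', 'user+something@example.com', 'c@example.com') {
    printf "%-28s %s\n", $addr, local_part_ok($addr) ? 'ok' : 'rejected';
}
```

Quoted local parts (for example `"john doe"@example.com`) are also valid but fall outside this pattern.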

Re: On Validating Email Addresses
by demerphq (Chancellor) on Jan 04, 2005 at 17:36 UTC

    Some others to watch out for:

    1. Don't ask for "zip codes"; only about 200 million people know what the hell they are. The rest of the world calls them "postal codes" (assuming they speak/read English).
    2. Don't assume that all postal codes will be in the same format as zip codes. Since most American developers also usually deploy to Canada, which uses A0A 0A0 style postal codes, this usually isn't a problem, but if you can't handle SW1 as a postcode you've just eliminated some of the richest British people there are from using your service.
    3. Don't assume that all telephone numbers are of the form (123)456-7890. That pattern is only valid in North America and a few Caribbean countries. Other countries may use more digits, or far fewer. In Germany, for instance, a phone number could be as short as 8 digits or as long as 12.
    4. Don't assume that the street number comes before the street name. In Germany, for instance, it's normal to put the street number after the street name ("I work at Wall St. 12", not "I work at 12 Wall St.").
    5. Don't assume that people will have a middle name for disambiguation purposes. In some cultures middle names are uncommon.
    6. Don't assume that 12 hour time will be instantly readable by everybody. Support 24 hour time as well. Ideally let the user choose.
    7. Don't assume that dates are in MM/DD/YY format. That particular date format is pretty well restricted to the civilian US market. Use "YYYY-MM-DD" and you'll annoy a few traditionalists but never be misunderstood.
    ---
    demerphq
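For points 6 and 7, core Perl's POSIX module can produce both an unambiguous YYYY-MM-DD date and a 24-hour time. Here is a minimal sketch. The fixed timestamp is simply the time of the original post (Jan 04, 2005 at 02:01 UTC), used so the output is reproducible.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Fixed epoch time (2005-01-04 02:01:00 UTC) so the output is repeatable.
my $when = 1104804060;

# YYYY-MM-DD date and 24-hour time, both taken from gmtime to avoid
# depending on the local time zone.
my $date = strftime('%Y-%m-%d', gmtime $when);
my $time = strftime('%H:%M', gmtime $when);
printf "%s %s\n", $date, $time;    # prints "2005-01-04 02:01"
```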

      A few further points:
      • Many people only have one name. For example, an Indonesian presidential candidate from a few years back was called "Wiranto". That was his complete name.
      • Don't ask for "State", either. It is much better to ask for Region. And, definitely don't limit to two letters!
      • Don't assume that common abbreviations are in force. For example, do you know what Rue is? What about Via?*
      • Don't assume that a phone number is guaranteed to match to a location. There are companies that have no physical location and whose employees work completely by cellphone.

      *: Both mean street. Rue is French and Via is Italian.

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

        > do you know what Rue is?

        yes, I do ;-)

        Anima Legato
        .oO all things connect through the motion of the mind

Re: On Validating Email Addresses
by fraktalisman (Hermit) on Jan 04, 2005 at 18:03 UTC
    Freemailers allow email addresses with unusual local parts (the part before the at sign), like ...name@web.de, which looks cool with certain short names but is also rejected by many mailing programs. Another tricky one is an abbreviation like a.b.c.@gmx.net, which is also valid there, rejected by some strict checkers, and all too often mistyped, because people usually assume there should be no trailing dot.
    Lessons learned? Don't check too strictly, if at all (as said above). And both friends got alternate addresses without any dots before the at sign, just to be sure everyone could message them.
      > Like ...name@web.de which looks cool with certain short names, but which is also rejected by many mailing programs.
      If you can point out to me a big-market mailer that rejects valid RFC2822 addresses, I'd be thankful. I know that SMTP (RFC 2821) handles everything that RFC2822 can dish out, and that all popular Unix MTAs handle 2822 addresses.

      So far, the only problem I've seen with "unusual" addresses are with overzealous web forms.

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

        I just tried the address with MS Outlook Express and it worked. I asked the person which programs caused the problem; if she remembers, I'll post it here.
      One large site (I think it was monster.com) once balked at the presence of a dot in the middle, e.g., zed.lopez@example.com, for me. Sigh.

Node Type: perlmeditation [id://419130]
Approved by TStanley
Front-paged by grinder