PerlMonks  

Re: Practical e-mail address validation

by kyle (Abbot)
on Sep 13, 2008 at 19:06 UTC (#711157)


in reply to Practical e-mail address validation

Use \Z to match at the end of the string or just before a trailing newline, and \z to match only at the very end of the string.
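
A minimal sketch of the difference, assuming a deliberately simplistic address pattern (the pattern and variable names here are illustrative only, not the one from the original post):

```perl
use strict;
use warnings;

# Hypothetical, oversimplified pattern: something@something, no whitespace.
my $with_Z = qr/\A[^\s\@]+\@[^\s\@]+\Z/;   # \Z also matches before a final "\n"
my $with_z = qr/\A[^\s\@]+\@[^\s\@]+\z/;   # \z matches only at the very end

my $addr = "user\@example.com\n";          # note the trailing newline

print "\\Z accepts the trailing newline\n" if $addr =~ $with_Z;
print "\\z rejects the trailing newline\n" if $addr !~ $with_z;
```

If the input has not been chomped, the \Z version quietly lets a trailing newline through, which is the sort of thing that turns up later in a mail header.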

I've long been under the impression that domain names are allowed to have a single trailing dot. That is, "example.com" is the same as "example.com.". As a lame proof of this, dig gives me the same answer either way, but it rejects "example.com.." (not a legal name). I haven't looked closely at RFC1035 for support of this. If you accept this, it screws up the "lc eq lc" test for equality. Maybe you'd want to just s/\.?\s*$// everything before anything else.
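
Here is a rough sketch of that normalization step. The helper name normalize_domain is made up for illustration, and I've used \z in place of $ to stay consistent with the advice above:

```perl
use strict;
use warnings;

# Hypothetical helper: strip trailing whitespace and at most one trailing
# dot, then lowercase, so that "lc eq lc"-style comparisons still work.
sub normalize_domain {
    my ($domain) = @_;
    $domain =~ s/\.?\s*\z//;
    return lc $domain;
}

print "same domain\n"
    if normalize_domain("Example.COM.") eq normalize_domain("example.com");
```

Only one dot is removed, so "example.com.." comes out as "example.com." and can still be rejected by whatever validation follows.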

Yes, I realize that it is possible to set up a system such that ExpertsExchange@example.com and ExpertSexchange@example.com are completely separate addresses, but anybody who does that deserves to suffer from such a set-up.

Unfortunately, it's usually not the people who set it up who suffer but rather the people who have to use it and often have no control over it. I don't think that ignoring the case of the local part of an email address is a bad design decision—I've done it myself at times. I think, rather, that it's better justified on the grounds that the few mistakes aren't worth the extra work to avoid them.

At YAPC::NA 2008, I attended a talk by Ricardo SIGNES (rjbs) called "Email Hates the Living!" which discussed some of the pitfalls of parsing email addresses strictly according to the standards. Google knows about it. You might pass that along to anyone else who wants to take an approach less practical than yours. Also, it's hilarious.

Replies are listed 'Best First'.
Re^2: Practical e-mail address validation (case)
by tye (Sage) on Sep 14, 2008 at 00:46 UTC

    The dot at the end of a domain name makes it "absolute" (at least in some situations). Without the final dot, the local domain can be appended to it when trying to resolve it. Compare "nslookup www" vs. "nslookup www." if you are in a domain with a web server, for example ("dig" here appears to just assume a trailing dot if you leave it off). Funnily enough, RFC 2822 doesn't appear to allow a trailing dot (though I didn't read up on the obsolete bits).

    I think, rather, that it's better justified on the grounds that the few mistakes aren't worth the extra work to avoid them.

    It isn't particularly hard to be case-sensitive only in front of the @. The reason you should ignore case there (but preserve it) is that if you have two addresses that agree except for the case of some letters to the left of the @, the possibilities and their odds are:

    near 0%: The two addresses are different and both valid
    >> 90%: The two addresses are the same
    << 10%: One address is valid and the other is invalid

    So handling the over-90% case correctly is a much better idea than handling the near-0% case correctly. For the under-10% case, the choice doesn't matter much, but even there the "ignore case" choice is likely more convenient for the humans involved (who may well know that they can't successfully send e-mail to ExpertSexchange@example.com, only to ExpertsExchange@example.com, but that is no reason to not recognize which account they want a password reminder for when they enter "expertsexchange@example.com" in the web form).
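
One way to "ignore case but preserve it" is to key lookups on the lowercased address while storing the address as the user typed it. This is only a sketch: the in-memory hash and the helper names (address_key, register_address, find_address) are hypothetical stand-ins for whatever storage a real system uses.

```perl
use strict;
use warnings;

# Hypothetical in-memory store: lowercased address => address as entered.
my %address_by_key;

# Comparison key: the whole address lowercased.
sub address_key { return lc $_[0] }

# Returns 1 if stored, 0 if an address differing only in case already exists.
sub register_address {
    my ($address) = @_;
    my $key = address_key($address);
    return 0 if exists $address_by_key{$key};
    $address_by_key{$key} = $address;   # keep the case the user typed
    return 1;
}

# Look up by any casing; returns the stored form, or undef if unknown.
sub find_address {
    my ($address) = @_;
    return $address_by_key{ address_key($address) };
}

register_address('ExpertsExchange@example.com');
print find_address('expertsexchange@example.com'), "\n";
```

Mail still goes to the address exactly as it was entered, so a case-sensitive host sees the right local part, while the password-reminder form finds the account however the user capitalizes it.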

    Or perhaps you meant that it is too much trouble to try to determine if some particular e-mail host ignores case or not. That certainly would be a lot of trouble and I certainly see no point in trying. :) Especially since there is still benefit to ignoring case even in addresses for e-mail hosts that don't. Actually, even if I could conveniently determine if a particular e-mail host ignores case or not, I wouldn't use that information. Just because the person who runs that host puts their users through such pain doesn't mean that I should extend that pain to them when they interact with my system.

    Yes, one of my coworkers went to rjbs' talk and the Email::Address comments that I quoted elsewhere are signed "--rjbs". We'll likely review that material again before this is over.

    - tye        
