PerlMonks  

Re: Practical e-mail address validation

by Limbic~Region (Chancellor)
on Sep 13, 2008 at 16:27 UTC ( #711132=note )


in reply to Practical e-mail address validation

tye,
What I would rather see on CPAN is an email address validation module that allows you to pick and choose (as well as add your own) what rules you want to play by. For instance "do not accept email address that require an open relay to work" or "accept email address that have a period before @ in violation of the RFC".

Now on to your problem at hand. Are the following email addresses "equivalent"?

1. foo@bar.com
2. Foo@bar.com
3. foo@BAR.com
It turns out that 1 and 3 are but 2 is not. You have already mentioned this. I only bring it up again to point out another "rule" for this theoretical CPAN module - to consider case in the user portion of the address. Here is another one that may be difficult to tell:
1. foo@bar.com
2. foo%bar.com@asdf.com # corrected
These are functionally equivalent because it expects the MTA at asdf.com to relay the mail to bar.com.
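The two comparisons above can be sketched in a few lines of Perl. This is only an illustration, not code from any CPAN module: the function names `canonical_address` and `same_address` and the options `fold_local_case` and `resolve_percent_hack` are made up here. It assumes the domain is always compared case-insensitively, the user portion is case-sensitive unless told otherwise, and a `user%host@relay` address is treated as `user@host` only when explicitly requested.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce an address to a form suitable for comparison.
# Splits on the last "@" and lowercases only the domain.
sub canonical_address {
    my ($addr, %opt) = @_;
    my $at = rindex($addr, '@');
    # Explicit undef so callers in list context still get one element.
    return undef if $at < 1 || $at == length($addr) - 1;

    my $local  = substr($addr, 0, $at);
    my $domain = lc substr($addr, $at + 1);

    # Optional rule: treat "user%host@relay" as "user@host".
    if ($opt{resolve_percent_hack} && $local =~ /\A(.+)%([^%\@]+)\z/) {
        ($local, $domain) = ($1, lc $2);
    }

    # Optional rule: ignore case in the user portion as well.
    $local = lc $local if $opt{fold_local_case};

    return "$local\@$domain";
}

# Hypothetical helper: true when both addresses reduce to the same form.
sub same_address {
    my ($x, $y, %opt) = @_;
    my ($cx, $cy) = map { canonical_address($_, %opt) } $x, $y;
    return (defined($cx) && defined($cy) && $cx eq $cy) ? 1 : 0;
}
```

With no options, `same_address('foo@bar.com', 'foo@BAR.com')` is true and `same_address('foo@bar.com', 'Foo@bar.com')` is false; passing `resolve_percent_hack => 1` makes `foo%bar.com@asdf.com` compare equal to `foo@bar.com`.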

So I have no practical use for your validation routines but would love to see a more flexible module - for reasons I mention here as well as ones mentioned here, here and there.

Cheers - L~R


Re^2: Practical e-mail address validation (flex)
by tye (Cardinal) on Sep 13, 2008 at 17:12 UTC

    Having just skimmed the parts of RFC 2822 regarding e-mail addresses, it was pretty clear that "accept email address that have a period before @ in violation of the RFC" isn't based on the copy that I found. It clearly went to some length to document how you can use period before the @ so I'd be quite surprised to find that some other part of the RFC disallowed such usage.

    As for ignoring case before the @, I wrote above:

    Yes, I realize that it is possible to set up a system such that ExpertsExchange@example.com and ExpertSexchange@example.com are completely separate addresses, but anybody who does that deserves to suffer from such a set-up.

    which I think makes my position on that quite clear. Note that I certainly won't be altering case of the user portion of any addresses (I see no reason to alter case of any portion of the address, actually, but I realize that altering the case of the user portion is at least technically allowed to break the e-mail address).

    As for foo@asdf.com%bar.com, I agree that this is a valid e-mail address and can see uses for it in certain situations. But I also feel that requiring such an address format be used by your (internal or external) customers when they request that e-mail be sent to them by random external entities is a good reason to demand a new e-mail service provider. So I don't yet feel guilty about considering not allowing such addresses to be used in order to register for a service we provide on the internet (as my regex above disallows). So (at this point) I won't have a problem with comparing such addresses.

    As for a pick-and-choose module, one reaction I have is that I think it took me less than 10 minutes to cut'n'paste from two RFCs to come up with my simplistic results that seem amply permissive to real-world e-mail addresses meant to be used "at large". So I don't foresee it as particularly difficult to spend 10 minutes to pick and choose the items that fit one's specific situation. And part of my point was to wonder why people seem to never bother to cut'n'paste from the RFCs when they go to roll their own regexes.

    But I will certainly consider producing such a module (or patching an existing module, or, rather, trying to patch an existing module since CPAN so very often makes usefully patching existing modules difficult and extremely slow), especially as more details of what items are likely to be worth picking and choosing between are explained to me explicitly and clearly. I don't claim to be an expert on e-mail addresses; in fact, part of the point of posting this was my shock at how easy it appeared to be to provide something that seemed more practical (for the very common problem of validating e-mail addresses being entered by users of a web page) than what my coworker reported finding on CPAN.

    Is "do not accept email address that require an open relay to work" nothing more than "don't allow % after the @"? In any case, thanks for another justification for not using full-RFC-2822 addresses.

    Thanks also for your assessment of whether this would be a good addition to CPAN. I appreciate your opinion.

    - tye        

      tye,
      It has been 6 years since I worked at the US Dept. Of Justice and had the RFCs memorized but you can see that others agree with me. Neither Email::Valid nor Email::Address believe 'foo.@bar.com' is a valid email address and Email::Valid::Loose only exists to relax the rules of RFC 2822 to allow a period before the at.

      Regarding case sensitivity in the user portion, you did make your position clear. In fact, I indicated you had already mentioned it. I brought it up again because I believe it would be a valuable rule to turn on/off if they were using this theoretical module to identify spammers.

      The reason I suggest such a pick and choose module is thus: The specific reasons for wanting to look for email addresses and then choose to deem them invalid change from situation to situation. Most folks are completely ignorant of the RFCs and it would be easy for them to say "in my situation, I want to allow X and Y but deny Z" without having to go look anything up.

      Cheers - L~R

        You appear to mean "directly before the @". Thanks for the clarification. Email::Valid::Loose further clarifies:

        Email::Valid::Loose is a subclass of Email::Valid, which allows . (dot) before @ (at-mark). It is invalid in RFC822, but is commonly used in some of mobile phone addresses in Japan (like docomo.ne.jp or jp-t.ne.jp).

        So the items identified so far:

        • Allow /\.\@/
        • Disallow /\@.*%/ (actually more, since you answered my question in the negative in a private /msg and promised more details later)
        • Disallow CFWS w/in the address
        • Require /\@.*\./
        • Require /\.[a-zA-Z]{2,}$/
        • Require RFC1035-compliant domain (except empty ones)
        • Extend RFC1035 to allow domain labels that start with a digit
        • Disallow quoting (usually of $local_part)
        • Disallow escaping (usually of $local_part)
        • Disallow /\+.*\@/
        • Disallow "group"s (/$display_name:$mailbox_list;/)
        • Disallow "name-addr" (/$display_name?$angle_addr/)
        • Disallow "obs*" (obs-angle-addr, obs-mbox-list, obs-addr-list, obs-local-part, obs-domain)
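A few of the regex-based items above could be turned into switchable rules roughly as follows. This is a sketch only, not an existing CPAN interface: the rule names and the `violations` function are invented for illustration, and only the items that were given as a literal regex in the list are covered. Each rule returns true when the address breaks it, so "Allow /\.\@/" simply means leaving `dot_before_at` out of the enabled set.

```perl
use strict;
use warnings;

# Hypothetical rule table. Keys are made-up names; the regexes are copied
# from the list above. Each sub returns true when the address VIOLATES
# the rule.
my %RULES = (
    dot_before_at    => sub { $_[0] =~ /\.\@/ },             # "." directly before "@"
    percent_after_at => sub { $_[0] =~ /\@.*%/ },            # "%" somewhere after "@"
    plus_in_local    => sub { $_[0] =~ /\+.*\@/ },           # "+" somewhere before "@"
    domain_needs_dot => sub { $_[0] !~ /\@.*\./ },           # no "." after "@"
    alpha_tld        => sub { $_[0] !~ /\.[a-zA-Z]{2,}$/ },  # no alphabetic final label
);

# Return the names of the enabled rules that the address violates,
# in the order the rules were requested.
sub violations {
    my ($addr, @enabled) = @_;
    my @failed;
    for my $name (@enabled) {
        my $check = $RULES{$name} or die "unknown rule: $name\n";
        push @failed, $name if $check->($addr);
    }
    return @failed;
}
```

A caller who wants to accept the Japanese mobile addresses mentioned above would call something like `violations($addr, qw(domain_needs_dot alpha_tld))` and leave `dot_before_at` out; a stricter caller would add it back in. An empty return list means the address passed every enabled rule.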

        - tye        
