PerlMonks  

Re^2: Practical e-mail address validation (flex)

by tye (Sage)
on Sep 13, 2008 at 17:12 UTC [id://711144]


in reply to Re: Practical e-mail address validation
in thread Practical e-mail address validation

Having just skimmed the parts of RFC 2822 regarding e-mail addresses, I found it pretty clear that "accept email address that have a period before @ in violation of the RFC" isn't based on the copy that I found. The RFC clearly goes to some length to document how you can use a period before the @, so I'd be quite surprised to find that some other part of it disallowed such usage.

As for ignoring case before the @, I wrote above:

Yes, I realize that it is possible to set up a system such that ExpertsExchange@example.com and ExpertSexchange@example.com are completely separate addresses, but anybody who does that deserves to suffer from such a set-up.

which I think makes my position on that quite clear. Note that I certainly won't be altering the case of the user portion of any address (actually, I see no reason to alter the case of any portion of the address, but I realize that altering the case of the user portion is, at least technically, allowed to break the e-mail address).
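A minimal Perl sketch of comparing two addresses in that spirit, without altering the stored case of either one: the domain is compared case-insensitively and the user portion exactly, with an optional fold for a site that chooses to treat the user portion as case-insensitive too. The sub name same_address and the fold_local option are hypothetical, and only the simple local@domain form is handled.

```perl
use strict;
use warnings;

# Compare two local@domain addresses without modifying either one.
# The domain part is always compared case-insensitively, since DNS
# names are case-insensitive. The local part is compared exactly
# unless the caller passes the (hypothetical) fold_local option.
sub same_address {
    my ($addr_a, $addr_b, %opt) = @_;

    # Split on the last '@'; anything without one is not comparable.
    my ($local_a, $domain_a) = $addr_a =~ /\A(.+)\@([^\@]+)\z/ or return 0;
    my ($local_b, $domain_b) = $addr_b =~ /\A(.+)\@([^\@]+)\z/ or return 0;

    return 0 unless lc($domain_a) eq lc($domain_b);
    return $opt{fold_local} ? lc($local_a) eq lc($local_b)
                            : $local_a eq $local_b;
}
```

Comparing this way, rather than lower-casing on input, leaves the address exactly as the user typed it, which is what actually gets used when mail is sent.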

As for foo@asdf.com%bar.com, I agree that this is a valid e-mail address and can see uses for it in certain situations. But I also feel that requiring that such an address format be used by your (internal or external) customers when they request that e-mail be sent to them by random external entities is a good reason to demand a new e-mail service provider. So I don't yet feel guilty about considering not allowing such addresses to be used in order to register for a service we provide on the internet (as my regex above disallows). So (at this point) I won't have a problem with comparing such addresses.

As for a pick-and-choose module, one reaction I have is that I think it took me less than 10 minutes to cut'n'paste from two RFCs to come up with my simplistic results that seem amply permissive to real-world e-mail addresses meant to be used "at large". So I don't foresee it as particularly difficult to spend 10 minutes to pick and choose the items that fit one's specific situation. And part of my point was to wonder why people seem to never bother to cut'n'paste from the RFCs when they go to roll their own regexes.

But I will certainly consider producing such a module (or patching an existing module, or, rather, trying to patch an existing module since CPAN so very often makes usefully patching existing modules difficult and extremely slow), especially as more details of what items are likely to be worth picking and choosing between are explained to me explicitly and clearly. I don't claim to be an expert on e-mail addresses, in fact, part of the point of posting this was my shock at how easy it appeared to be to provide something that seemed more practical (for the very common problem of validating e-mail addresses being entered by users of a web page) than what my coworker reported finding on CPAN.

Is "do not accept email address that require an open relay to work" nothing more than "don't allow % after the @"? In any case, thanks for another justification for not using full-RFC-2822 addresses.

Thanks also for your assessment of whether this would be a good addition to CPAN. I appreciate your opinion.

- tye        

Re^3: Practical e-mail address validation (flex)
by Limbic~Region (Chancellor) on Sep 13, 2008 at 17:42 UTC
    tye,
    It has been 6 years since I worked at the US Dept. of Justice and had the RFCs memorized, but you can see that others agree with me. Neither Email::Valid nor Email::Address believes 'foo.@bar.com' is a valid email address, and Email::Valid::Loose only exists to relax the rules of RFC 2822 to allow a period before the @.

    Regarding case sensitivity in the user portion, you did make your position clear. In fact, I indicated you had already mentioned it. I brought it up again because I believe it would be a valuable rule to turn on/off if they were using this theoretical module to identify spammers.

    The reason I suggest such a pick-and-choose module is this: the specific reasons for looking at email addresses and deeming some of them invalid change from situation to situation. Most folks are completely ignorant of the RFCs, and it would be easy for them to say "in my situation, I want to allow X and Y but deny Z" without having to look anything up.

    Cheers - L~R

      You appear to mean "directly before the @". Thanks for the clarification. Email::Valid::Loose further clarifies:

      Email::Valid::Loose is a subclass of Email::Valid, which allows . (dot) before @ (at-mark). It is invalid in RFC822, but is commonly used in some of mobile phone addresses in Japan (like docomo.ne.jp or jp-t.ne.jp).
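To make the distinction concrete, here is a rough Perl sketch of the RFC 2822 dot-atom rule for the part before the @ (periods only between runs of atext), next to a relaxed variant that also accepts periods directly before the @. The names local_part_ok and allow_trailing_dot are made up, and the loose pattern is an assumption about what "dot before @" means, not the code Email::Valid::Loose actually uses. Quoted local parts are not handled.

```perl
use strict;
use warnings;

# RFC 2822 atext: letters, digits and these specials: ! # $ % & ' * + - / = ? ^ _ ` { | } ~
my $atext = qr/[A-Za-z0-9!#\$%&'*+\-\/=?^_`{|}~]/;

# Strict dot-atom: dots only *between* runs of atext.
my $strict_local = qr/\A$atext+(?:\.$atext+)*\z/;

# Hypothetical loose variant: additionally allow one or more dots
# directly before the '@' (the docomo.ne.jp style). This is NOT the
# regex Email::Valid::Loose actually uses.
my $loose_local = qr/\A$atext+(?:\.$atext+)*\.*\z/;

# Check only the part before the last '@'.
sub local_part_ok {
    my ($addr, %opt) = @_;
    my ($local) = $addr =~ /\A(.+)\@[^\@]+\z/ or return 0;
    my $re = $opt{allow_trailing_dot} ? $loose_local : $strict_local;
    return $local =~ $re ? 1 : 0;
}
```

Both patterns still reject a leading period and consecutive periods in the middle; only the trailing case is relaxed.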

      So the items identified so far:

      • Allow /\.\@/
      • Disallow /\@.*%/ (actually more, since you answered my question in the negative in a private /msg and promised more details later)
      • Disallow CFWS w/in the address
      • Require /\@.*\./
      • Require /\.[a-zA-Z]{2,}$/
      • Require RFC1035-compliant domain (except empty ones)
      • Extend RFC1035 to allow domain labels that start with a digit
      • Disallow quoting (usually of $local_part)
      • Disallow escaping (usually of $local_part)
      • Disallow /\+.*\@/
      • Disallow "group"s (/$display_name:$mailbox_list;/)
      • Disallow "name-addr" (/$display_name?$angle_addr/)
      • Disallow "obs*" (obs-angle-addr, obs-mbox-list, obs-addr-list, obs-local-part, obs-domain)
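Roughly how the regex-expressible items in that list might look as a pick-and-choose table in Perl, purely as an illustration: the rule names, %RULE and failed_rules are hypothetical, "Allow /\.\@/" and the obs-* forms are left out, and bare_addr_spec is only a crude stand-in for rejecting groups and name-addr (it just refuses the <, >, :, ; and , delimiters).

```perl
use strict;
use warnings;

# RFC 1035 label (letters, digits, hyphens; no leading or trailing
# hyphen; at most 63 characters), extended to allow a leading digit.
my $label = qr/[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?/;

# Hypothetical rule table: each rule returns true when the address PASSES it.
my %RULE = (
    no_percent_after_at => sub { $_[0] !~ /\@.*%/ },
    dot_after_at        => sub { $_[0] =~ /\@.*\./ },
    alpha_tld           => sub { $_[0] =~ /\.[a-zA-Z]{2,}\z/ },
    ldh_domain          => sub { $_[0] =~ /\@$label(?:\.$label)*\z/ },
    no_cfws             => sub { $_[0] !~ /[\s()]/ },
    no_quoting          => sub { $_[0] !~ /"/ },
    no_escaping         => sub { $_[0] !~ /\\/ },
    no_plus_tag         => sub { $_[0] !~ /\+.*\@/ },
    # crude stand-in for "no groups" and "no name-addr"
    bare_addr_spec      => sub { $_[0] !~ /[<>:;,]/ },
);

# Return the names of the selected rules that the address fails.
sub failed_rules {
    my ($addr, @selected) = @_;
    for my $name (@selected) {
        die "unknown rule '$name'\n" unless exists $RULE{$name};
    }
    return grep { !$RULE{$_}->($addr) } @selected;
}
```

A caller enables only the rules that fit their situation, e.g. failed_rules($addr, qw(no_plus_tag alpha_tld)), and treats an empty return as acceptance; an unknown rule name dies rather than being silently ignored.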

      - tye        

        tye,
        They (email addresses with periods immediately preceding the @) were also very common with Microsoft Exchange back when I was working at the DoJ. I am not sure whether M$ has become more compliant. I am going to be updating this node with a variety of other ways of attempting to exploit open relays, and I will /msg you when complete.

        Update: Rather than enumerate them myself, go to http://www.abuse.net/relay.html and test an MTA you believe to be secure. It shows you all the email addresses it uses to test with (from and to). I also realized I had the relay syntax wrong. It is foo%bar.com@example.com. I have updated the prior node.
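A rough Perl sketch that flags both %-hack shapes mentioned in this thread, the original foo@asdf.com%bar.com and the corrected foo%bar.com@example.com. The sub name is hypothetical, and it covers only those two shapes; the abuse.net page tests many other relay tricks that a check like this would not catch.

```perl
use strict;
use warnings;

# Flag the two %-hack routing forms discussed in this thread:
#   foo@asdf.com%bar.com    ('%' after the '@')
#   foo%bar.com@example.com ('%host' tacked onto the local part)
# A bare '%' in the local part (e.g. disc%ount@example.com) is legal
# RFC 2822 atext and is not flagged here.
sub looks_like_percent_hack {
    my ($addr) = @_;
    my ($local, $domain) = $addr =~ /\A(.*)\@([^\@]*)\z/ or return 0;
    return 1 if $domain =~ /%/;
    return 1 if $local  =~ /%[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+\z/;
    return 0;
}
```

Whether a match should mean "reject" is a policy decision; the sub only reports the shape.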

        Update: I haven't provided a complete list of "rules" that I think such a theoretical module should include but having "John Smith"@example.com is another one that should be flexible. If I come up with more I will add them here but it has been a long time since I thought about such things. Oh, and I used to have to worry about non-SMTP addresses too like CC:Mail and GroupWise (fortunately not UUCP).

        Cheers - L~R
