PerlMonks  

Re^2: Practical e-mail address validation (Email::Address)

by tye (Sage)
on Sep 13, 2008 at 16:35 UTC (#711134)


in reply to Re: Practical e-mail address validation
in thread Practical e-mail address validation

I've seen domains where foo.com was used in e-mail addresses for employees while foo.net was used in e-mail addresses of customers.

As for the "+mailbox" convention, there are arguments on both sides of whether to ignore it when deciding whether two addresses are equivalent. A customer might legitimately want separate accounts for members of a single group, with the correspondence for each account going to a separate mailbox at the same address. On the other hand, treating such addresses as distinct might just be a source of confusion or make some mild cases of abuse easier.

But we'll be using +mailbox to simplify testing so we'll just use lc $addr1 eq lc $addr2 as I already noted.
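A minimal sketch of that comparison. The helper names `same_address` and `strip_tag` are hypothetical and not from the original code. It shows that a plain case-insensitive comparison keeps "+mailbox" variants distinct, and what ignoring the tag might look like if that policy were wanted instead:

```perl
use strict;
use warnings;

# Case-insensitive equality, as described above: "+mailbox" tags stay significant.
sub same_address {
    my ($addr1, $addr2) = @_;
    return lc $addr1 eq lc $addr2;
}

# Hypothetical alternative policy: drop a "+tag" from the local part first.
# Assumes an unquoted local part; the last '@' separates local part and domain.
sub strip_tag {
    my ($addr) = @_;
    (my $copy = $addr) =~ s/\+[^\@]*(?=\@[^\@]*\z)//;
    return $copy;
}

for my $pair (
    [ 'Tye@Example.com',       'tye@example.com' ],
    [ 'tye+test1@example.com', 'tye@example.com' ],
) {
    my ($x, $y) = @$pair;
    printf "%-22s vs %-16s plain: %-9s tag ignored: %s\n", $x, $y,
        same_address($x, $y)                       ? 'same' : 'different',
        same_address(strip_tag($x), strip_tag($y)) ? 'same' : 'different';
}
```

With the plain comparison, test accounts such as tye+test1 and tye+test2 remain separate, which is the property relied on for testing.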

Thanks for the module recommendations. Email::Address notes:

XXX: This ($phrase) used to just be: my $phrase = qr/$word+/; It was changed to resolve bug 22991, creating a significant slowdown. Given current speed problems. Once 16320 is resolved, this section should be dealt with. -- rjbs, 2006-11-11
XXX: ...and the above solution caused endless problems (never returned) when examining this address, now in a test:
admin+=E6=96=B0=E5=8A=A0=E5=9D=A1_Weblog-- ATAT --test.socialtext.com
So we disallow the hateful CFWS in this context for now. Of modern mail agents, only Apple Web Mail 2.0 is known to produce obs-phrase. -- rjbs, 2006-11-19

which confirms some of my suspicions/assumptions.

Looking at the regexes that the module uses, they appear to have been constructed directly from the RFCs very similarly to how I constructed mine, except fewer features were intentionally dropped.

The note that "Of modern mail agents, only ... is known to produce" would lead me to use that module if I were trying to parse e-mail addresses received in e-mail messages. An e-mail system would be broken if it required "the hateful CFWS" in order to deliver messages to it. So completely disallowing CFWS (as I did) doesn't prevent any addresses from being used.

The module doesn't appear to provide a way to get the address with quoting and escaping removed so that addresses can be compared. It also doesn't disallow the very common user mistake of "everybody@gmail" (which can be valid as an e-mail address in some situations but isn't a valid address to give to somebody outside of your organization and so is worthwhile for us to disallow).

So it appears that my similar regex has several advantages that I couldn't get from Email::Address as written.
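To illustrate the two gaps described above, here is a rough sketch, not the regex from my earlier node and not part of Email::Address. The helper name `canonical_address` is hypothetical. It assumes a bare addr-spec (no display name, no comments, no CFWS), removes quoting and escaping from the local part so that addresses can be compared, and rejects a domain that has no dot:

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a bare addr-spec to a comparison key,
# or return undef if the address should be rejected.
# The key is meant only for comparing addresses, not for sending mail.
sub canonical_address {
    my ($addr) = @_;

    # Split at the last '@'; a quoted local part may itself contain '@'.
    my ($local, $domain) = $addr =~ /\A(.+)\@([^\@]+)\z/s or return;

    # Remove surrounding double quotes, then backslash escapes:
    # "jdoe" becomes jdoe and "j\"doe" becomes j"doe.
    if ($local =~ /\A"(.*)"\z/s) {
        ($local = $1) =~ s/\\(.)/$1/gs;
    }

    # Require at least two non-empty, dot-separated labels in the domain,
    # so "everybody@gmail" is rejected. This checks dots only, not characters.
    return unless $domain =~ /\A[^.]+(?:\.[^.]+)+\z/;

    return lc "$local\@$domain";
}

for my $input ('jdoe@bla.com', '"jdoe"@BLA.COM', 'everybody@gmail') {
    my $key = canonical_address($input);
    printf "%-18s => %s\n", $input, defined $key ? $key : '(rejected)';
}
```

Lower-casing the whole key matches the lc comparison mentioned earlier; a site that treats local parts as case-sensitive would lower-case only the domain.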

- tye        

Replies are listed 'Best First'.
Re^3: Practical e-mail address validation (Email::Address)
by everybody (Scribe) on Sep 13, 2008 at 20:16 UTC
    The module doesn't appear to provide a way to get the address with quoting and escaping removed so that addresses can be compared.
    The following snippet does just that:
        use Email::Address;

        my @addresses = map { Email::Address->parse($_) } <DATA>;

        print is_equivalent(@addresses) ? '' : 'not ' , "equivalent\n";

        sub is_equivalent {
            my ($a, $b) = @_;
            return lc $a->address eq lc $b->address;
        }

        __DATA__
        "John Doe" <jdoe@bla.com>
        (Johnnie "Two Toes") jDOE@BLA.COM
    It also doesn't disallow the very common user mistake of "everybody@gmail" (which can be valid as an e-mail address in some situations but isn't a valid address to give to somebody outside of your organization and so is worthwhile for us to disallow).
    That's right. Any validation rules you want to impose beyond those of RFC 2822 are up to you, but Email::Address will tokenise the addresses so that you can apply them. Here's a snippet showing how it deals with various address formats:
        use Email::Address;

        my @addresses = map { Email::Address->parse( $_ ) } <DATA>;

        for my $address (@addresses) {
            printf("%8s: %s\n", $_, ( $address->$_ or '' ) )
                for ( qw( original address user host name phrase comment format ) );
            print "-------\n";
        }

        __DATA__
        abc@foo.com
        bla@gmail
        "Eve Rybody" <everybody@example.com>
        foo@asdf.com%bar.com
        "Alan B. Combs" <abc@foo.com> (I can't think of anything complex)
    From there you can easily validate $username, $host etc.

      I don't see how your proposed solution does what I requested. It doesn't do any canonicalization of quotes or escapes. Yes, it eliminates comments. But your test considers jdoe@bla.com to be different from "jdoe"@bla.com, for example. Perhaps there are some other "tokens" I should be asking for instead?

      Yes, I suppose I could go to the trouble of parsing RFC 2822 (except for not allowing arbitrary nesting of comments and not allowing CFWS in many places) in order to throw away the parts I don't want and then reparse the other parts to do additional validation and canonicalization. I suspect that will be more code (not counting the code for the module) than the solution I've already written. And it won't allow for simple customization such as allowing /\.@/ as noted elsewhere.

      The module almost does RFC 2822 but doesn't do a good job of practical validation of e-mail addresses typed in by external users. I don't see its contribution to my task being much of a "win".

      - tye        
