Re: regex to validate e-mail addresses and phone numbers
by b10m (Vicar) on Feb 10, 2004 at 14:05 UTC
E-mail addresses can be slightly weird in some cases. Why not use something simple like Email::Valid? Or create your own regex, but please base it on RFC 822.
I have no clue what phone numbers in your country look like, so I can't help you there.
--
b10m
All code is usually tested, but rarely trusted.
Forgive my newbieness in asking, but I was taught that using a premade package like Valid is the best solution 99% of the time, and that most of the 1% was for when space/cpu and such were at issue.
So I am wondering what would make you lean towards a home-grown solution over the module?
Thank you in advance. ~Adam Marquis
Re: regex to validate e-mail addresses and phone numbers
by Abigail-II (Bishop) on Feb 10, 2004 at 14:25 UTC
Regular expressions that do a syntactical validation of
email addresses are not simple. They will contain thousands
of bytes, and use constructs that have been marked 'experimental'.
There are however several modules that check the correctness
of email addresses.
As for phone numbers, that's almost impossible to do. You
might URIfy the phone number, and use Regexp::Common,
but that most likely is just going to check whether you have
a string of numbers. RFC 2806 deals with telephone URIs,
but it doesn't concern itself with the validity of the
number part in any existing number plan.
Abigail
Re: regex to validate e-mail addresses and phone numbers
by hardburn (Abbot) on Feb 10, 2004 at 14:30 UTC
The e-mail portion is answered above. For phone numbers, you need to get more information. Do you only need to validate phone numbers in your country, or for the whole planet? What format do phone numbers take in your country? Do you need to handle extensions? Do you need to handle area codes? Will the user be forced to enter numbers in a certain format? Do you need to handle areas that have a four-digit exchange instead of three (US phone companies are moving in that direction)? These questions (and probably some others I can't think of right now) need to be answered before a regex can be developed.
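To show how those answers shape the pattern, here is a rough sketch for one possible set of answers. Everything in it is an assumption rather than something from the original question: North American style numbers only, an optional three-digit area code, a three-digit exchange, free choice of space, dot or dash as separator, and an optional extension written as "x" or "ext". The helper name parse_phone is made up for the example.

```perl
use strict;
use warnings;

# Assumed format (not from the thread): optional 3-digit area code,
# 3-digit exchange, 4-digit line number, optional extension.
my $phone = qr/
    ^\s*
    (?: \(? (?<area>\d{3}) \)? [\s.\-]? )?      # optional area code
    (?<exchange>\d{3}) [\s.\-]?                 # exchange
    (?<line>\d{4})                              # line number
    (?: \s* (?:x|ext\.?) \s* (?<ext>\d{1,5}) )? # optional extension
    \s*$
/xi;

# Hypothetical helper: returns a hash ref of the parts, or nothing
# when the text does not fit the assumed format.
sub parse_phone {
    my ($text) = @_;
    return unless defined $text && $text =~ $phone;
    return {
        area     => $+{area},
        exchange => $+{exchange},
        line     => $+{line},
        ext      => $+{ext},
    };
}

for my $candidate ('(555) 123-4567', '123-4567', '555.123.4567 ext 89', '12345') {
    my $parts = parse_phone($candidate);
    print "$candidate: ", ($parts ? "fits the assumed format" : "rejected"), "\n";
}
```

Change any of the answers (four-digit exchanges, country codes, a forced input format) and the pattern has to change with it. Note that this only checks the shape of the string; it is also sloppy about unbalanced parentheses around the area code.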
----
: () { :|:& };:
Note: All code is untested, unless otherwise stated
Re: regex to validate e-mail addresses and phone numbers
by ChrisR (Hermit) on Feb 10, 2004 at 14:33 UTC
While the use of a proven module is a great idea, I do believe in re-inventing the wheel at times. This way you can gain a greater understanding of what's really going on. RFC 822 will give you a complete description of what a valid email address is. Keep in mind that not all valid email addresses are deliverable, and some invalid addresses can be deliverable. I guess the most important thing is to know what you are actually validating. The Perl Cookbook has some good information in chapters 6 and 18 regarding the validation of email addresses. Or, just use a module if you want a quick fix.
Re: regex to validate e-mail addresses and phone numbers
by Rhys (Pilgrim) on Feb 10, 2004 at 16:03 UTC
If you insist upon writing your own regex, you're going to want to pay more attention to character classes, and you want to remember that not all e-mail addresses are in the format:
user@domain.com
Many e-mail addresses will contain additional dots:
user@mail.server.domain.info
I would change the first regexp to:
/^\w[\w\.\-]*\w\@\w[\w\.\-]*\w(\.\w{2,4})$/
I left the parenthetical part in place, since you're apparently trying to get the top-level domain (.edu, .com, etc.) into $1, but I took off the + at the end, since it's definitely in your way. I'm not even sure what it would do in this context. I also left intact the requirement that the user and host part should begin and end with a \w character, but may contain any number of dots or dashes. The way this reads, the minimum matching string would look like:
me@me.com
But this would also match:
my.big-name.sucks-big-time@mail.server-farm.long-domain.coop
Read up on character classes. They are your friends. Anyway, the biggest obvious remaining problem (in my opinion) with this regexp is it will still allow multiple consecutive dots or dashes. This may not be a problem in the user field, but consecutive dots are not allowed in the host field. It might be simpler to write a whole 'nother regexp to look for consecutive dots or dashes and reject based on that.
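A minimal sketch of that two-regexp idea, in case it helps. The first pattern is the one given above, unchanged. The second pattern and the helper name looks_like_email are only illustrations, and the second pattern rejects doubled dots or dashes on both sides of the @, which is stricter than it needs to be (real hostnames can contain "--", for example). This is still nowhere near a full RFC 822 check.

```perl
use strict;
use warnings;

# First pass: the overall shape, using the pattern from this post.
my $shape = qr/^\w[\w\.\-]*\w\@\w[\w\.\-]*\w(\.\w{2,4})$/;

# Second pass (assumption): treat any doubled dot or doubled dash,
# anywhere in the address, as grounds for rejection.
my $doubled = qr/\.\.|--/;

# Hypothetical helper: 1 if the address passes both checks, else 0.
sub looks_like_email {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return 0 unless $addr =~ $shape;
    return 0 if $addr =~ $doubled;
    return 1;
}

for my $addr ('me@me.com', 'user@mail.server.domain.info', 'user@mail..domain.info') {
    print "$addr: ", (looks_like_email($addr) ? "accepted" : "rejected"), "\n";
}
```

Keeping the two checks separate means each pattern stays readable, and the rejection rule can be loosened later without touching the main regexp.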