PerlMonks  

regex to validate e-mail addresses and phone numbers

by rup173 (Novice)
on Feb 10, 2004 at 13:58 UTC ( [id://327912]=perlquestion )

rup173 has asked for the wisdom of the Perl Monks concerning the following question:

Hello PerlMonks!

Can you please help me write a regular expression to validate a person's e-mail address, and another to validate a mobile number?

I have written one expression. Can you tell me what the error is in the code given below?

if($email && ($email !~ m/^\w{2,}\.*\-*\w*\@\w{2,}(\.\w{2,4})+$/ || $email =~ m/[;><&\*`\|]/))

Edited: retitled and reformatted by Chady

Replies are listed 'Best First'.
Re: regex to validate e-mail addresses and phone numbers
by b10m (Vicar) on Feb 10, 2004 at 14:05 UTC

    E-mail addresses can be slightly weird in some cases. Why not use something simple like Email::Valid? Or create your own regex, but please base it on RFC 822.

    I have no clue what phone numbers in your country look like, so I couldn't help you there.
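The Email::Valid suggestion could be sketched roughly as follows. This is only an illustration: `looks_valid` is a hypothetical helper name, not something from the module, and the sketch assumes Email::Valid may or may not be installed from CPAN, so it loads the module at run time and reports when it is missing.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if Email::Valid accepts the address,
# 0 if it rejects it, and undef if the module is not installed.
sub looks_valid {
    my ($addr) = @_;
    return undef unless eval { require Email::Valid; 1 };
    # Email::Valid->address returns the address on success, undef on failure.
    return defined( Email::Valid->address($addr) ) ? 1 : 0;
}

for my $addr ('user@example.com', 'not an address') {
    my $ok = looks_valid($addr);
    printf "%-20s %s\n", $addr,
        !defined $ok ? 'Email::Valid not installed'
      : $ok          ? 'accepted'
      :                'rejected';
}
```

This checks syntax only. Whether mail to the address is actually deliverable is a separate question.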

    --
    b10m

    All code is usually tested, but rarely trusted.
      b10m,
      While I agree with your suggestion of Email::Valid in principle, it may not be the right solution for the task at hand. I am guessing there is more to this than was originally stated. It might be best to first ask some simple questions:

    • Am I validating inbound, outbound, or bi-directional addresses?
    • Does my inbound MTA comply with the RFCs? If not, is it more strict, more relaxed, or bits and pieces of both?
    • Do I care if it is valid? If it "looks" like a spammer, I want to drop it regardless.

      It may turn out that a home-grown regex is the right way to go, or that Email::Valid or Email::Valid::Loose is the way to go. It may even turn out that the best solution is SpamAssassin.

      Cheers - L~R

        Forgive my newbieness in asking, but I was taught that using a premade package like Email::Valid is the best solution 99% of the time, and that most of the remaining 1% was for when space, CPU, and such were at issue.

        So I am wondering what would make you lean towards a home-grown solution over the module?

        Thank you in advance. ~Adam Marquis
Re: regex to validate e-mail addresses and phone numbers
by Abigail-II (Bishop) on Feb 10, 2004 at 14:25 UTC
    Regular expressions that do a syntactical validation of email addresses are not simple. They will contain thousands of bytes, and use constructs that have been marked 'experimental'.

    There are however several modules that check the correctness of email addresses.

    As for phone numbers, that's almost impossible to do. You might URIfy the phone number, and use Regexp::Common, but that most likely is just going to check whether you have a string of numbers. RFC 2806 deals with telephone URIs, but it doesn't concern itself with the validity of the number part in any existing number plan.

    Abigail

Re: regex to validate e-mail addresses and phone numbers
by hardburn (Abbot) on Feb 10, 2004 at 14:30 UTC

    The e-mail portion is answered above. For phone numbers, you need to get more information. Do you only need to validate phone numbers in your country, or for the whole planet? What format do phone numbers in your country use? Do you need to handle extensions? Do you need to handle area codes? Will the user be forced to enter numbers in a certain format? Do you need to handle areas that have a four-digit exchange instead of three (US phone companies are moving in that direction)? These questions (and probably some others I can't think of right now) need to be answered before a regex can be developed.
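To show how much the answers to those questions shape the regex, here is a minimal sketch under one particular set of assumptions: North American style numbers only, a three-digit area code and three-digit exchange that each start with 2-9, an optional leading 1 or +1, an optional extension, and free choice of space, dot, or dash as separator. The names `$nanp_re` and `parse_nanp` are made up for this example. Different answers would need a different pattern.

```perl
use strict;
use warnings;

# ASSUMPTION: North American numbers only, e.g. "(555) 234-5678 x12".
# Named captures need Perl 5.10 or later.
my $nanp_re = qr{
    ^\s*
    (?: \+?1 [\s.-]? )?                    # optional country code
    (?: \( (?<area>[2-9]\d{2}) \)          # area code in parentheses
      |    (?<area>[2-9]\d{2}) )           # or bare
    [\s.-]?
    (?<exchange>[2-9]\d{2})                # three-digit exchange
    [\s.-]?
    (?<line>\d{4})                         # four-digit line number
    (?: \s* (?: x | ext\.? ) \s* (?<ext>\d{1,5}) )?   # optional extension
    \s*$
}xi;

# Hypothetical helper: returns a hashref of parts, or nothing on failure.
sub parse_nanp {
    my ($input) = @_;
    return unless $input =~ $nanp_re;
    return {
        area     => $+{area},
        exchange => $+{exchange},
        line     => $+{line},
        ext      => $+{ext},    # undef when no extension was given
    };
}

for my $input ('(555) 234-5678 x12', '1-555-234-5678', '12345') {
    my $parts = parse_nanp($input);
    print "$input: ", ($parts ? "ok, area code $parts->{area}" : 'rejected'), "\n";
}
```

Like any pattern of this kind, it only checks the shape of the input. It cannot tell whether the number is actually assigned to anyone.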

    ----
    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

Re: regex to validate e-mail addresses and phone numbers
by ChrisR (Hermit) on Feb 10, 2004 at 14:33 UTC
    While the use of a proven module is a great idea, I do believe in re-inventing the wheel at times. This way you can gain a greater understanding of what's really going on. RFC 822 will give you a complete description of what is a valid email address. Keep in mind that not all valid email addresses are deliverable, and some invalid addresses can be deliverable. I guess the most important thing is to know what you are actually validating. The Perl Cookbook has some good information in chapters 6 and 18 regarding the validation of email addresses. Or, just use a module if you want a quick fix.
Re: regex to validate e-mail addresses and phone numbers
by Rhys (Pilgrim) on Feb 10, 2004 at 16:03 UTC
    If you insist upon writing your own regex, you're going to want to pay more attention to character classes, and you want to remember that not all e-mail addresses are in the format:

    user@domain.com

    Many e-mail addresses will contain additional dots:

    user@mail.server.domain.info

    I would change the first regexp to:

    /^\w[\w\.\-]*\w\@\w[\w\.\-]*\w(\.\w{2,4})$/

    I left the parenthetical part in place, since you're apparently trying to get the top-level domain (.edu, .com, etc.) into $1, but I took off the + at the end, since it's definitely in your way. I'm not even sure what it would do in this context. I also left intact the requirement that the user and host part should begin and end with a \w character, but may contain any number of dots or dashes. The way this reads, the minimum matching string would look like:

    me@me.com

    But this would also match:

    my.big-name.sucks-big-time@mail.server-farm.long-domain.coop

    Read up on character classes. They are your friends. Anyway, the biggest obvious remaining problem (in my opinion) with this regexp is it will still allow multiple consecutive dots or dashes. This may not be a problem in the user field, but consecutive dots are not allowed in the host field. It might be simpler to write a whole 'nother regexp to look for consecutive dots or dashes and reject based on that.
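The two-step idea above might look something like this. It is a sketch, not a complete validator: `$original` is the pattern from the question, `$revised` is the pattern from this reply, and `accept_address` is a hypothetical helper. It assumes the second check should apply to the host part only, and it treats "consecutive dots or dashes" literally as `..` or `--`.

```perl
use strict;
use warnings;

my $original = qr/^\w{2,}\.*\-*\w*\@\w{2,}(\.\w{2,4})+$/;      # from the question
my $revised  = qr/^\w[\w\.\-]*\w\@\w[\w\.\-]*\w(\.\w{2,4})$/;  # from this reply
my $doubled  = qr/\.\.|--/;   # second pass: two dots or two dashes in a row

# Hypothetical helper: the revised pattern must match, and the host part
# (everything after the @) must not contain doubled dots or dashes.
sub accept_address {
    my ($addr) = @_;
    return 0 unless $addr =~ $revised;
    my ($host) = $addr =~ /\@(.*)$/;
    return $host =~ $doubled ? 0 : 1;
}

for my $addr ('me@me.com',
              'user@mail.server.domain.info',
              'user@mail..example.com') {
    printf "%-30s original=%-3s revised=%-3s accepted=%s\n", $addr,
        ($addr =~ $original    ? 'yes' : 'no'),
        ($addr =~ $revised     ? 'yes' : 'no'),
        (accept_address($addr) ? 'yes' : 'no');
}
```

Keeping the second check as its own regex leaves the main pattern readable. Note that some real host names do contain `--`, so rejecting doubled dashes may be stricter than you want.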

      FWIW, I use: /^\w+\@([\da-zA-Z\-]{1,}\.){1,}[\da-zA-Z-]{2,6}$/ The main idea is that an "_" (underscore) matches \w, but is not valid as part of a domain name. It is, however, acceptable as part of the username. I use {2,6} since there are some TLDs longer than 4 chars (.museum, for example).
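A quick way to see what this pattern does with underscores and longer TLDs is to run it over a few sample addresses. The addresses are made up for illustration and `$re` is just a name for the pattern quoted above.

```perl
use strict;
use warnings;

# The pattern from this reply, unchanged.
my $re = qr/^\w+\@([\da-zA-Z\-]{1,}\.){1,}[\da-zA-Z-]{2,6}$/;

my @samples = (
    'user_name@example.museum',   # underscore in the username, six-letter TLD
    'user@bad_host.com',          # underscore in the host name
    'first.last@example.com',     # dot in the username: \w+ does not allow it
);

for my $addr (@samples) {
    printf "%-28s %s\n", $addr, ($addr =~ $re ? 'matches' : 'does not match');
}
```

Only the first sample matches. The third one shows a limitation: because the username is just \w+, addresses with a dot or dash before the @ are turned away as well.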


      perl -e 'print reverse qw/o b n a e s/;'