PerlMonks  

@ in regex, or not?

by JPaul (Hermit)
on Mar 26, 2001 at 20:50 UTC (#67232)

JPaul has asked for the wisdom of the Perl Monks concerning the following question:

Greetings all,

I'm trying to make a regex to match eMail addresses work, which looks something like this:

if ($entry !~ /^([a-zA-Z0-9@_.]*)$/) {
For some strange reason, the regex fails to match the @ in $entry, and will fail all eMail addresses passed through it.
I assume it has something to do with the fact that @ is the array token - however if I use \@, perl appears to match the eMail address - but complains about an uninitialised value... Which as soon as I drop the @, it stops moaning.

Can any bright sparks out there point out what's probably an obvious blunder?

Cheers,
JP

-- Alexander Widdlemouse undid his bellybutton and his bum dropped off --

Replies are listed 'Best First'.
Re: @ in regex, or not?
by arturo (Vicar) on Mar 26, 2001 at 21:13 UTC

    @_ is indeed something perl groks natively (it's the argument list passed to a subroutine); and variables are interpolated into regular expressions. So, as written, your surmise about what's going wrong is correct; but shift those characters around (follow @ with anything else), and you probably wouldn't have seen just the problem you're seeing.

    I don't see why perl would complain about an uninitialized value if the *only* change you made was escaping the @.
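A minimal sketch of the interpolation issue described above, assuming the code runs at file level (where @_ is empty) and using a made-up sample address. It compares the original character class with the escaped one:

```perl
use strict;
use warnings;

# Hypothetical sample value, not taken from the original post.
my $entry = 'user@example.com';

# Unescaped: "@_" is interpolated as the array @_ (empty at file level),
# so the class ends up containing neither "@" nor "_".
my $unescaped_ok = ($entry =~ /^([a-zA-Z0-9@_.]*)$/) ? 1 : 0;

# Escaped: "\@" is a literal at-sign and "_" a literal underscore.
my $escaped_ok = ($entry =~ /^([a-zA-Z0-9\@_.]*)$/) ? 1 : 0;

print "unescaped class matches: $unescaped_ok\n";
print "escaped class matches:   $escaped_ok\n";
```

Inside a subroutine that was called with arguments, the unescaped version would instead interpolate those arguments into the class, which is why the behaviour can look erratic.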

    Now, to switch gears: this regex ain't gonna match email addresses. With the * following the character class it's just looking for ZERO OR MORE letters, digits, etc., so even "" is going to match.

    Even if you change that * to a +, you're still not going to match anything like the range of valid email addresses; "0" and "@" all by themselves will match with that change.
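To make the point about * versus + concrete, here is a small sketch (the candidate strings are just illustrative) that runs both quantifiers against an empty string and two single characters:

```perl
use strict;
use warnings;

my $star = qr/^([a-zA-Z0-9\@_.]*)$/;   # zero or more characters from the class
my $plus = qr/^([a-zA-Z0-9\@_.]+)$/;   # one or more characters from the class

my %result;
for my $candidate ('', '0', '@') {
    $result{$candidate} = {
        star => ($candidate =~ $star ? 1 : 0),
        plus => ($candidate =~ $plus ? 1 : 0),
    };
    printf "%-4s star=%d plus=%d\n",
        "'$candidate'", $result{$candidate}{star}, $result{$candidate}{plus};
}
```

Only the empty string is affected by the switch to +; the single characters still pass, so neither pattern is a real address check.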

    There are lots of threads on this site about validating email addresses; matching the strings is a common thing people try to do in Perl, but there's no really simple way (see perlfaq9). That said, you might look into Email::Valid, Email::Find, or, if you really wanna do it with a regex, see the informative Re: pattern match e-mail addresses.

    Philosophy can be made out of anything. Or less -- Jerry A. Fodor

Re: @ in regex, or not?
by tadman (Prior) on Mar 26, 2001 at 20:57 UTC
    First, watch out for the 'dot' in your set, which Perl interprets as "stuff" (i.e. not a linefeed ('\n'), unless you're using crazy regexp options like /s). You probably mean literal-dot ('\.') instead.

    Secondly, I'm not sure why you're using a negative match assertion ('!~') instead of a positive one ('=~'). It seems to be the opposite of what you're looking for:
    if ($entry =~ /^([a-zA-Z0-9\@_\.]*)$/) {
    This code memorizes the entire e-mail address, which is apparently what you intended by using the brackets in your regexp. As a note, though, since you are memorizing the entire thing, why bother to do this instead of just using $entry?

    Note: On the subject of e-mail address matching regexps, you will have to be more open-minded about what can appear in these. For example, many characters other than the ones you specified are actually valid in the user part of an e-mail address. I would make the checks on that part more liberal. Further, instead of using the star operator (0 or more), which has the unfortunate effect of validating a zero-length string(!), I would demand at least one character on each side of the '@':
    if ($entry =~ /^([a-zA-Z0-9_\.\-\!\%\#]+\@[a-zA-Z0-9_\.]+)$/) {
    Here are some e-mail addresses which could be used to test any modifications:
    qw [ abc@123.it a@b tech-support@super.net user_144@z.com webmaster@estherdyson.museum ];
    Tip: It is probably a good practice to "escape" all non-alphanumeric characters in your regexps until you know which ones are safe. As you found out, a seemingly inert '@' was interpreted otherwise, and the unassuming '.' means a whole lot more than just dot inside a regexp.
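A quick way to try the second pattern above against the suggested test list is sketched here. The hyphenated address at the end is a made-up extra case, added only to show one limit of the domain class as written:

```perl
use strict;
use warnings;

# The more liberal pattern from the post above.
my $pattern = qr/^([a-zA-Z0-9_\.\-\!\%\#]+\@[a-zA-Z0-9_\.]+)$/;

# The test addresses from the post (qw does not interpolate "@").
my @addresses = qw[ abc@123.it a@b tech-support@super.net user_144@z.com webmaster@estherdyson.museum ];

my @rejected = grep { $_ !~ $pattern } @addresses;
print "rejected from the list: ", scalar(@rejected), "\n";

# Hypothetical extra case: the domain class has no "-", so this is refused.
my $hyphen_ok = ('user@my-host.example' =~ $pattern) ? 1 : 0;
print "hyphenated domain accepted: $hyphen_ok\n";
```

If hyphenated host names matter for your users, adding \- to the second class would be the obvious adjustment.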

      Important correction: outside of a character class, . means "match anything" (except a newline, unless told otherwise), but inside a character class, it DOES mean a literal dot.
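A tiny sketch of that difference, using single-character strings chosen purely for illustration:

```perl
use strict;
use warnings;

# Inside a character class, "." is just a literal dot.
my $class_dot_matches_x   = ('x' =~ /^[.]$/) ? 1 : 0;
my $class_dot_matches_dot = ('.' =~ /^[.]$/) ? 1 : 0;

# Outside a class, "." matches any single character except a newline.
my $bare_dot_matches_x    = ('x' =~ /^.$/) ? 1 : 0;

print "[.] matches 'x': $class_dot_matches_x\n";
print "[.] matches '.': $class_dot_matches_dot\n";
print " .  matches 'x': $bare_dot_matches_x\n";
```

Escaping the dot inside the class, as in [\.], is harmless but not required.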

      Philosophy can be made out of anything. Or less -- Jerry A. Fodor

        Strange but true. It is odd, though, that '@' is parsed in there as meaning array, but the previously all-powerful '.' reverts back to meaning just a dot. Like Kryptonite does to Superman?
Re: @ in regex, or not?
by Clownburner (Monk) on Mar 26, 2001 at 21:57 UTC
    This is a lot harder than it looks, if you have to pass every possible RFC-valid Email address. The Mastering Regular Expressions book has the most complete example I know of, but it's a little over 5k in size.

    One thing to consider here is what your target audience is likely to enter as an email address, and how precise you need to be. If it's an intranet application and everyone's email addresses are in a simple format (foo@barr.com), then a simple regexp is likely to do the trick. But if you may need to support international and unusual (UUCP gateway, anyone?) email addresses, you might go for something more elaborate, like the MRE method.

    You can download the email regexp here: http://public.yahoo.com/~jfriedl/regex/code.html

    HTH!
    Signature void where prohibited by law.
Re: @ in regex, or not?
by the_slycer (Chaplain) on Mar 26, 2001 at 20:56 UTC
    I think that $entry may not be getting initialized properly. I tested your regex with the \@ and it appears to work fine. eg:
    $entry = 'a.address@host.com';
    if ($entry !~ /^([a-zA-Z0-9\@_.]*)$/) {
        print "Doh!\n";
        exit;
    }
    print "Found it\n";
    prints "Found it"
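To illustrate the guess about initialization, here is a sketch that assumes $entry was simply never assigned. It traps the warning instead of letting it print to STDERR:

```perl
use strict;
use warnings;

my $entry;        # deliberately left undefined for this illustration
my @warnings;     # collects whatever perl would have warned about
my $rejected;

{
    # Temporarily capture warnings so they can be inspected.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $rejected = ($entry !~ /^([a-zA-Z0-9\@_.]*)$/) ? 1 : 0;
}

print "rejected: $rejected\n";
print "warning:  $_" for @warnings;
```

An undefined value is treated as an empty string, which the * version of the pattern accepts, so the check passes while perl still complains.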
Re: @ in regex, or not?
by Beatnik (Parson) on Mar 27, 2001 at 14:29 UTC
    As mentioned above and in perlfaq9 and MRE, there is no foolproof way to check for valid email addresses...

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
      I'm sure I've seen a huge (several hundred lines) RFC 822 compliant regexp for checking mail addresses here in the monastery.
      (But despite my search I couldn't manage to find it again...)
      But thanks to Clownburner, I will now be able to recreate it...

      But people usually use Email::Valid; this, combined with some MX check/hack (EXPN/CHCK/RCPT TO check),
      could produce pretty good email checking.


      "Trying to be a SMART lamer" (thanx to Merlyn ;-)

        ObCPAN: One good module for this is Mail::Address, found in Graham Barr's Mailtools package.

        One point worth noting is that whilst not accepting a valid RFC822 address might be considered rude, if you are going to be doing any validation on the address (for example, only allow the script to send email to a particular domain) you may *want* to only match simple user@domain addresses.

        For example, you may not want the address luser%victim.com@your.innocent.domain to pass your check and allow someone to send email to luser@victim.com, but if you allow full RFC822 addresses through then that is the kind of problem you might have. (That and the fact that some attacks try to embed shell command sequences in addresses).

        So if the data is from a tainted source, it's probably worth doing some aggressive sanitisation prior to sending any email.
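One way to sketch that kind of aggressive check is a deliberately narrow whitelist. The helper name is_simple_address and the exact character sets are assumptions made for illustration, not something prescribed in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: accept only plain user@domain addresses.
# The allowed characters are an assumption; tighten or loosen as needed.
sub is_simple_address {
    my ($address) = @_;
    return 0 unless defined $address;
    return $address =~ /\A[A-Za-z0-9_.+\-]+\@[A-Za-z0-9\-]+(?:\.[A-Za-z0-9\-]+)+\z/ ? 1 : 0;
}

my $plain  = is_simple_address('user@example.com');
my $relay  = is_simple_address('luser%victim.com@your.innocent.domain');
my $shelly = is_simple_address('user@example.com; rm -rf /');

print "plain address accepted:  $plain\n";
print "percent relay accepted:  $relay\n";
print "shell sequence accepted: $shelly\n";
```

This refuses plenty of valid RFC 822 addresses by design; the trade-off is that '%', spaces and shell metacharacters never reach the mailer.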

        Those routines are probably 99.99% perfect, but I doubt they're foolproof :)

        Greetz
        Beatnik
        ... Quidquid perl dictum sit, altum viditur.
Re: @ in regex, or not?
by alfie (Pilgrim) on Mar 26, 2001 at 21:17 UTC
    In addition to what the former comments say, I would suggest making the match a lot more foolproof by making it more restrictive:
    m/^([\w\d_\+\-\.]+\@[\w\d][\w\d\.\-]+\.\w{2,3})$/
    This should be able to catch quite a lot; you can always expand it (or make it even more restrictive) to suit your needs.
    --
    Alfie
Re: @ in regex, or not?
by Daddio (Chaplain) on Mar 27, 2001 at 08:27 UTC
    While not being quite as restrictive as some of the examples above, this should give you good, basic 'a@b'-type email address checking:

    if ($entry =~ /[^\@]+\@[^\@]+/) {

    Add more characters to the negated classes to make it more restrictive, or leave it as is for the basic check.
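A small sketch of how that basic check behaves, with made-up test strings. Because the pattern is unanchored it only needs to find an 'a@b' piece somewhere in the string; the anchored variant shown next to it is an assumption about what a stricter version might look like, not part of the suggestion above:

```perl
use strict;
use warnings;

my $loose    = qr/[^\@]+\@[^\@]+/;        # the pattern as posted
my $anchored = qr/\A[^\@]+\@[^\@]+\z/;    # hypothetical stricter variant

my %seen;
for my $candidate ('a@b', 'a@b@c') {
    $seen{$candidate} = {
        loose    => ($candidate =~ $loose    ? 1 : 0),
        anchored => ($candidate =~ $anchored ? 1 : 0),
    };
    printf "%-6s loose=%d anchored=%d\n",
        $candidate, $seen{$candidate}{loose}, $seen{$candidate}{anchored};
}
```

With anchors, a string containing a second '@' is refused, since nothing outside the two negated classes is allowed.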

Re: @ in regex, or not?
by merlyn (Sage) on Mar 27, 2001 at 22:18 UTC
