JPaul has asked for the wisdom of the Perl Monks concerning the following question:
Greetings all,
I'm trying to make a regex to match eMail addresses work, which looks something like this:
if ($entry !~ /^([a-zA-Z0-9@_.]*)$/) {
For some strange reason, the regex fails to match the @ in $entry, and will fail all eMail addresses passed through it.
I assume it has something to do with the fact that @ is the array token - however if I use \@, perl appears to match the eMail address - but complains about an uninitialised value... Which as soon as I drop the @, it stops moaning.
Can any bright sparks out there point out what's probably an obvious blunder?
Cheers,
JP
-- Alexander Widdlemouse undid his bellybutton and his bum dropped off --
Re: @ in regex, or not?
by arturo (Vicar) on Mar 26, 2001 at 21:13 UTC
@_ is indeed something perl groks natively (it's the argument list passed to a subroutine); and variables are interpolated into regular expressions. So, as written, your surmise about what's going wrong is correct; but shift those characters around (follow @ with anything else), and you probably wouldn't have seen just the problem you're seeing.
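To see the interpolation in action, here is a minimal sketch. The sub names and the sample address are made up for illustration. Each sub shifts its only argument off @_, so @_ is empty by the time the pattern is built, and the unescaped class quietly loses its at-sign.

```perl
use strict;
use warnings;

# Unescaped: "@_" is interpolated as the (now empty) argument array,
# so the class effectively becomes [a-zA-Z0-9.].
sub matches_unescaped {
    my $entry = shift;    # leaves @_ empty when called with one argument
    return $entry =~ /^[a-zA-Z0-9@_.]*$/ ? 1 : 0;
}

# Escaped: "\@" is a literal at-sign and "_" a literal underscore.
sub matches_escaped {
    my $entry = shift;
    return $entry =~ /^[a-zA-Z0-9\@_.]*$/ ? 1 : 0;
}

my $addr = 'user@example.com';
print "unescaped: ", matches_unescaped($addr), "\n";
print "escaped:   ", matches_escaped($addr), "\n";
```

With the escaped version the address should pass; with the unescaped one it should not, which is the symptom described in the question.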
I don't see why perl would complain about an uninitialized value if the *only* change you made was escaping the @.
Now, to switch gears: this regex ain't gonna match email addresses. With the * following the character class it's just looking for ZERO OR MORE letters, digits, etc., so even "" is going to match.
Even if you change that * to a +, you're still not going to match anything like the range of valid email addresses, and "0" or "@" all by themselves will match with that change.
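A quick sketch of the difference, using the escaped form of the character class from the question (the sample strings are just illustrations):

```perl
use strict;
use warnings;

my $star = qr/^[a-zA-Z0-9\@_.]*$/;    # zero or more characters from the class
my $plus = qr/^[a-zA-Z0-9\@_.]+$/;    # one or more characters from the class

for my $candidate ('', '0', '@', 'user@example.com') {
    printf "%-20s star=%d plus=%d\n",
        "'$candidate'",
        ($candidate =~ $star ? 1 : 0),
        ($candidate =~ $plus ? 1 : 0);
}
```

The empty string only gets past the * version, but "0" and "@" get past both, so neither pattern says much about whether the string is an address.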
There are lots of threads on this site about validating email addresses; matching the strings is a common thing people try to do in Perl, but there's no really simple way (see perlfaq9). That said, you might look into Email::Valid, Email::Find, or, if you really wanna do it with a regex, see the informative Re: pattern match e-mail addresses.
Philosophy can be made out of anything. Or less -- Jerry A. Fodor
Re: @ in regex, or not?
by tadman (Prior) on Mar 26, 2001 at 20:57 UTC
First, watch out for the 'dot' in your set, which Perl
interprets as "stuff" (i.e. not a linefeed ('\n'), unless
you're using crazy regexp options like /s). You probably
mean literal-dot ('\.') instead.
Secondly, I'm not sure why you're using a negative match
assertion ('!~') instead of a positive one ('=~'). It seems
to be the opposite of what you're looking for:
if ($entry =~ /^([a-zA-Z0-9\@_\.]*)$/) {
This code memorizes the entire e-mail address, which is
apparently what you intended by using the brackets in your
regexp. As a note, though, since you are memorizing the
entire thing, why bother to do this instead of just using
$entry?
Note:
On the subject of e-mail address matching regexps, you will
have to be more open-minded about what can appear in these.
For example, many characters other than the ones you specified
are actually valid in the name part of the e-mail address. I
would modify it so that the checks on the name part are more
liberal and, instead of using the star operator (0 or more),
which has the unfortunate effect of validating a zero-length
string(!), I would demand at least one character on each
side of the '@'.
if ($entry =~ /^([a-zA-Z0-9_\.\-\!\%\#]+\@[a-zA-Z0-9_\.]+)$/) {
Here are some e-mail addresses which could be used to test
any modifications:
qw [
abc@123.it
a@b
tech-support@super.net
user_144@z.com
webmaster@estherdyson.museum
];
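One way to use such a list is to loop over it with the "at least one character on each side" pattern. This is only a sketch: the variable names are mine, and the last address in the second loop is a made-up extra case, added because the domain class in that pattern has no hyphen in it.

```perl
use strict;
use warnings;

my $pattern = qr/^([a-zA-Z0-9_\.\-\!\%\#]+\@[a-zA-Z0-9_\.]+)$/;

my @samples = qw[
    abc@123.it
    a@b
    tech-support@super.net
    user_144@z.com
    webmaster@estherdyson.museum
];

# Record 1 or 0 for each sample so the results can be inspected later.
my %result = map { $_ => ($_ =~ $pattern ? 1 : 0) } @samples;

for my $address (@samples, 'user@my-host.example') {
    my $ok = $address =~ $pattern ? 'match' : 'no match';
    print "$address: $ok\n";
}
```

All five listed addresses should match, while the hyphenated domain should not, which shows how easy it is to leave a legitimate character out of a hand-rolled class.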
Tip:
It is probably a good practice to "escape" all non-alphanumeric
characters in your regexps until you know which ones are
safe. As you found out, a seemingly inert '@' was interpreted
otherwise, and the unassuming '.' means a whole lot more
than just dot inside a regexp.
|
Important correction: outside of a character class, . means "match anything" (except a newline, unless told otherwise), but inside a character class, it DOES mean a literal dot.
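A small sketch of that difference (the patterns and test strings are arbitrary examples):

```perl
use strict;
use warnings;

my $any_char    = qr/^a.c$/;      # "." outside a class: any character except newline
my $literal_dot = qr/^a[.]c$/;    # "." inside a class: only a literal dot

for my $string ('a.c', 'abc') {
    printf "%s  any_char=%d  literal_dot=%d\n",
        $string,
        ($string =~ $any_char    ? 1 : 0),
        ($string =~ $literal_dot ? 1 : 0);
}
```

So "abc" gets past the first pattern but not the second, and the dot in the original character class never needed escaping.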
Philosophy can be made out of anything. Or less -- Jerry A. Fodor
|
Strange but true. It is odd, though, that '@' is parsed in
there as meaning array, but the previously all-powerful '.' reverts back to meaning just a dot. Like Kryptonite does
to Superman?
|
|
Re: @ in regex, or not?
by Clownburner (Monk) on Mar 26, 2001 at 21:57 UTC
This is a lot harder than it looks, if you have to pass every possible RFC-valid Email address. The Mastering Regular Expressions book has the most complete example I know of, but it's a little over 5k in size.
One thing to consider here is what your target audience is likely to enter as an email address, and how precise you need to be. If it's an intranet application and everyone's email addresses are in a simple format (foo@barr.com), then a simple regexp will likely do the trick, but if you may need to support international and unusual (UUCP gateway, anyone?) email addresses, you might go to something more elaborate, like the MRE method.
You can download the email regexp here: http://public.yahoo.com/~jfriedl/regex/code.html
HTH!
Signature void where prohibited by law.
Re: @ in regex, or not?
by the_slycer (Chaplain) on Mar 26, 2001 at 20:56 UTC
I think that $entry may not be getting initialized properly.
I tested your regex with the \@ and it appears to work fine.
eg:
$entry = 'a.address@host.com';
if ($entry !~ /^([a-zA-Z0-9\@_.]*)$/) {
    print "Doh!\n";
    exit;
}
print "Found it\n";
prints "Found it"
Re: @ in regex, or not?
by Beatnik (Parson) on Mar 27, 2001 at 14:29 UTC
As mentioned above and in perlfaq9 and MRE, there is no foolproof way to check for valid email addresses...
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
|
I'm sure I've seen a huge (several hundred lines) RFC 822-compliant regexp for checking mail addresses here in the monastery.
(But despite my search I couldn't manage to find it again...)
But thanks to Clownburner, I will now be able to recreate it...
But people usually use Email::Valid; this, combined with some MX check/hack (EXPN/VRFY/RCPT TO check),
could produce a pretty good email check.
"Trying to be a SMART lamer" (thanx to Merlyn ;-)
|
ObCPAN: One good module for this is Mail::Address, found in Graham Barr's Mailtools package.
One point worth noting is that whilst not accepting a valid RFC822 address might be considered rude, if you are going to be doing any validation on the address (for example, only allow the script to send email to a particular domain) you may *want* to only match simple user@domain addresses.
For example, you may not want the address luser%victim.com@your.innocent.domain to pass your check and allow someone to send email to luser@victim.com, but if you allow full RFC822 addresses through then that is the kind of problem you might have. (That and the fact that some attacks try to embed shell command sequences in addresses).
So if the data is from a tainted source, it's probably worth doing some aggressive sanitisation prior to sending any email.
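As a rough sketch of that kind of aggressive check: the sub name, the allowed character set, and the single-allowed-domain policy below are all assumptions made for illustration, not a recommendation for every site.

```perl
use strict;
use warnings;

# Hypothetical policy: accept only plain user@domain addresses, with no "%"
# routing hacks and no shell metacharacters, and only for one allowed domain.
sub is_simple_address {
    my ($address, $allowed_domain) = @_;
    return 0 unless defined $address && defined $allowed_domain;

    # Whitelist characters rather than trying to blacklist the dangerous ones.
    return 0
        unless $address =~ /\A([A-Za-z0-9._+\-]+)\@([A-Za-z0-9.\-]+)\z/;

    return lc($2) eq lc($allowed_domain) ? 1 : 0;
}

my $domain = 'your.innocent.domain';
for my $candidate (
    'user@your.innocent.domain',
    'luser%victim.com@your.innocent.domain',
    'user;rm -rf /@your.innocent.domain',
    'luser@victim.com',
) {
    print "$candidate: ", (is_simple_address($candidate, $domain) ? 'ok' : 'rejected'), "\n";
}
```

This deliberately rejects plenty of valid RFC 822 addresses; the point is that anything unexpected is refused before it gets near a mailer or a shell.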
|
Those routines are probably 99.99% perfect, but I doubt they're foolproof :)
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re: @ in regex, or not?
by alfie (Pilgrim) on Mar 26, 2001 at 21:17 UTC
In addition to what the earlier comments say, I would
suggest making the match a lot more foolproof by
making it more restrictive:
m/^([\w+.\-]+\@\w[\w.\-]+\.\w{2,3})$/
This should be able to catch quite a lot; you can always
expand it (or make it even more restrictive) to suit your needs.
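For anyone trying this out, keep in mind that square brackets define a character class while parentheses only group a sequence. Here is a sketch of a restrictive pattern of this kind written with character classes, run against the sample addresses from earlier in the thread (the variable and sub names are mine):

```perl
use strict;
use warnings;

# Local part: word characters, "+", "." and "-".
# Domain: a word character, then one or more of word/"."/"-",
# then a final dot and a 2-3 character top-level domain.
my $restrictive = qr/^([\w+.\-]+\@\w[\w.\-]+\.\w{2,3})$/;

sub passes {
    my ($address) = @_;
    return $address =~ $restrictive ? 1 : 0;
}

for my $address (qw[
    abc@123.it
    a@b
    tech-support@super.net
    user_144@z.com
    webmaster@estherdyson.museum
]) {
    print "$address: ", (passes($address) ? 'match' : 'no match'), "\n";
}
```

Being this strict has a price: of those five, only abc@123.it and tech-support@super.net should pass, since the others have no dot in the domain, a one-letter domain label, or a top-level domain longer than three characters.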
--
Alfie
Re: @ in regex, or not?
by Daddio (Chaplain) on Mar 27, 2001 at 08:27 UTC
While not being quite as restrictive as some of the examples above, this should give you good, basic 'a@b'-type email address checking:
if ($entry =~ /[^\@]+\@[^\@]+/) {
Add more characters to the negated classes to make it more restrictive, or leave it as is for the basic check.
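One thing to watch with this kind of check is that the pattern is not anchored, so it only needs to find an 'a@b' shape somewhere in the string. A sketch comparing it with an anchored variant (the anchored pattern and the sample strings are my own additions):

```perl
use strict;
use warnings;

my $loose    = qr/[^\@]+\@[^\@]+/;        # finds a@b anywhere in the string
my $anchored = qr/\A[^\@]+\@[^\@]+\z/;    # the whole string must be a@b

for my $entry ('a@b', 'ab', 'a@b@c') {
    printf "%-6s loose=%d anchored=%d\n",
        $entry,
        ($entry =~ $loose    ? 1 : 0),
        ($entry =~ $anchored ? 1 : 0);
}
```

Both accept 'a@b' and reject 'ab', but only the anchored form turns away 'a@b@c'.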
Re: @ in regex, or not?
by merlyn (Sage) on Mar 27, 2001 at 22:18 UTC