Regex being stupid

by AI Cowboy (Sexton)
on Sep 19, 2013 at 02:50 UTC (#1054760)
AI Cowboy has asked for the wisdom of the Perl Monks concerning the following question:

$text = ' type add (#090930-230011-907000 ur22122021 pi';
$text =~ s/.*\s(\w+)@(\w+)\.com .*/$1@$2\.(\w+)/;

Say, for instance, this were to happen (a variable has a weird-ass value, and you try to parse an email address out of it). Why isn't the regex extracting the email address from this string? I can't figure it out.

EDIT: By the by, this code fragment is not what is used. The regex is used, but the variable is initialized through a large process in my code that I can't really release on here.

EDIT 2: Thanks to some of the help on this thread, I was able to get the proper regex, $text =~ /(\w+)@(\w+)\.com/;. Thanks to everyone for trying to help :)

Replies are listed 'Best First'.
Re: Regex being stupid
by davido (Archbishop) on Sep 19, 2013 at 03:06 UTC

    I didn't bother reading past "\s" in your regex. I don't see a space character appearing anywhere before the "@" in your target string, so of course it can't match.

    Why do you say the regex is being stupid, when it's just doing what you're telling it to do (ie, reject matches on strings that don't have a space character before a word character before an "@" character)?
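
A minimal sketch of davido's point, assuming a hypothetical input in which the address is the first thing in the string. The address user@example.com and both sub names are made up for illustration and are not from the original post. The first pattern keeps the original ".*\s" prefix and so demands whitespace before the local part. The second accepts the address either at the start of the string or after whitespace.

```perl
use strict;
use warnings;

# Hypothetical input: the address is the first token, so nothing precedes it.
my $text = 'user@example.com type add (#090930-230011-907000 ur22122021 pi';

# Same shape as the original pattern: whitespace is required before the local part.
sub match_with_leading_space {
    my ($s) = @_;
    return $s =~ /.*\s(\w+)\@(\w+)\.com/ ? "$1\@$2" : undef;
}

# Relaxed variant: the address may sit at the start of the string or after whitespace.
sub match_at_start_or_after_space {
    my ($s) = @_;
    return $s =~ /(?:^|\s)(\w+)\@(\w+)\.com/ ? "$1\@$2" : undef;
}

my $strict  = match_with_leading_space($text);
my $relaxed = match_at_start_or_after_space($text);

print defined($strict)  ? "strict: $strict\n"   : "strict: no match\n";
print defined($relaxed) ? "relaxed: $relaxed\n" : "relaxed: no match\n";
```

With this input the first sub finds nothing, because none of the whitespace in the string is followed by a word and an "@". The second sub returns the local part and domain.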


Re: Regex being stupid
by Athanasius (Chancellor) on Sep 19, 2013 at 03:10 UTC

    The code fragment shown has a number of problems. First, the string used to initialise $text contains the “@” symbol, which triggers interpolation within double quotes. This needs to be escaped, or else change to single quotes.

    Second, the replacement part of a substitution is an interpolated string, not a pattern, so regex constructs such as (\w+) have no special meaning there. Third, as davido says, the regex will fail to match because it requires a whitespace character (\s) before the first capture, but the given string does not contain any whitespace in that position.

    Try this as a start:

    #! perl
    use strict;
    use warnings;

    my $text = ' type add (#090930-230011-907000 ur22122021 pi';
    $text =~ /(\w+)@(\w+)\.com/;
    print "$1\@$2\n";


    12:59 >perl
    pilar@delphoss

    13:06 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Looks good, and I didn't know you can't use \w in the replacement part of s/. Also, the code fragment may have errors inside the quotes itself (I updated it however - thanks for pointing that out), but the quotes and the way the variable is set is not what actually happens in my code, it was just an attempt at speeding the question along to ask, basically, "you have a weird variable, grab email address nao".

      Many thanks for your help, I will give it a try :)

Re: Regex being stupid
by CountZero (Bishop) on Sep 19, 2013 at 06:19 UTC
    Have a look at Email::Find.


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Regex being stupid
by marinersk (Curate) on Sep 19, 2013 at 06:41 UTC
    Hello again AI Cowboy,

    [...] the variable is initialized through a large process in my code that I can't really release on here

    And we would not want it here; part of making an effective question here, which encourages effective answers, is to only show a code snippet that demonstrates the issue.

    This you have done; and please realize that people now picking nits at the sample code are still trying to help you, even though the specific problem you raise isn't there.

    All part of the Monastery way. :-)

Re: Regex being stupid
by boftx (Deacon) on Sep 19, 2013 at 07:07 UTC

    I'm really too tired to take a crack at this (being awake only because of a roll-out that happened at 00:00) but if it can be guaranteed that the potential email address will always be the first element before a whitespace in a given input line then you might try this:

    my ($email,$foo) = split(/\s/,$text,2);

    You could then use any number of tools to see if $email is actually a potentially reasonable representation of an email address.
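
A rough sketch of that split-then-check idea, assuming the address really is the first whitespace-delimited token. The sub name, the sample inputs, and the deliberately loose shape check are all illustrative assumptions. The check is nowhere near a full email validator. One detail worth noting: split with a single-space string (' ') skips leading whitespace, whereas split /\s/ would return an empty first field for a line that starts with a space.

```perl
use strict;
use warnings;

# Hypothetical helper: take the first whitespace-delimited token and
# return it only if it loosely looks like local@domain.tld.
sub first_token_if_emailish {
    my ($line) = @_;

    # split ' ' ignores leading whitespace; the limit of 2 leaves the
    # remainder of the line untouched in $rest.
    my ($candidate, $rest) = split ' ', $line, 2;
    return undef unless defined $candidate;

    # Loose shape check only: something@something.something
    return $candidate =~ /\A[\w.+-]+\@[\w-]+(?:\.[\w-]+)+\z/ ? $candidate : undef;
}

for my $line ('user@example.com type add', ' user@example.com type add', 'type add') {
    my $found = first_token_if_emailish($line);
    print defined($found) ? "found: $found\n" : "no address at start\n";
}
```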

Re: Regex being stupid
by dave_the_m (Prior) on Sep 19, 2013 at 07:21 UTC
    In addition to what others have pointed out, the '@' in the regex needs escaping. @( is a valid variable name in perl (that happens to have zero elements in it normally).


      While it is true that @( is a valid variable name, it doesn’t interpolate:
      @( = (42, 12); # valid statement
      print "@(";    # doesn’t print '4212' but '@('
      print '@(';    # same as above (obviously)
      Thus, the @ in the regex doesn’t need escaping, strictly speaking. I’d still recommend escaping it, for clarity and maintainability.
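
A small sketch of where the "@" does and does not bite, assuming a declared array that happens to share its name with the domain. The array @example and the sample address are invented for the demonstration. In a double-quoted string, "@example" interpolates the array. In the pattern, the "@" is followed by "(" and is taken literally, whether escaped or not.

```perl
use strict;
use warnings;

# Hypothetical array whose name collides with the domain part.
my @example = ('X', 'Y');

# Double-quoted string: @example is interpolated (elements joined by a space).
my $interpolated = "user@example.com";
# Escaped: the @ stays literal.
my $escaped      = "user\@example.com";

# In the pattern, "@(" is not an interpolated variable, so both forms match alike.
my ($local_a, $domain_a) = $escaped =~ /(\w+)@(\w+)\.com/;
my ($local_b, $domain_b) = $escaped =~ /(\w+)\@(\w+)\.com/;

print "interpolated: $interpolated\n";
print "escaped:      $escaped\n";
print "unescaped pattern: $local_a / $domain_a\n";
print "escaped pattern:   $local_b / $domain_b\n";
```

Escaping the "@" everywhere costs nothing and saves the reader from having to remember which case applies.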
Re: Regex being stupid
by Laurent_R (Abbot) on Sep 19, 2013 at 09:52 UTC

    Assuming (just my guess) that what you wanted to do with the final (\w+) was to capture the end of the address, you could use this:

    $text =~ s/(\w+)\@(\w+)\.([\w.]+)/$1\@$2.$3/;
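
A sketch of the same idea as a plain match rather than a substitution, assuming the goal is to pull out the local part, the domain, and everything after the first dot. The sub name and the sample address are made up for illustration. One thing to watch when the pieces are put back together in a double-quoted string or an s/// replacement: "@$2" is parsed as an array dereference of $2, so the "@" wants a backslash there.

```perl
use strict;
use warnings;

# Hypothetical helper: return (local part, domain, suffix) or an empty list.
sub address_parts {
    my ($s) = @_;
    return $s =~ /(\w+)\@(\w+)\.([\w.]+)/ ? ($1, $2, $3) : ();
}

my ($local, $domain, $suffix) = address_parts('mail user@example.co.uk today');

# Backslash the @ when rebuilding, so it is not read as the start of an array.
my $rebuilt = "$local\@$domain.$suffix";
print "$rebuilt\n";
```

Note that [\w.]+ is greedy and would also swallow a full stop that directly follows the address at the end of a sentence.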