
Regex anchor speed ?: Is /^.*X/ faster than /X/ ?

by cshirky (Initiate)
on Nov 08, 2000 at 20:33 UTC (#40546=perlquestion)
cshirky has asked for the wisdom of the Perl Monks concerning the following question:

I am looping through lotsa data, looking for lines that contain email addresses.

The current match is

if ( /\@/ ) { &Frobnitz($_); }
Would front anchoring a single char match, as in
if ( /^.*\@/ ) { &Frobnitz($_); }
be faster?

Many thanks,


Re: Regex anchor speed ?: Is /^.*X/ faster than /X/ ?
by Fastolfe (Vicar) on Nov 08, 2000 at 21:08 UTC
    I benchmarked it:

    The string: "now is the time for all good monks to come to the aid of their perl". The results:

    Benchmark: timing 5000000 iterations of /.*perl/, /^.*perl/, /perl/...
      /.*perl/: 43 secs (41.48 usr 0.00 sys = 41.48 cpu)
     /^.*perl/: 38 secs (36.20 usr 0.00 sys = 36.20 cpu)
        /perl/: 29 secs (28.96 usr 0.00 sys = 28.96 cpu)
    So yah, a standalone /match/ is going to be a bit faster.
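
For anyone who wants to repeat the comparison, here is a minimal sketch using the core Benchmark module. The post does not show the script that produced the numbers above, so the layout here (a hash of subs handed to cmpthese, run for about one CPU second each) is an assumption, and timings will differ by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The test string quoted in the post.
my $str = "now is the time for all good monks to come to the aid of their perl";

# The three patterns being compared; each sub just attempts the match.
my %tests = (
    '/.*perl/'  => sub { $str =~ /.*perl/  },
    '/^.*perl/' => sub { $str =~ /^.*perl/ },
    '/perl/'    => sub { $str =~ /perl/    },
);

# A negative count means "run each sub for at least that many CPU seconds";
# cmpthese then prints a rate table with pairwise percentage differences.
cmpthese(-1, \%tests);
```

A fixed iteration count (for example 5_000_000, as in the post) can be passed in place of -1 if directly comparable totals are wanted.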
Re: Regex anchor speed ?: Is /^.*X/ faster than /X/ ?
by Ovid (Cardinal) on Nov 08, 2000 at 23:16 UTC
    Actually, you aren't matching e-mail addresses. You are simply checking for the existence of the @ symbol. If you know the only lines that contain an @ symbol are e-mail addresses, you're fine.

    Fastolfe is correct that a straight match will be faster. The /^.*\@/ is forced to match to the end of the string (or to a newline) and then backtrack to the @ symbol. This is extra overhead and will slow it down.

    If you use minimal matching with /^.*?\@/, the regex matches every character (except for newline and end-of-string) and then looks ahead for the @ symbol. Again, you have extra overhead.

    A simple scan for the @ symbol (/\@/) just looks for the @ symbol and returns true if found. That is the fastest way to scan for the character. See Death to Dot Star! if you wish to understand more about the dot metacharacters in regexes.

      Ovid said:
      > The /^.*\@/ is forced to match to
      > the end of the string (or to a newline) and then backtrack to the @ symbol.
      But that is not exactly true. In general, Perl does behave that way. But in some cases, such as this one, there is an optimization: Perl sees that the string can't match unless it contains an @ character, so it looks for the @ first, and works outwards from there. In particular, it does not let .* match all the way to the end of the string and then backtrack; it gets the right length for .* on the first try.

      Isn't that interesting?

      The nongreedy .*? version is optimized similarly, so I would be surprised if it performed any differently in this example.
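
One way to check what the optimizer does on your own perl is the re pragma's debug mode, which dumps the compiled pattern and a trace of the match attempt to STDERR. This is only a sketch: the sample line is made up, and the exact wording of the dump varies between perl versions, so treat it as something to inspect rather than a fixed output. Look for a mention of the literal "@" substring that the engine checks for before running the full pattern.

```perl
use strict;
use warnings;

# Hypothetical sample line; single quotes keep the @ from interpolating.
my $line = 'send mail to monk@example.com today';

my $matched;
{
    # Lexically scoped: only regexes compiled inside this block emit
    # debugging information (compiled program plus match trace) on STDERR.
    use re 'debug';
    $matched = $line =~ /^.*\@/ ? 1 : 0;
}

print "matched: $matched\n";
```

Swapping in /^.*?\@/ or /\@/ and comparing the dumps shows whether the same substring check is applied to each form.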

Re: Regex anchor speed ?: Is /^.*X/ faster than /X/ ?
by extremely (Priest) on Nov 09, 2000 at 16:34 UTC
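    I'd think about using:  &Frobnitz($_) if index($_, '@') >= 0; instead (index returns -1, which is true, when the character is missing, so the result has to be compared against 0). I got this: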
            Rate regexp  index
    regexp 826/s     --   -10%
    index  921/s    11%     --

    with Benchmark. Why throw the regex engine at a single character when there is a lovely little function just for that? =)
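
A short sketch of the index approach, with the return values spelled out. index gives the zero-based position of the first occurrence, or -1 when the substring is absent, so the test has to be an explicit comparison rather than plain truth. The helper name has_at and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line contains an @ anywhere,
# including at position 0.
sub has_at {
    my ($line) = @_;
    return index($line, '@') >= 0;
}

# Made-up sample lines: @ in the middle, @ at position 0, no @ at all.
for my $line ('monk@example.com', '@start', 'no address here') {
    printf "%-20s index=%2d has_at=%s\n",
        $line, index($line, '@'), has_at($line) ? 'yes' : 'no';
}
```

In the original loop this would read &Frobnitz($_) if has_at($_); or the comparison can simply be written inline.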
