Regex anchor speed ?: Is /^.*X/ faster than /X/ ?

Contributed by cshirky on Nov 08, 2000 at 20:33 UTC


Description:

I am looping through a lot of data, looking for lines that contain email addresses. The current match is

    if ( /\@/ ) { &Frobnitz($_); }

Would front-anchoring a single-character match, as in

    if ( /^.*\@/ ) { &Frobnitz($_); }

be faster? Many thanks, -clay

Answer: Regex anchor speed ?: Is /^.*X/ faster than /X/ ?
contributed by Fastolfe

I benchmarked it:

The string: "now is the time for all good monks to come to the aid of their perl". The results:

Benchmark: timing 5000000 iterations of /.*perl/, /^.*perl/, /perl/...
  /.*perl/: 43 secs (41.48 usr 0.00 sys = 41.48 cpu)
 /^.*perl/: 38 secs (36.20 usr 0.00 sys = 36.20 cpu)
    /perl/: 29 secs (28.96 usr 0.00 sys = 28.96 cpu)

So yah, a standalone /match/ is going to be a bit faster.
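
For anyone who wants to repeat the comparison, here is a minimal sketch using the core Benchmark module and the same test string. The original benchmark script was not posted, so the layout below is an assumption, and the numbers will vary with Perl version and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same test string as in the benchmark above.
my $str = "now is the time for all good monks to come to the aid of their perl";

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds (here: 1) rather than a fixed iteration count.
cmpthese(-1, {
    '/.*perl/'  => sub { my $hit = $str =~ /.*perl/  },
    '/^.*perl/' => sub { my $hit = $str =~ /^.*perl/ },
    '/perl/'    => sub { my $hit = $str =~ /perl/    },
});
```

cmpthese prints a rate table with relative percentages, which is easier to compare than raw wallclock seconds.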
Answer: Regex anchor speed ?: Is /^.*X/ faster than /X/ ?
contributed by Ovid

Actually, you aren't matching e-mail addresses. You are simply checking for the existence of the @ symbol. If you know the only lines that contain an @ symbol are e-mail addresses, you're fine.

Fastolfe is correct that a straight match will be faster. With /^.*\@/, the greedy .* first consumes everything up to the end of the string (or to a newline), and the engine then backtracks until it reaches the last @ symbol. This is extra overhead and will slow it down.

If you use minimal matching with /^.*?\@/, the regex advances one character at a time (never past a newline or the end of the string) and checks after each step whether the next character is the @ symbol. Again, you have extra overhead.

A simple scan for the @ symbol (/\@/) just looks for the @ symbol and returns true if found. That is the fastest way to scan for the character. See Death to Dot Star! if you wish to understand more about the dot metacharacters in regexes.
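
Speed aside, the three patterns are not quite interchangeable. The sketch below (sample strings and the match_flags helper are made up for illustration) shows that they agree on single lines but differ when the @ sits after an embedded newline, because . does not match a newline and ^ without /m only matches at the start of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (plain, anchored, minimal) match flags as 1/0.
sub match_flags {
    my ($s) = @_;
    return (
        ($s =~ /\@/)     ? 1 : 0,   # plain scan for @
        ($s =~ /^.*\@/)  ? 1 : 0,   # anchored, greedy dot-star
        ($s =~ /^.*?\@/) ? 1 : 0,   # anchored, minimal dot-star
    );
}

# Hypothetical sample data, not from the original post.
my @samples = (
    "mail clay\@example.com today",   # @ in the middle of a single line
    "no address here",                # no @ at all
    "first line\nsecond\@line",       # @ appears after an embedded newline
);

for my $s (@samples) {
    my ($plain, $anchored, $minimal) = match_flags($s);
    (my $shown = $s) =~ s/\n/\\n/g;   # make the newline visible in the output
    print "$shown: plain=$plain anchored=$anchored minimal=$minimal\n";
}
```

Only the plain /\@/ matches the third sample. When reading input line by line this difference never shows up, but it matters if a whole file has been slurped into one string.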

Answer: Regex anchor speed ?: Is /^.*X/ faster than /X/ ?
contributed by extremely

I'd think about using:  &Frobnitz($_) if index($_, '@') >= 0; instead (index returns -1 when the character is absent, so the result has to be compared explicitly). I got this:

        Rate regexp  index
regexp 826/s     --   -10%
index  921/s    11%     --

with Benchmark. Why throw the regex engine at a single character when there is a lovely little function just for that? =)
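
One caveat when using index as a boolean test: it returns the zero-based position of the match, or -1 if there is none. A bare `if index $_, '@'` is therefore true when the @ is missing and false when the @ is the first character. A small sketch (has_at is a hypothetical helper name) of the explicit comparison:

```perl
use strict;
use warnings;

# Returns true when $line contains an '@' anywhere, including at position 0.
sub has_at {
    my ($line) = @_;
    return index($line, '@') >= 0;
}

# Illustrative inputs: @ in the middle, @ at position 0, and no @ at all.
for my $line ('clay@example.com', '@leading', 'no address') {
    printf "%-18s index=%2d has_at=%s\n",
        $line, index($line, '@'), has_at($line) ? 'yes' : 'no';
}
```

The raw index values for these three inputs are 4, 0 and -1, which is why the `>= 0` comparison is needed.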
