PerlMonks  

Re: Analyzing regular expression performance

by kvale (Monsignor)
on May 12, 2006 at 11:06 UTC



in reply to Analyzing regular expression performance

One of the best cures for slow regexes is to improve your understanding of how regexes work. I suggest perlretut as a starter, followed by "Mastering Regular Expressions", published by O'Reilly.

A common problem causing exponential blowup of execution time is nested quantifiers:

(a*)+ (a+)+ (a*)* (a+)* etc.
There is an exponential number of ways to split a string of a's between the inner and outer quantifiers. If the string to match has something past that run of a's that can't match, the regex engine will try every one of those splits in the effort to match. Hence the blowup. The solution is to rewrite your regexes so that they have no nested quantifiers.
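
To make the counting argument concrete, here is a small sketch in Python rather than Perl, meant only as an illustration. The names splits, NESTED and FLAT are made up for this example. It enumerates every way a run of a's can be divided into non-empty pieces, which is the set of choices a backtracking engine may have to walk through for (a+)+ when the overall match fails, and it checks that the flat pattern a+ accepts the same strings. Real engines differ in detail and may short-circuit some of these cases.

```python
import re

def splits(run):
    """Yield every way to cut `run` into consecutive non-empty pieces."""
    if not run:
        yield []          # nothing left to cut: one (empty) way
        return
    for i in range(1, len(run) + 1):
        # first piece is run[:i]; cut the remainder every possible way
        for rest in splits(run[i:]):
            yield [run[:i]] + rest

# The number of splits doubles with each extra 'a'.
for n in (1, 2, 3, 4, 10):
    print(n, len(list(splits("a" * n))))   # 1, 2, 4, 8, 512 splits

# The nested and flat patterns accept exactly the same strings,
# so the nesting adds ways to match without adding anything matched.
NESTED = re.compile(r"(a+)+")
FLAT = re.compile(r"a+")
for s in ["", "a", "aaa", "aab", "b"]:
    assert bool(NESTED.fullmatch(s)) == bool(FLAT.fullmatch(s))
```

The inputs are kept short on purpose: the point is the count of splits, not a timing benchmark.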

Update: There does exist a directive for debugging regexes (see use re 'debug' in perlretut), but its output won't make sense until you have a good understanding of how regexes work.

-Mark

Re^2: Analyzing regular expression performance
by kwaping (Priest) on May 12, 2006 at 12:20 UTC
    That would definitely explain at least some of it. The $sp and $words regexes posted above have that structure.

    ---
    It's all fine and dandy until someone has to look at the code.
      Changing the $sp regex to (?:\s*) cut runtime to almost nothing, so it looks like that's where my problem is.

      I do need to allow comments anywhere in the string, though -- including having comments right after comments. Is there a cleaner way of doing that?

      ---
      A fair fight is a sign of poor planning.

        I do think you are insane to parse SQL with regular expressions, but if you are brave and insist on this approach, I would suggest a multi-pass approach.

        Write one pattern that strips out comments. Write another pattern to coalesce blanks into a single blank. Then you will find that writing a third pattern to pick apart an expression becomes much easier. At least you won't need so many nested 0-or-many groupings -- that is what is killing your performance.

        I tried to run your code and it ate up all available memory... I do not think this approach will scale.
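
The multi-pass idea above can be sketched roughly as follows, again in Python purely for illustration. Everything here is an assumption rather than the original poster's code: the names COMMENT, BLANKS, SELECT, normalize_sql and parse_select are hypothetical, the comment syntax is taken to be "--" to end of line plus /* ... */ blocks, and string literals that happen to contain comment markers are ignored. The third pattern handles only a toy SELECT ... FROM form.

```python
import re

# Pass 1: a comment is "--" to end of line, or a /* ... */ block.
COMMENT = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)
# Pass 2: any run of whitespace collapses to a single blank.
BLANKS = re.compile(r"\s+")
# Pass 3: a deliberately tiny grammar, only "SELECT cols FROM table".
SELECT = re.compile(r"SELECT (.+) FROM (\w+)", re.IGNORECASE)

def normalize_sql(text):
    """Strip comments, then coalesce whitespace."""
    no_comments = COMMENT.sub(" ", text)   # each comment becomes one blank
    return BLANKS.sub(" ", no_comments).strip()

def parse_select(text):
    """Return (columns, table) for the toy grammar, or None if no match."""
    m = SELECT.fullmatch(normalize_sql(text))
    if m is None:
        return None
    columns = [c.strip() for c in m.group(1).split(",")]
    return columns, m.group(2)

sql = "SELECT a, -- first column\n  b /* c1 *//* c2 */\nFROM t"
print(normalize_sql(sql))   # SELECT a, b FROM t
print(parse_select(sql))    # (['a', 'b'], 't')
```

Because comments and extra blanks are gone before the third pattern runs, that pattern needs no optional comment-or-space groups between tokens. Comments directly following other comments are handled by the first pass replacing each one in turn.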

        • another intruder with the mooring in the heart of the Perl
