PerlMonks  

Re: Analyzing regular expression performance

by kvale (Monsignor)
on May 12, 2006 at 15:06 UTC (#549026)


in reply to Analyzing regular expression performance

One of the best cures for slow regexes is to improve your understanding of how regexes work. I suggest perlretut as a starter and "Mastering Regular Expressions" published by O'Reilly.

A common problem causing exponential blowup of execution time is nested quantifiers:

(a*)+ (a+)+ (a*)* (a+)* etc.
There are exponentially many ways to split a string of a's between the inner and outer quantifiers. If the text following that run of a's cannot match the rest of the pattern, the regex engine backtracks through every one of those splits before giving up. Hence the blowup. The solution is to rewrite your regexes so that they do not nest quantifiers.
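
To make the "exponential number of ways" concrete, here is a minimal sketch. The helper names count_splits and same_verdict are made up for this illustration. The first counts how many ways a run of n a's can be cut into non-empty pieces, which is the choice (a+)+ leaves open; the second checks that the flat pattern a+ accepts and rejects the same strings as the nested one. It counts splits only; the number of steps a real engine takes depends on its optimizations.

```perl
use strict;
use warnings;

# Count the ways to cut a run of $n characters into non-empty pieces,
# i.e. the ways (a+)+ could divide "a" x $n between its two loops.
sub count_splits {
    my ($n) = @_;
    return 1 if $n == 0;    # nothing left to cut: exactly one way
    my $ways = 0;
    # The first piece takes $k characters; recurse on the remainder.
    for my $k (1 .. $n) {
        $ways += count_splits($n - $k);
    }
    return $ways;
}

# True when the nested and the flat pattern agree on string $s.
sub same_verdict {
    my ($s) = @_;
    my $nested = $s =~ /^(?:a+)+$/ ? 1 : 0;
    my $flat   = $s =~ /^a+$/      ? 1 : 0;
    return $nested == $flat;
}

for my $n (1 .. 10) {
    printf "%2d a's: %d possible splits\n", $n, count_splits($n);
}
for my $s ('', 'a', 'aaaa', 'aab', 'baa') {
    printf "%-6s same verdict: %s\n", "'$s'", same_verdict($s) ? 'yes' : 'no';
}
```

The count doubles with every extra a (2**(n-1) splits for n a's), while the flat pattern leaves the engine only one way to consume the run.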

Update: There does exist a directive for debugging regexes (see use re 'debug' in perlretut), but it won't make sense until you have a good understanding of how regexes work.
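
For reference, a minimal sketch of turning the directive on for a single match. The pattern and the test string are only illustrative, and the trace goes to STDERR. In Perl 5.10 and later the pragma is lexically scoped, so wrapping it in a block keeps the trace limited to the regex you care about.

```perl
use strict;
use warnings;

my $matched;
{
    # Only regexes compiled inside this block are traced (Perl 5.10+).
    use re 'debug';
    $matched = 'aaab' =~ /(?:a+)+b/ ? 1 : 0;
}
print "matched: $matched\n";
```

The trace first shows the compiled program for the pattern, then each step the engine takes against the string, which is where repeated backtracking becomes visible.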

-Mark


Re^2: Analyzing regular expression performance
by kwaping (Priest) on May 12, 2006 at 16:20 UTC
    That would definitely explain at least some of it. The $sp and $words regexes posted above have that structure.

    ---
    It's all fine and dandy until someone has to look at the code.
      Changing the $sp regex to (?:\s*) cut runtime to almost nothing, so it looks like that's where my problem is.

      I do need to allow comments anywhere in the string, though -- including having comments right after comments. Is there a cleaner way of doing that?

      ---
      A fair fight is a sign of poor planning.

        I do think you are insane to parse SQL with regular expressions, but if you are brave and insist on this approach, I would suggest a multi-pass approach.

        Write one pattern that strips out comments. Write another pattern to coalesce blanks into a single blank. Then you will find that writing a third pattern to pick apart an expression becomes much easier. At least you won't need so many nested 0-or-many groupings -- that is what is killing your performance.
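
The first two passes might be sketched as follows. This is an illustration under assumptions, not the code from the original question: normalize_sql is a hypothetical helper, comments are assumed to be either /* ... */ blocks or -- to end of line, and comment markers inside quoted string literals are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: remove comments, then collapse whitespace.
sub normalize_sql {
    my ($sql) = @_;
    $sql =~ s{/\*.*?\*/}{ }gs;    # pass 1a: block comments, non-greedy
    $sql =~ s{--[^\n]*}{ }g;      # pass 1b: line comments up to the newline
    $sql =~ s/\s+/ /g;            # pass 2: runs of whitespace become one blank
    $sql =~ s/^\s+|\s+$//g;       # trim the ends
    return $sql;
}

my $raw = "SELECT a, /* c1 *//* c2 */ b -- trailing\nFROM t";
print normalize_sql($raw), "\n";
```

Each substitution is a single flat loop over the input, so back-to-back comments are handled by the /g flag rather than by a nested 0-or-many group inside the main pattern.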

        I tried to run your code and it ate up all available memory... I do not think this approach will scale.

        • another intruder with the mooring in the heart of the Perl
