|go ahead... be a heretic|
Re: Perl regexp matching is slow??by grinder (Bishop)
|on Jan 30, 2007 at 13:56 UTC||Need Help??|
The author is clearly an idiot :)
The Perl porters have long known about pathological expressions that will continue to run after the heat-death of the universe. This doesn't mean those patterns should be banished, but with a little care, and a little thought, Perl probably offers the informed programmer techniques for supplying hints to the regular expression engine to avoid such exponential blowups.
I also find it interesting that Perl allowed me to write a very small amount of code in order to explore the problem. And the solution here is to not use greedy matching.
Change 'aaa' =~ /a?a?a?aaa/ to /a??a??a??aaa/ and you have to push n out to about 100000 before you begin to notice any appreciable slowdown in the matching. Take the following for a spin:
Try running perl rematcher 20 and then perl rematcher 20000 ??. I get a better than three orders of magnitude improvement. Not bad for five minutes thought. Seeing how the Thomson graph climbs slowly but steadily from the beginning, I would not be surprised if Perl continued to outperform it all the way out until things begin to swap.
update: after finishing the article, I retract the statement that the author is an idiot. He does know his stuff, however, I still don't agree with the conclusion. He says that DFAs are better because you don't run into grief with patterns like /(.*)(.*)(.*)(.*)(.*)/ since it doesn't go exponential. This is just silly. If anything, it encourages sloppy programming.
Backreferences allow one to solve a much larger range of problems that DFAs cannot. That's why NFAs are in such widespread use. And if it takes a little education and experience to wield NFAs with assurance, then so be it.
• another intruder with the mooring in the heart of the Perl