Re: tight loop regex optimization
by BrowserUk (Pope) on Nov 01, 2011 at 07:20 UTC
Skip to the bottom first!
There are a few places in the code where small savings can be achieved essentially for free:
But even if you could make this level of savings on every single line in the program, you'd still save maybe an hour at most.
Looking at a few of the individual REs, nothing leaps off the page as being particularly extravagant. You should heed the annotation at the top of the profiling and try to remove usage of $&. This has been known to effect a substantial time saving.
The only place affected is this sub:
The uncommented replacements should have the same effect (untested), and the changes could have a substantial effect on the overall performance of a script dominated by regex manipulations.
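As a generic sketch of the kind of replacement meant here (this is not the sub from the script; the sub names, sample string and pattern are made up for illustration), the matched text can be captured explicitly, rebuilt from the `@-` and `@+` offset arrays, or, on Perl 5.10 and later, fetched via `/p` and `${^MATCH}`:

```perl
use strict;
use warnings;

# The form to avoid on Perls before 5.20, where a single mention of the
# whole-match variable makes every successful match in the program copy
# its target string:
#   $text =~ /b\w+/ and print $&;

# Option 1: capture the whole match explicitly.
sub match_by_capture {
    my ($text) = @_;
    return $text =~ /(b\w+)/ ? $1 : undef;
}

# Option 2: rebuild the match from the offset arrays @- and @+.
sub match_by_offsets {
    my ($text) = @_;
    return undef unless $text =~ /b\w+/;
    return substr( $text, $-[0], $+[0] - $-[0] );
}

# Option 3 (Perl 5.10+): /p makes ${^MATCH} available for this match only.
sub match_by_p_flag {
    my ($text) = @_;
    return $text =~ /b\w+/p ? ${^MATCH} : undef;
}

for my $finder ( \&match_by_capture, \&match_by_offsets, \&match_by_p_flag ) {
    print $finder->('alpha beta gamma'), "\n";    # each prints "beta"
}
```

Whichever form is used, the saving only appears once every use of `$&` (and of its siblings `` $` `` and `$'`) is gone from the script and from any modules it loads.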
While you're at it, you can also add a few micro-optimisations to the subs that are called millions of times, like:
which will save the 7 seconds spent copying the input parameter. But given that the overall runtime is 7 minutes, that's not going to have a big effect. The only way you're likely to get substantial savings from within the script is to try optimising the algorithms used -- which amounts to tuning all of the individual regexes, and the heuristics they represent -- and that comes with enormous risk of breaking the logic completely and would require extensive and detailed testing.
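The parameter-copy saving can be illustrated with a small sketch. The two subs below are hypothetical stand-ins, not code from the script: the first copies its argument into a lexical on every call, the second works on the caller's scalar directly through the `@_` alias.

```perl
use strict;
use warnings;

# Copies the argument into a lexical on every call.
sub count_digits_copy {
    my $str = shift;
    my $n = () = $str =~ /\d/g;    # countof idiom: number of matches
    return $n;
}

# Reads the caller's scalar through the @_ alias; no copy is made.
sub count_digits_alias {
    my $n = () = $_[0] =~ /\d/g;
    return $n;
}

print count_digits_copy('a1b22c333'),  "\n";    # 6
print count_digits_alias('a1b22c333'), "\n";    # 6
```

The trade-off is that `$_[0]` is an alias, so a sub that modifies it (with `s///`, say) changes the caller's variable. That is only safe where the sub reads its argument, or where the caller expects the change.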
All of that said, if you split the workload across two processors, you're likely to achieve close to a 50% saving. Across 4, a 75% saving is theoretically possible. It really doesn't make much sense to spend time looking for savings within the script when, with a little restructuring, it lends itself so readily to being parallelised.
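Those figures are the ideal case: with N workers and perfectly divisible work, runtime drops to 1/N, a saving of 1 - 1/N (50% for 2, 75% for 4). One way to split the work is sketched below under several assumptions: it uses `fork` and pipes (so a platform with real `fork`), the records are independent, and `process_record`, the sample data and the worker count are all invented for illustration rather than taken from the script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's per-record regex work:
# counts the words in one record.
sub process_record {
    my $n = () = $_[0] =~ /\b\w+\b/g;
    return $n;
}

my @records = map { "record number $_ with some words" } 1 .. 1000;
my $workers = 4;    # assumed number of available cores

my @children;
for my $w ( 0 .. $workers - 1 ) {
    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {    # child: handle every $workers-th record
        close $reader;
        my $total = 0;
        for ( my $i = $w ; $i < @records ; $i += $workers ) {
            $total += process_record( $records[$i] );
        }
        print {$writer} "$total\n";
        close $writer;
        exit 0;
    }

    close $writer;        # parent keeps only the read end
    push @children, [ $pid, $reader ];
}

# Parent: collect one partial result per child and combine them.
my $grand_total = 0;
for my $child (@children) {
    my ( $pid, $reader ) = @$child;
    my $line = <$reader>;
    chomp $line;
    $grand_total += $line;
    close $reader;
    waitpid( $pid, 0 );
}

print "total words: $grand_total\n";
```

In practice the saving falls short of the ideal, because reading the input, starting the workers and merging their results all remain serial.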
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.