
Re^3: Inexplicably slow regex (optimizer)

by tye (Sage)
on Sep 13, 2006 at 18:10 UTC

in reply to Re^2: Inexplicably slow regex
in thread Inexplicably slow regex

I think the lookback is what is killing you here.

But the question is why is the "sol" version so much slower than even the "lb" version.

I suspect it all boils down to optimization. All three cases could, in theory, anchor to "\n" characters but, in practice, the optimizer may not be smart enough to realize this.

My interpretation (aka "wild guess") of GrandFather's numbers:

           Rate     sol      lb      nl
    sol 0.316/s      --    -99%   -100%
    lb   35.0/s  10997%      --    -92%
    nl    441/s 139470%   1158%      --

is that the "lb" regex is a bit more complex and so runs a bit slower, while the "sol" regex runs so much slower that I suspect it is the one trying far too many possible starting points rather than jumping to key spots such as "\n". (Even more speed from Boyer-Moore probably doesn't apply here, since I don't think any of these regexes are simple enough.)
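For anyone who wants to reproduce this kind of comparison, here is a minimal sketch using the core Benchmark module. The three patterns and the test string are my assumptions, chosen only to resemble a start-of-line anchor ("sol"), a lookbehind ("lb") and a literal newline ("nl"); they are not necessarily the regexes from the original post, and the numbers will vary a lot between perl versions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: many lines, only the last of which starts with 'x'.
my $str = join '', map { ('a' x 50) . "\n" } 1 .. 2000;
$str .= "x marks the spot\n";

# Hypothetical stand-ins for the three variants discussed in the thread.
my %re = (
    sol => qr/^x/m,        # start-of-line anchor under /m
    lb  => qr/(?<=\n)x/,   # lookbehind for a newline
    nl  => qr/\nx/,        # literal newline followed by x
);

# Count every match of the given pattern in $str.
sub count_matches {
    my ($re) = @_;
    my $n = 0;
    $n++ while $str =~ /$re/g;
    return $n;
}

# Build one closure per variant and run each for about one CPU second.
my %tests;
for my $name (keys %re) {
    my $re = $re{$name};
    $tests{$name} = sub { count_matches($re) };
}
cmpthese(-1, \%tests);
```

If one variant really is visiting every character as a candidate starting point, it should stand out in the resulting table by an order of magnitude or more.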

Yes, I've been hoping someone would use -Mre=debug and summarize what it reported on a system that saw the "sol" regex being especially slow. I think that is the most likely route to explaining the "problem". Then it would be interesting to compare that against what it reports on systems that don't see "sol" being so slow.
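A minimal sketch of getting that debug output from inside a script, via the re pragma (the in-source equivalent of running perl with -Mre=debug). The pattern and string are placeholders I made up. The output goes to STDERR and its format differs between perl versions; on older perls the pragma may not be confined to the enclosing block the way it is on recent ones.

```perl
use strict;
use warnings;

# Placeholder data and pattern, not the ones from the thread.
my $str = "abc\nxyz\n";
my $found;

{
    # Turn on regex compile-time and run-time debugging for this block.
    use re 'debug';
    $found = $str =~ /^x/m;
}

print "matched\n" if $found;
```

The compile-time part of the dump shows what the optimizer worked out (anchoring, any fixed substring it can search for first), which is the part worth comparing between a slow system and a fast one.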

- tye        

Replies are listed 'Best First'.
Re^4: Inexplicably slow regex (optimizer)
by zigdon (Deacon) on Sep 13, 2006 at 19:36 UTC
    Hmm. Comparing sol to nl (since it seems they should behave very similarly) shows this in the re=debug dump:

    Unless I'm misreading it, it looks like the sol version has to rescan the string each time for a potential start point, and for the location of the 'x'. The nl version doesn't seem to need to redo all that work every round. My (wild) guess is that the ANYOF class that is implied by the '^/m' makes the engine think it cannot keep as much state between matches.

    But this is really getting much deeper into the guts of perl than I feel comfortable guessing about. Maybe someone who's more familiar with what's under the hood would be able to comment more?

    As an aside, if you study $str before starting (takes a split second), the benchmark changes significantly, and seems much less insane:

          Rate    lb   sol    nl
    lb   162/s    --  -87%  -93%
    sol 1257/s  674%    --  -45%
    nl  2272/s 1300%   81%    --
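A minimal sketch of what "study $str before starting" looks like in code. The string and pattern are made up for illustration. One caveat: perldoc -f study notes that since Perl 5.16 the function is a no-op, so on a current perl this call should make no difference to the timings.

```perl
use strict;
use warnings;

# Assumed test data: many lines, only the last of which starts with 'x'.
my $str = join '', map { ('a' x 50) . "\n" } 1 .. 2000;
$str .= "x marks the spot\n";

# Ask perl to pre-analyze the string before it is matched repeatedly.
# Documented as a no-op since Perl 5.16, so it only matters on older perls.
study $str;

# Count the lines that start with 'x' (list assignment in scalar context).
my $count = () = $str =~ /^x/mg;
print "lines starting with x: $count\n";
```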

    -- zigdon
