Re^3: Inexplicably slow regex (optimizer)

in reply to Re^2: Inexplicably slow regex
in thread Inexplicably slow regex

I think the lookbehind is what is killing you here.

But the question is why the "sol" version is so much slower than even the "lb" version.

I suspect it all boils down to optimization. All three cases could, in theory, anchor to "\n" characters but, in practice, the optimizer may not be smart enough to realize this.

My interpretation (aka "wild guess") of GrandFather's numbers:

       Rate     sol      lb      nl
sol 0.316/s      --    -99%   -100%
lb   35.0/s  10997%      --    -92%
nl    441/s 139470%   1158%      --

is that the "lb" regex is a bit more complex and so runs a bit slower, while the "sol" regex runs so much slower that I'd expect it to be the one that is trying far too many possible starting points rather than jumping to key spots such as "\n". (The even greater speedup from Boyer-Moore probably doesn't apply here, since I don't think any of these regexes are simple enough.)
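For anyone who wants to reproduce a comparison of this shape, here is a minimal sketch using the core Benchmark module. The three patterns and the test string are assumptions made up for illustration (the thread's actual regexes and data are not quoted in this node), so treat the names `sol`, `lb`, and `nl` as stand-ins only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 2000 short lines, every 50th one starts with 'x'.
my $str = join "\n", map { $_ % 50 ? "abcdefghij" : "xyz" } 1 .. 2000;

# Assumed stand-ins for the three variants discussed in the thread.
my %re = (
    sol => qr/^x/m,         # start-of-line anchor under /m
    lb  => qr/(?<=\n)x/,    # lookbehind for a newline
    nl  => qr/\nx/,         # literal newline in the pattern
);

# Count all matches so each variant scans the whole string.
sub count_matches {
    my ($re) = @_;
    my $n = 0;
    $n++ while $str =~ /$re/g;
    return $n;
}

# Build one timing closure per variant.
my %tests;
for my $name (sort keys %re) {
    my $r = $re{$name};
    $tests{$name} = sub { count_matches($r) };
}

# Run each variant for at least one CPU second and print a rate table.
cmpthese(-1, \%tests);
```

The rates will depend heavily on the perl version and the data, so the absolute numbers are less interesting than whether one variant falls off a cliff relative to the others.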

Yes, I've been hoping someone would use -Mre=debug and summarize what it reported on a system that saw the "sol" regex being especially slow. I think that is the most likely route to explaining the "problem". Then it would be interesting to compare that against what it reports on systems that don't see "sol" being so slow.
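As a rough sketch of what that investigation could look like: the `re` pragma can be enabled inside a script as well as via `-Mre=debug` on the command line. The sample string and the `/^x/m` pattern below are assumptions for illustration, not the thread's real test case. The trace goes to STDERR, so something like `perl script.pl 2>trace.txt` keeps it separate from normal output.

```perl
use strict;
use warnings;

my $str = "abc\nxyz\nabc\nxdef\n";    # hypothetical sample input

my @hits;
{
    # On modern perls this pragma is lexically scoped, so only regexes
    # compiled inside this block are traced (the trace goes to STDERR).
    use re 'debug';
    while ($str =~ /^x/mg) {          # assumed "sol"-style pattern
        push @hits, pos($str) - 1;    # offset where the match began
    }
}

print "match offsets: @hits\n";
```

The parts of the trace worth comparing between systems are probably the compile-time summary (what the optimizer decided it could anchor on) and how often the engine restarts its search for a start point.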

- tye


Replies are listed 'Best First'.
Re^4: Inexplicably slow regex (optimizer)
by zigdon (Deacon) on Sep 13, 2006 at 19:36 UTC

Hmm. Comparing sol to nl (since it seems they should behave very similarly) shows a difference in the `re=debug` dump (2 kB, not reproduced here).

Unless I'm misreading it, it looks like the sol version has to rescan the string each time for a potential start point, and for the location of the 'x'. The nl version doesn't seem to need to redo all that work every round. My (wild) guess is that the ANYOF class that is implied by the '^/m' makes the engine think it cannot keep as much state between matches.

But this is really getting much deeper into the guts of perl than I'm comfortable guessing about. Maybe someone who's more familiar with what's under the hood would be able to comment more?

As an aside, if you `study $str` before starting (takes a split second), the benchmark changes significantly, and seems much less insane:

      Rate    lb   sol    nl
lb   162/s    --  -87%  -93%
sol 1257/s  674%    --  -45%
nl  2272/s 1300%   81%    --

-- zigdon
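The `study` aside above could be checked with a sketch along these lines. The data and the `/^x/m` pattern are assumed stand-ins, not the thread's original test. Note that since Perl 5.16 `study` is documented as a no-op, so on a current perl both rows should come out about the same; a difference like the one reported would only show up on older perls.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 2000 short lines, every 50th one starts with 'x'.
my $plain   = join "\n", map { $_ % 50 ? "abcdefghij" : "xyz" } 1 .. 2000;
my $studied = $plain;    # separate copy so only one of the two is studied
study $studied;          # a no-op on perl 5.16 and later

# Count start-of-line 'x' matches in the string passed as first argument.
sub count_sol {
    my $n = 0;
    $n++ while $_[0] =~ /^x/mg;    # assumed "sol"-style pattern
    return $n;
}

# Compare matching against the plain and the studied copy.
cmpthese(-1, {
    plain   => sub { count_sol($plain) },
    studied => sub { count_sol($studied) },
});
```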

In Section Seekers of Perl Wisdom