PerlMonks  

Re^4: Inexplicably slow regex (optimizer)

by zigdon (Deacon)
on Sep 13, 2006 at 19:36 UTC ( [id://572804]=note )


in reply to Re^3: Inexplicably slow regex (optimizer)
in thread Inexplicably slow regex

Hmm. Comparing the sol version to the nl version (since it seems they should behave very similarly) shows this in the re=debug dump:
nl version:
    Compiling REx `\n \s* x '
    size 7 Got 60 bytes for offset annotations.
    first at 1
    rarest char x at 0
    rarest char at 0
       1: EXACT <\n>(3)
       3: STAR(5)
       4: SPACE(0)
       5: EXACT <x>(7)
       7: END(0)
And a typical match attempt looks like this:
    Setting an EVAL scope, savestack=17
      48 <y dog> < The >    |  1:  EXACT <\n>
      49 < dog > < The q>   |  3:  STAR
                               SPACE can match 2 times out of 2147483647...
    Setting an EVAL scope, savestack=17
                                failed...
Compare to the sol version:
    Compiling REx `^ \s* x '
    size 6 Got 52 bytes for offset annotations.
    first at 2
    rarest char x at 0
    synthetic stclass `ANYOF[\11\12\14\15 x{unicode_all}]'.
       1: MBOL(2)
       2: STAR(4)
       3: SPACE(0)
       4: EXACT <x>(6)
       6: END(0)
Where a typical match attempt is more complex:
    Guessing start of match, REx `^ \s* x ' against ` The quick brown frog jumped over the lazy dog The quick b...'...
    Found floating substr `x' at offset 48000...
    Found /^/m at offset 47...
    Does not contradict STCLASS...
    Guessed: match at offset 47
    Setting an EVAL scope, savestack=17
      49 < dog > < The q>   |  1:  MBOL
      49 < dog > < The q>   |  2:  STAR
                               SPACE can match 2 times out of 2147483647...
    Setting an EVAL scope, savestack=17
                                failed...

Unless I'm misreading it, it looks like the sol version has to rescan the string on every attempt, both for a potential start point and for the location of the 'x'. The nl version doesn't seem to need to redo all that work every round. My (wild) guess is that the ANYOF class that is implied by the /^/m makes the engine think it cannot keep as much state between match attempts.
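
For anyone who wants to produce dumps like these themselves, here is a rough sketch. The test string is my assumption (the line is taken from the dump, repeated only 3 times so the output stays readable), and the two patterns are the ones shown in the "Compiling REx" lines, with /x and /m added as the dumps suggest. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Assumed test data: a short stand-in for the thread's much longer string.
my $line = " The quick brown frog jumped over the lazy dog\n";
my $str  = ($line x 3) . "x";

my ($nl_matched, $sol_matched);
{
    # Lexically scoped: only regexes compiled inside this block are traced.
    # The compile and match trace is written to STDERR.
    use re 'debug';

    $nl_matched  = $str =~ /\n \s* x/x  ? 1 : 0;   # "nl" version
    $sol_matched = $str =~ /^ \s* x/mx  ? 1 : 0;   # "sol" version
}

print "nl matched: $nl_matched, sol matched: $sol_matched\n";
```

Running it with STDERR redirected to a file (for example `perl dump.pl 2> dump.txt`) makes the two traces easier to compare side by side. The exact wording of the trace differs between perl versions, so don't expect it to match the dumps above line for line.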

But this is getting much deeper into the guts of perl than I'm comfortable guessing about. Maybe someone who's more familiar with what goes on under the hood would be able to comment more?

As an aside, if you study $str before starting (takes a split second), the benchmark changes significantly, and seems much less insane:

          Rate    lb   sol    nl
    lb   162/s    --  -87%  -93%
    sol 1257/s  674%    --  -45%
    nl  2272/s 1300%   81%    --
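
A minimal sketch of that kind of benchmark, in case someone wants to try it. The string construction is my assumption (1000 copies of the line from the dump, then an 'x'), and I've left out the lb case because its pattern isn't shown in this node. Note also that on perl 5.16 and later study is documented as doing nothing, so the effect above will only show up on older perls.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: 1000 copies of the line seen in the dump, then an 'x'.
my $line = " The quick brown frog jumped over the lazy dog\n";
my $str  = ($line x 1000) . "x";

# Comment this out to compare against the unstudied run.
study $str;

# Hypothetical labels matching the names used in the thread.
my %tests = (
    nl  => sub { $str =~ /\n \s* x/x  ? 1 : 0 },
    sol => sub { $str =~ /^ \s* x/mx  ? 1 : 0 },
);

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%tests);
```

The rates you get will depend on the perl version and the machine, so treat the table above as one data point rather than something this sketch is guaranteed to reproduce.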

-- zigdon
