Re^4: Inexplicably slow regex (optimizer)

Hmm. Comparing sol to nl (since it seems they should behave very similarly) shows this in the re=debug dump:

nl version:

  Compiling REx `\n \s* x '
  size 7 Got 60 bytes for offset annotations.
  first at 1
  rarest char x at 0
  rarest char
   at 0
     1: EXACT <\n>(3)
     3: STAR(5)
     4:   SPACE(0)
     5: EXACT <x>(7)
     7: END(0)

And a typical match attempt looks like this:

    Setting an EVAL scope, savestack=17
    48 <y dog> <
    The >    |  1:  EXACT <\n>
    49 < dog
  > <  The q>    |  3:  STAR
                             SPACE can match 2 times out of 2147483647...
    Setting an EVAL scope, savestack=17
                              failed...

Compare to the sol version:

  Compiling REx `^ \s* x '
  size 6 Got 52 bytes for offset annotations.
  first at 2
  rarest char x at 0
  synthetic stclass `ANYOF[\11\12\14\15 x{unicode_all}]'.
     1: MBOL(2)
     2: STAR(4)
     3:   SPACE(0)
     4: EXACT <x>(6)
     6: END(0)

Where a typical match attempt is more complex:

  Guessing start of match, REx `^ \s* x ' against ` The quick brown frog jumped over the lazy dog
    The quick b...'...
  Found floating substr `x' at offset 48000...
  Found /^/m at offset 47...
  Does not contradict STCLASS...
  Guessed: match at offset 47
    Setting an EVAL scope, savestack=17
    49 < dog
  > <  The q>    |  1:  MBOL
    49 < dog
  > <  The q>    |  2:  STAR
                             SPACE can match 2 times out of 2147483647...
    Setting an EVAL scope, savestack=17
                              failed...

Unless I'm misreading it, it looks like the sol version has to rescan the string on every attempt, both for a potential start point and for the location of the 'x'. The nl version doesn't seem to need to redo all that work every round. My (wild) guess is that the ANYOF class implied by the '^' under /m (the synthetic stclass in the dump) makes the engine think it cannot keep as much state between matches.

But this is getting much deeper into the guts of perl than I'm comfortable guessing about. Maybe someone who's more familiar with what goes on under the hood could comment?
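For anyone who wants to reproduce dumps like the ones above and dig further, here is a minimal sketch. The test string is my own stand-in (a few x-free lines followed by an 'x'), not the data from the original benchmark, and the exact trace format will differ between perl versions.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: three lines without an 'x', then an 'x'.
my $str = (" The quick brown frog jumped over the lazy dog\n" x 3) . "x";

my $found;
{
    # Lexically scoped: the compile dump and the match trace for
    # regexes inside this block are written to STDERR.
    use re 'debug';
    $found = $str =~ /^ \s* x/mx ? 1 : 0;
}

print "matched: $found\n";
```

Swapping the pattern for /\n \s* x/x gives the nl trace for a side-by-side comparison.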

As an aside, if you study $str before starting (takes a split second), the benchmark changes significantly, and seems much less insane:

      Rate    lb   sol    nl
lb   162/s    --  -87%  -93%
sol 1257/s  674%    --  -45%
nl  2272/s 1300%   81%    --
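A rough sketch of how such a comparison could be set up, in case someone wants to try it. The string, the line count, and the way each sub counts matches are assumptions on my part, not the original benchmark, and the lb case is left out because its pattern isn't shown here. Note also that study is documented as a no-op from perl 5.16 onward, so on a recent perl the call should make no difference.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 x-free lines, then a single 'x'.
my $str = (" The quick brown frog jumped over the lazy dog\n" x 1000) . "x";

my %tests = (
    # start-of-line version: ^ under /m
    sol => sub { my $n = () = $str =~ /^ \s* x/mgx; $n },
    # literal-newline version
    nl  => sub { my $n = () = $str =~ /\n \s* x/gx; $n },
);

# Comment this out to compare against the unstudied run.
study $str;

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, \%tests);
```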

-- zigdon
