Hmm. Comparing sol to nl (since it seems they should behave very similarly) shows this in the re=debug dump:
nl version:
    Compiling REx `\n \s* x '
    size 7 Got 60 bytes for offset annotations.
    first at 1
    rarest char x at 0
    rarest char at 0
       1: EXACT <\n>(3)
       3: STAR(5)
       4:   SPACE(0)
       5: EXACT <x>(7)
       7: END(0)
And a typical match attempt looks like this:
    Setting an EVAL scope, savestack=17
      48 <y dog> < The >    |  1:  EXACT <\n>
      49 < dog > < The q>   |  3:  STAR
                               SPACE can match 2 times out of 2147483647...
    Setting an EVAL scope, savestack=17
                                failed...
Compare to the sol version:
    Compiling REx `^ \s* x '
    size 6 Got 52 bytes for offset annotations.
    first at 2
    rarest char x at 0
    synthetic stclass `ANYOF[\11\12\14\15 x{unicode_all}]'.
       1: MBOL(2)
       2: STAR(4)
       3:   SPACE(0)
       4: EXACT <x>(6)
       6: END(0)
Where a typical match attempt is more complex:
    Guessing start of match, REx `^ \s* x ' against ` The quick brown frog jumped over the lazy dog The quick b...'...
    Found floating substr `x' at offset 48000...
    Found /^/m at offset 47...
    Does not contradict STCLASS...
    Guessed: match at offset 47
    Setting an EVAL scope, savestack=17
      49 < dog > < The q>   |  1:  MBOL
      49 < dog > < The q>   |  2:  STAR
                               SPACE can match 2 times out of 2147483647...
    Setting an EVAL scope, savestack=17
                                failed...

Unless I'm misreading it, it looks like the sol version has to rescan the string on every attempt, both for a potential start point and for the location of the 'x'. The nl version doesn't seem to need to redo all that work every round. My (wild) guess is that the ANYOF class implied by the '^/m' makes the engine think it cannot keep as much state between matches.

But this is getting much deeper into the guts of perl than I feel comfortable guessing about. Maybe someone who's more familiar with what goes on under the hood could comment further?
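For anyone who wants to reproduce dumps like the two above, here is a minimal sketch using the re pragma. The test string is my assumption (a few "frog" lines with a lone x at the end, so there is no stray 'x' earlier in the string); the string actually used in the thread is not shown in this node. The sub names count_nl and count_sol are made up for illustration.

```perl
use strict;
use warnings;

# Assumed test string: three lines without an 'x', then a line holding one.
my $str = (" The quick brown frog jumped over the lazy dog\n" x 3) . " x\n";

# "nl" variant: anchor on a literal newline.
sub count_nl {
    my ($s) = @_;
    use re 'debug';    # lexically scoped; compile and match traces go to STDERR
    my $n = () = $s =~ /\n \s* x/xg;
    return $n;
}

# "sol" variant: anchor on start-of-line with /m.
sub count_sol {
    my ($s) = @_;
    use re 'debug';
    my $n = () = $s =~ /^ \s* x/xmg;
    return $n;
}

print "nl matches:  ", count_nl($str),  "\n";
print "sol matches: ", count_sol($str), "\n";
```

Redirect STDERR to a file if you try this on a long string, since every match attempt is traced. The exact wording of the trace differs between perl versions, so don't expect it to line up character for character with the dumps above.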

As an aside, if you study $str before starting (takes a split second), the benchmark changes significantly, and seems much less insane:

          Rate    lb   sol    nl
    lb   162/s    --  -87%  -93%
    sol 1257/s  674%    --  -45%
    nl  2272/s 1300%   81%    --
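A rough sketch of how a comparison like this could be set up with Benchmark, calling study first. The test string and the sub names are my assumptions, and I've left out the lb case because its pattern isn't shown in this node. Note that on perl 5.16 and later study is a no-op, so the effect described here would only show up on older perls.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test string: 1000 lines without an 'x', then a line holding one.
my $str = (" The quick brown frog jumped over the lazy dog\n" x 1000) . " x\n";

# Count matches of each variant; the list assignment forces list context.
sub count_sol { my $n = () = $_[0] =~ /^ \s* x/xmg; return $n }
sub count_nl  { my $n = () = $_[0] =~ /\n \s* x/xg; return $n }

study $str;    # does nothing on perl 5.16+, kept to mirror the post

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    sol => sub { count_sol($str) },
    nl  => sub { count_nl($str) },
});
```

Comment out the study line and rerun to see whether it makes any difference on your perl; the rates will of course vary by machine and version.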

-- zigdon

In reply to Re^4: Inexplicably slow regex (optimizer) by zigdon
in thread Inexplicably slow regex by Anonymous Monk
