http://www.perlmonks.org?node_id=669232


in reply to 5.10.0 regex slowdown

I suspect that what is happening is that perl is building multiple trie objects and then executing each in turn, because the size of the jump offset is insufficient to jump further than 128 k forward. It is highly likely that I didn't spend enough time on handling this case and that some further work would improve it. I haven't actually run the code to inspect the output of re=debug, and I suspect it's too long to post. I'll try to get it run sometime soon.

Regarding RE_MAXBUF, what that does is set a threshold over which perl will switch to a slower and less efficient construction algorithm. If it's set to 0 it disables the trie optimisation outright, but it has to be set before the pattern is compiled, in a BEGIN block for instance.
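A minimal sketch of setting the threshold in a BEGIN block, so that it is in place before any pattern is compiled. Two assumptions here are not from the post: perlvar documents the variable under the name ${^RE_TRIE_MAXBUF}, and, as far as I can tell, describes a negative value as the way to prevent the optimisation, so the sketch uses -1 rather than 0. The word list and the build_alternation helper are hypothetical, for illustration only.

```perl
use strict;
use warnings;

# Assumption: the full variable name is ${^RE_TRIE_MAXBUF} (see perlvar).
# Assumption: a negative value prevents the trie optimisation.
# Set it in BEGIN so it takes effect before any pattern is compiled.
BEGIN { ${^RE_TRIE_MAXBUF} = -1; }

# Hypothetical helper: build one anchored alternation from a word list.
sub build_alternation {
    my @words = @_;
    my $alt = join '|', map { quotemeta } @words;
    return qr/^(?:$alt)$/;
}

my $re = build_alternation(qw(foo bar baz));   # hypothetical word list
print "matched\n" if 'bar' =~ $re;
```

Matching results should be identical with or without the optimisation; only the compile and run times are expected to differ.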

---
$world=~s/war/peace/g

Re^2: 5.10.0 regex slowdown
by BrowserUk (Patriarch) on Feb 21, 2008 at 11:20 UTC

      It would be neat to see the re=debug dump of your pattern, piped into grep TRIE to see what's going on. I suspect that the easiest way to do it is with perl -c on code with a use re 'debug'.

      perl -c -Mre=debug SCRIPT 2>&1 | grep TRIE

      Should do it.

      Actually if you could grep for JUMP and JMP too it would be good.
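If the dump is too long to read, counting the interesting lines may be enough. The following is a rough sketch of a counter for lines mentioning TRIE, JUMP or JMP. The count_ops name is hypothetical, and the sub simply does substring matching on whatever filehandle it is given.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines on a filehandle that contain
# each of the given names (plain substring match, no regex involved).
sub count_ops {
    my ($fh, @names) = @_;
    my %count = map { $_ => 0 } @names;
    while (my $line = <$fh>) {
        for my $name (@names) {
            $count{$name}++ if index($line, $name) >= 0;
        }
    }
    return \%count;
}
```

Calling count_ops(\*STDIN, qw(TRIE JUMP JMP)) at the end of a small script and printing the returned hash would let it sit at the end of the pipe in place of grep.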

      ---
      $world=~s/war/peace/g

        With -c -Mre=debug, I get no output other than 668954.p10 syntax OK, but as the regex is built at runtime, I wouldn't expect to?

        Running the code on the non-pathological case with use re 'debug';, I get 2 million lines of log, of which 11,282 contain 'TRIE', and there are no 'JUMP's or 'JMP's.

        The pathological case is running (and has already reached 10 million+ lines), but it looks like it is going to take quite a while. Can you /msg me your email so we can take this offline?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        greetings demerphq,

        I've run both cases with -D512, see Re^3: 5.10.0 regex slowdown. Do you want more of that? ;-)

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}