http://www.perlmonks.org?node_id=1044203


in reply to Re: Simple regex question. Grouping with a negative lookahead assertion.
in thread Simple regex question. Grouping with a negative lookahead assertion.

Sorry pal. Most of your posts -- especially those regarding regex -- get an upvote from me, but this one got --. It's a crock.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: Simple regex question. Grouping with a negative lookahead assertion.
by AnomalousMonk (Archbishop) on Jul 14, 2013 at 22:47 UTC
    Sorry pal. Most of your posts -- especially those regarding regex -- get an upvote from me, but this one got --. It's a crock.

    Apparently it was so bad, you tried to -- it three times!

    I was curious about the locus of crockitudinousness and decided to do some benchmarking, since performance is usually at the root of these squabbles. (Update: Benchmarked variations include some of those used by kcott here.) I must admit I was shocked, shocked by the results. There were no big surprises until I looked at the effect of the //p regex modifier. Simply adding this modifier to
        m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg
    in the push @ra, $1 variation ($push_cg below, which otherwise performs roughly comparably to the other variations) slows its performance by orders of magnitude, so much so that I didn't have the patience to run the benchmark to completion.

    Am I doing this right? (Update: I.e., is the effect of the use of //p as in the $push_KM sub below, which I don't even have the patience to benchmark, really so egregious?) Is this all down to the //p modifier? And if so, have the proper authorities been notified? If you've touched on this in other threads, I have not been following these discussions as carefully as I ought. Anyway, here's my benchmark code. As always, I would be interested in any comments you might have.
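
    For anyone who wants to try the //p comparison in isolation, here is a minimal sketch using the core Benchmark module. It is not the benchmark from this post: the test string, the sub names push_cg and push_cg_p, and the iteration count are all made up for illustration, and whether the slowdown shows up at all may well depend on the perl version and the length of the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data (NOT the data used in this thread):
# 200 records, each 'atg' + 200 bases + a 'taa' stop codon.
my $dna = join '', map { 'atg' . ('acgt' x 50) . 'taa' } 1 .. 200;

# Collect every capture with push @ra, $1 and no //p modifier.
sub push_cg {
    my @ra;
    push @ra, $1 while $dna =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg;
    return scalar @ra;
}

# Identical, except that the //p modifier is added to the regex.
sub push_cg_p {
    my @ra;
    push @ra, $1 while $dna =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsgp;
    return scalar @ra;
}

# Run each variation 500 times and print a comparison table.
cmpthese(500, {
    push_cg   => \&push_cg,
    push_cg_p => \&push_cg_p,
});
```

    Both subs should find the same number of matches; only the timing column is of interest. Scaling up the record count or the iteration count makes any difference easier to see.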

      "Benchmarked variations include some of those used by kcott"

      I'm assuming you're referring to cg_ncg with (?: ... ) and cg_atomic with (?> ... ).

      Prior to posting yesterday, and purely out of curiosity, I ran /atg(.+?)(?:taa|tag|tga)/ and /atg(.+?)(?>taa|tag|tga)/ through Regexp::Debugger, looking at the matching process step by step. From memory, ?: took 64 steps (in total) to complete the match while ?> took 66 steps. That probably accounts for the 3% difference between cg_atomic and cg_ncg (66/64 = 1.03125).

      Again from memory, the two extra steps occurred after failing to match taa|tag|tga after either the 'a' or 't' of 'atg'. For the ?: case, the steps were something like: "(?:" start non-capture group; "taa" no match; "|" next alt; ...; "tga" no match. For the ?> case: "(?>" start non-backtracking group; ... as for ?: ...; (then the additional step) ")" end non-backtracking group.

      Obviously, you can check that yourself if you're so inclined. I wasn't inclined to repeat the process. :-)
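
      As a starting point for such a check, here is a small sketch that runs both patterns over a made-up string and confirms they capture the same text. The sample string is hypothetical, not the input I stepped through, so the step counts will not be 64 and 66. The commented-out use re 'debug' line is the core pragma that dumps perl's own match trace to STDERR; its output is not the same thing as Regexp::Debugger's step count.

```perl
use strict;
use warnings;

# Uncomment to have perl print its regex compile/match trace to STDERR.
# use re 'debug';

# Hypothetical sample input with two start/stop codon pairs.
my $dna = 'ccatgaaacccgggtaaccatgtttgggtgacc';

# Non-capturing group versus atomic (non-backtracking) group.
my @ncg    = $dna =~ /atg(.+?)(?:taa|tag|tga)/g;
my @atomic = $dna =~ /atg(.+?)(?>taa|tag|tga)/g;

print "ncg:    @ncg\n";      # ncg:    aaacccggg tttggg
print "atomic: @atomic\n";   # atomic: aaacccggg tttggg

# The step-count ratio quoted above (from memory): 66 vs. 64 steps.
printf "ratio:  %.5f\n", 66 / 64;   # ratio:  1.03125
```

      Since nothing follows the alternation in either pattern, the atomic group has no backtracking to suppress, which is why the captures come out identical.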

      [I haven't analysed your benchmarking further.]

      -- Ken