PerlMonks  

Re^2: Simple regex question. Grouping with a negative lookahead assertion.

by BrowserUk (Pope) on Jul 14, 2013 at 06:42 UTC (#1044203)


in reply to Re: Simple regex question. Grouping with a negative lookahead assertion.
in thread Simple regex question. Grouping with a negative lookahead assertion.

Sorry pal. Most of your posts -- especially those regarding regex -- get an upvote from me, but this one got --. It's a crock.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use every day'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Re^3: Simple regex question. Grouping with a negative lookahead assertion.
by AnomalousMonk (Monsignor) on Jul 14, 2013 at 22:47 UTC
    Sorry pal. Most of your posts -- especially those regarding regex -- get an upvote from me, but this one got --. It's a crock.

    Apparently it was so bad, you tried to -- it three times!

    I was curious about the locus of crockitudinousness and decided to do some benchmarking, usually at the root of these squabbles. (Update: Benchmarked variations include some of those used by kcott here.) I must admit I was shocked, shocked by the results. There were no big surprises until I looked at the effect of the //p regex modifier. Simply adding this modifier to
        m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg
    in the push @ra, $1 variation ($push_cg below, which otherwise performs roughly comparably to the other variations) slows its performance by orders of magnitude, so much so that I didn't have the patience to run the benchmark to completion.
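    For readers unfamiliar with it: per perlre, //p preserves the matched string so that ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} can be used after the match, and from Perl 5.20 on the modifier is ignored because copy-on-write makes those variables available anyway, so any //p penalty may well depend on the perl version. Below is a minimal sketch of a push @ra, $1 loop with //p added; the loop form, the sample sequence and the @whole array are illustrative assumptions, not AnomalousMonk's benchmark code.

```perl
use strict;
use warnings;

# Hypothetical sample sequence; not the data used in the benchmark above.
my $seq = 'ccatgaaacccgggtaaggatgtttcccgaattgacc';

my ( @ra, @whole );
while ( $seq =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsgp ) {
    push @ra,    $1;          # bases between 'atg' and the next stop codon
    push @whole, ${^MATCH};   # whole match text; guaranteed with //p on perl <= 5.18
}
print "@ra\n";      # prints: aaacccggg tttcccgaat
print "@whole\n";   # prints: atgaaacccggg atgtttcccgaat
```

    Because the stop codon sits inside a lookahead, it is not part of ${^MATCH}, and the next //g search starts right at the stop codon.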

    Am I doing this right? (Update: I.e., is the effect of the use of //p as in the $push_KM sub below, which I don't even have the patience to benchmark, really so egregious?) Is this all down to the //p modifier? And if so, have the proper authorities been notified? If you've touched on this in other threads, I have not been following these discussions as carefully as I ought. Anyway, here's my benchmark code. As always, I would be interested in any comments you might have.
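    The benchmark code referred to here is not reproduced in this copy of the thread. As a hedged sketch only, a head-to-head comparison of the loop with and without //p using the core Benchmark module's cmpthese might look like the following; the sub names plain and with_p and the repeated test sequence are hypothetical, and this is not the $push_cg or $push_KM code.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a short sample sequence repeated 100 times.
my $seq = 'ccatgaaacccgggtaaggatgtttcccgaattgacc' x 100;

my %variants = (
    # push @ra, $1 in a loop, without //p
    plain => sub {
        my @ra;
        push @ra, $1 while $seq =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg;
        return \@ra;
    },
    # the same loop, differing only by the //p modifier
    with_p => sub {
        my @ra;
        push @ra, $1 while $seq =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsgp;
        return \@ra;
    },
);

# Sanity check before timing: both variants must return the same captures.
my $plain  = $variants{plain}->();
my $with_p = $variants{with_p}->();
die "variants disagree\n" unless "@$plain" eq "@$with_p";

# Run each variant for at least 1 CPU second, then print a comparison table.
cmpthese( -1, \%variants );
```

    A negative count tells cmpthese to run each sub for at least that many CPU seconds rather than a fixed number of iterations. The final failed match in each loop resets pos($seq), so every call starts from the beginning of the string.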

      "Benchmarked variations include some of those used by kcott"

      I'm assuming you're referring to cg_ncg with (?: ... ) and cg_atomic with (?> ... ).

      Prior to posting yesterday, and purely out of curiosity, I ran /atg(.+?)(?:taa|tag|tga)/ and /atg(.+?)(?>taa|tag|tga)/ through Regexp::Debugger, looking at the matching process step by step. From memory, ?: took 64 steps (in total) to complete the match while ?> took 66 steps. That probably accounts for the roughly 3% difference between cg_atomic and cg_ncg (66/64 = 1.03125).

      Again from memory, the two extra steps occurred after failing to match taa|tag|tga after either the 'a' or 't' of 'atg'. For the ?: case, the steps were something like: "(?:" start non-capture group; "taa" no match; "|" next alt; ...; "tga" no match. For the ?> case: "(?>" start non-backtracking group; ... as for ?: ...; (then the additional step) ")" end non-backtracking group.
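      A quick way to confirm that the two groupings find the same matches is a sketch like the one below, using a hypothetical sample sequence. The difference is only in the work done: the alternatives are distinct fixed three-letter strings, so once one matches there is nothing left to backtrack into. To watch the individual steps without Regexp::Debugger, perl's own use re 'debug'; pragma prints a trace of the matching process to STDERR.

```perl
use strict;
use warnings;

# Hypothetical sample sequence; not taken from the thread.
my $seq = 'ccatgaaacccgggtaaggatgtttcccgaattgacc';

# Same pattern, once with a non-capturing group and once with an atomic
# (non-backtracking) group around the stop-codon alternation.
# In list context, //g returns every $1 capture.
my @ncg    = $seq =~ /atg(.+?)(?:taa|tag|tga)/g;
my @atomic = $seq =~ /atg(.+?)(?>taa|tag|tga)/g;

print "@ncg\n";      # prints: aaacccggg tttcccgaat
print "@atomic\n";   # prints: aaacccggg tttcccgaat
```

      Unlike the lookahead version, both of these consume the stop codon, which does not change the captures for this sequence.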

      Obviously, you can check that yourself if you're so inclined. I wasn't inclined to repeat the process. :-)

      [I haven't analysed your benchmarking further.]

      -- Ken
