Sorry pal. Most of your posts -- especially those regarding regex -- get an upvote from me, but this one got --. It's a crock.
Apparently it was so bad, you tried to -- it three times!
I was curious about the locus of crockitudinousness and decided to do some benchmarking, which is usually at the root of these squabbles. (Update: Benchmarked variations include some of those used by kcott here.) I must admit I was shocked, shocked by the results. There were no big surprises until I looked at the effect of the //p regex modifier. Simply adding this modifier to
m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg
in the push @ra, $1 variation ($push_cg below, which otherwise performs roughly comparably to the other variations) slows its performance by orders of magnitude, so much so that I didn't have the patience to run the benchmark to completion.
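For anyone who wants to try the comparison on their own perl, here is a minimal sketch of it. It is not the benchmark referred to in this post: the input string is randomly generated, its length is an arbitrary choice, and the name $push_cg_p is made up for the //p variant. Only the regex and the push @ra, $1 idiom come from the text above. Results may well differ between perl versions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a pseudo-random DNA string. The original
# benchmark's input is not shown, so this is only a stand-in.
srand 42;
my @bases = qw(a c g t);
my $dna   = join '', map { $bases[ rand @bases ] } 1 .. 20_000;

# Variation without //p, using the regex quoted above.
my $push_cg = sub {
    my @ra;
    push @ra, $1 while $dna =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg;
    return scalar @ra;
};

# Same variation with the //p modifier added (hypothetical name).
my $push_cg_p = sub {
    my @ra;
    push @ra, $1 while $dna =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsgp;
    return scalar @ra;
};

# Sanity check: both variations must find the same number of matches.
die "match counts differ\n" unless $push_cg->() == $push_cg_p->();

# Run each variation for at least one CPU second and compare rates.
cmpthese( -1, { push_cg => $push_cg, push_cg_p => $push_cg_p } );
```

Raising the string length is the obvious knob to turn if the two rates look similar at this size.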
Am I doing this right? (Update: I.e., is the effect of the use of //p as in the $push_KM sub below, which I don't even have the patience to benchmark, really so egregious?) Is this all down to the //p modifier? And if so, have the proper authorities been notified? If you've touched on this in other threads, I have not been following these discussions as carefully as I ought. Anyway, here's my benchmark code. As always, I would be interested in any comments you might have.
"Benchmarked variations include some of those used by kcott"
I'm assuming you're referring to cg_ncg with (?: ... ) and cg_atomic with (?> ... ).
Prior to posting yesterday, and purely out of curiosity, I ran /atg(.+?)(?:taa|tag|tga)/ and /atg(.+?)(?>taa|tag|tga)/ through Regexp::Debugger looking at the matching process step-by-step.
From memory, ?: took 64 steps (in total) to complete the match while ?> took 66 steps.
That probably accounts for the roughly 3% difference between cg_atomic and cg_ncg (66/64 = 1.03125).
Again from memory, the two extra steps occurred after failing to match taa|tag|tga after either the 'a' or 't' of 'atg'. For the ?: case, the steps were something like: "(?:" start non-capture group; "taa" no match; "|" next alt; ...; "tga" no match.
For the ?> case: "(?>" start non-backtracking group; ... as for ?: ...; (then the additional step) ")" end non-backtracking group.
Obviously, you can check that yourself if you're so inclined.
I wasn't inclined to repeat the process. :-)
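For a non-interactive way to look at the two patterns side by side, here is a rough sketch that uses the core re pragma instead of Regexp::Debugger. That is a swapped-in technique: the trace it writes to STDERR comes from perl's own regex engine, so its step counts need not agree with the 64 and 66 quoted above. The input string is a made-up example, since the sequence used for the original step count is not given.

```perl
use strict;
use warnings;

# Hypothetical input; not the string used for the step counts above.
my $seq = 'atgcccatgtaa';

my ( $ncg, $atomic );
{
    # Lexically scoped: only regexes compiled inside this block are traced.
    use re qw(Debug EXECUTE);

    print STDERR "--- (?: ... ) non-capturing group ---\n";
    ($ncg) = $seq =~ /atg(.+?)(?:taa|tag|tga)/;

    print STDERR "--- (?> ... ) non-backtracking group ---\n";
    ($atomic) = $seq =~ /atg(.+?)(?>taa|tag|tga)/;
}

# Both patterns capture the same text on this input.
print "ncg: $ncg, atomic: $atomic\n";
```

Comparing the two sections of the trace shows where the engine does extra work for one group type or the other on a given input.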
[I haven't analysed your benchmarking further.]