Woo. This one's got me interested.
I've tested this on perl 5.004_04 for sun-solaris, perls 5.004_05 and 5.6 for i686-linux (redhat) and even ActiveState's 5.6.0 for Win32 and _all_ of them show the same behaviour.
What causes the difference between the two variations on this bit of code is whether or not the pattern is plain text (as it says above, /blah/ may be optimized to an analogue of index()). If there's no regex compilation then $& causes a segmentation fault.
use re 'debug';
shows that the regex isn't re-evaluated when the $& is entered on STDIN, but it does state explicitly Omitting $` $& $' support. Must say I'm at a bit of a loss as to where the value does come from.
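For anyone who wants to poke at this, here's a rough sketch of the shape of test I mean. It is not the exact script: eval_after_match is a made-up name, and a string eval of a fixed expression stands in for a line typed on STDIN. The point is only that the variable name is compiled after the match has already run, so perl never sees it at compile time.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): run a match, then compile
# and evaluate $expr only afterwards, so perl has not seen the variable
# named in $expr when the regex is compiled and executed.
sub eval_after_match {
    my ($string, $regex, $expr) = @_;
    return unless $string =~ $regex;   # no match: nothing to report
    my $value = eval $expr;            # string eval, compiled at run time
    return $value;
}

# '$1' stands in for a line typed on STDIN.
my $got = eval_after_match('foo', qr/(o)/, '$1');
print defined $got ? "got [$got]\n" : "got nothing\n";
```

Passing '$&' instead of '$1' is the interesting case. What you get back then depends on the perl you run it on, which is the whole puzzle here.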
If I were to go out on a limb a bit, I'd say that maybe the penalty from using $&, etc. in your code comes from perl having to link them into plain text matches as well as compiled regexes. I.e. $&, etc. are always there for fully compiled regexes, but index() doesn't normally return the pre-match, match and post-match strings, so the "analogue of index()" requires a bit more work to produce them.
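To make the "bit more work" concrete, here's a minimal sketch of what producing those three strings from an index()-style match would involve. This is only an illustration of the idea, not how perl does it internally; literal_match_parts is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): emulate $`, $& and $' for a
# fixed-string match using only index() and substr().
sub literal_match_parts {
    my ($string, $needle) = @_;
    my $pos = index($string, $needle);   # offset of first occurrence, or -1
    return if $pos < 0;                  # no match: empty list

    my $pre   = substr($string, 0, $pos);                # like $`
    my $match = substr($string, $pos, length $needle);   # like $&
    my $post  = substr($string, $pos + length $needle);  # like $'
    return ($pre, $match, $post);
}

my ($pre, $match, $post) = literal_match_parts('foo', 'o');
print "pre=[$pre] match=[$match] post=[$post]\n";   # pre=[f] match=[o] post=[o]
```

index() alone only hands back the offset; the three substr() calls are the extra copying that a plain offset lookup wouldn't otherwise need.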
Where's japhy? I get the feeling he'll know :o)
There's a bunch of tests and re 'debug' output below if you're interested: <READMORE>
This gives the following output:
    Compiling REx `.*'
    size 3 first at 2
    1: STAR(3)
    2: REG_ANY(0)
    3: END(0)
    anchored(MBOL) implicit minlen 0
    Omitting $` $& $' support.
    EXECUTING...
    Matching REx `.*' against `foo'
    Setting an EVAL scope, savestack=3
    0 <> <foo> | 1: STAR
    REG_ANY can match 3 times out of 32767...
    Setting an EVAL scope, savestack=3
    3 <foo> <> | 3: END
    Match successful!

All of that appears before the script waits for the input. It actually specifies that it's omitting $&, etc. support, yet when you do enter $& it still gives the expected answer:
    Freeing REx: `.*'
    foo

If you use a plain text match (like tilly suggested with /ri/ in 'string'), you don't get this result at all, as perl doesn't handle the match in the same way. It "guesses" the result, presumably using a more index()-like way of making the match:
gives the output:
    $ perl reg
    Compiling REx `o'
    size 3 first at 1
    rarest char o at 0
    1: EXACT <o>(3)
    3: END(0)
    anchored `o' at 0 (checking anchored isall) minlen 1
    Omitting $` $& $' support.
    EXECUTING...
    Guessing start of match, REx `o' against `foo'...
    Found anchored substr `o' at offset 1...
    Guessed: match at offset 1
    $&
    Segmentation fault (core dumped)

$` and $' don't have quite such drastic effects; they simply print blank.
The extra level of compilation that look(ahead|behind)s give the regex also allows $& to produce the required result:
    $ perl reg
    Compiling REx `(?<=f)o(?=o)'
    size 15 first at 1
    rarest char o at 0
    1: IFMATCH[-1](7)
    3: EXACT <f>(5)
    5: SUCCEED(0)
    6: TAIL(7)
    7: EXACT <o>(9)
    9: IFMATCH[-0](15)
    11: EXACT <o>(13)
    13: SUCCEED(0)
    14: TAIL(15)
    15: END(0)
    anchored `o' at 0 (checking anchored) minlen 1
    Omitting $` $& $' support.
    EXECUTING...
    Guessing start of match, REx `(?<=f)o(?=o)' against `foo'...
    Found anchored substr `o' at offset 1...
    Guessed: match at offset 1
    Matching REx `(?<=f)o(?=o)' against `oo'
    Setting an EVAL scope, savestack=3
    1 <f> <oo> | 1: IFMATCH[-1]
    0 <> <foo> | 3: EXACT <f>
    1 <f> <oo> | 5: SUCCEED
    could match...
    1 <f> <oo> | 7: EXACT <o>
    2 <fo> <o> | 9: IFMATCH[-0]
    2 <fo> <o> | 11: EXACT <o>
    3 <foo> <> | 13: SUCCEED
    could match...
    2 <fo> <o> | 15: END
    Match successful!
    $&
    Freeing REx: `(?<=f)o(?=o)'
    o