MrNobo1024 has asked for the wisdom of the Perl Monks concerning the following question:
I ran this program, and entered the text '$&'. It printed out 'foo'. If you don't use $`, $&, or $' in your program at all, they aren't set on a regex match. There was no way for Perl to know I was going to enter $& into STDIN, so why did it set it? Does this mean that Perl is psychic?

'foo' =~ m/.*/; print eval <STDIN>;
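For anyone who wants to repeat the experiment without typing at a prompt, here is a minimal sketch. The variable names $typed and $result are made up for illustration, and the single-quoted string stands in for the line read from STDIN. On the perls discussed in this thread the outcome varies by version; on 5.20 and later, where copy-on-write removed the match-variable penalty, it should print the matched text.

```perl
use strict;
use warnings;

# The match runs before perl has any way to know that $& will be wanted.
'foo' =~ m/.*/;

# Stand-in for the line typed on STDIN. Single quotes keep it a plain
# string, so the parser never sees the variable $& while compiling this file.
my $typed = '$&';

# Only now, at run time, is the variable name compiled and looked up.
my $result = eval $typed;
print defined $result ? "got: $result\n" : "got nothing\n";
```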
Replies are listed 'Best First'.
Re (tilly) 1: Perl is psychic?!
by tilly (Archbishop) on Mar 06, 2001 at 06:38 UTC

Since everybody else seems to have missed your (subtle) point by quoting irrelevant documentation that you clearly understood in great detail, allow me to repeat your point.

Perl is supposed to have an important optimization. If you never use $&, $`, and $' in your script, Perl is not supposed to calculate them, ever. This is important because it makes matches against long strings an order of magnitude faster. If you use them anywhere, they are calculated from then on. Caveat programmer. (I don't use them, ever. I wish I could make attempting to use them optionally fatal just to smoke out people who use them, but I can't.)

With this optimization there should be no way that the above code will work, since when you do the match, Perl is dealing with a script that has no $&, $`, or $' in it. So when it goes to display the answer, the necessary data should not exist yet. But you run it and it does.

For the record, I ran it under 5.004 and got the output that you describe. I ran it under 5.005 and got no output at all, as you would expect. I ran it under a slightly modified 5.6 and got a segmentation fault. (Not good, but in this case understandable.) A slight modification of your code to test $' and $` had similar results.

With 5.005, when I look at perldelta I see that there were a number of changes to the RE engine, including the following:

The last 2 items sound like the behaviour fix. I guess that the optimization wasn't really being done in 5.004, or it was done but not as fully as it was later.

For the record, I was seriously impressed with Ruby's optimization for this case. What they did is calculate $&, $', and $` lazily, as needed. You only pay on the matches where you use those, or in cases where you try to modify in place a string that you matched against before you go to match again. Don't use it in one place, and you pay no price there even if you use it elsewhere. I tried, but couldn't find a way to break it.

I suspect that this approach (which is much cleaner) would be harder to do in Perl. Still it was a nice surprise...

UPDATE for my tests. As confirmed on several platforms in chatter, the behaviour switches between versions of Perl. But the original code snippet always seems to work, and I have not a clue how or why.
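One way to get the same three pieces while never mentioning $&, $` or $' is to rebuild them from the match offsets in @- and @+ (available since 5.6). This is only a sketch of the "pay only where you ask" idea, not a claim about how Ruby or perl implement it internally; the variable names and the sample string are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'hello world';

my ($pre, $match, $post);
if ($text =~ /o w/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match,
    # so the three pieces are cut out only here, where they are wanted.
    $pre   = substr($text, 0, $-[0]);
    $match = substr($text, $-[0], $+[0] - $-[0]);
    $post  = substr($text, $+[0]);
    print "pre='$pre' match='$match' post='$post'\n";
}
```

Because the substr calls sit next to the one match that needs them, no other match in the program is affected.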
by pileswasp (Monk) on Mar 06, 2001 at 19:32 UTC

I've tested this on perl 5.004_04 for sun-solaris, perls 5.004_05 and 5.6 for i686-linux (redhat) and even ActiveState's 5.6.0 for Win32, and _all_ of them show the same behaviour.

What causes the difference between the two variations on this bit of code is whether or not the pattern is plain text (as it says above, /blah/ may be optimized to an analogue of index()). If there's no regex compilation then $& causes segmentation faults.

Using use re 'debug'; shows that the regex isn't re-evaluated when the $& is entered on STDIN, but it does state explicitly Omitting $` $& $' support. Must say I'm at a bit of a loss as to where the value does come from.

If I were to go out on a limb a bit, I would say that maybe the penalty from using $&, etc. in your code is because perl links it into plain text matches as well as compiled regexes. I.e. $&, etc. are always there for fully compiled regexes, but index() doesn't normally return the pre-match, match and post-match strings, so the "analogue of index()" requires a bit more work to produce them.

Where's japhy? I get the feeling he'll know :o)

There's a bunch of tests and re 'debug' output below if you're interested:

This gives the following output:

    Compiling REx `.*'
    size 3 first at 2
       1: STAR(3)
       2:   REG_ANY(0)
       3: END(0)
    anchored(MBOL) implicit minlen 0
    Omitting $` $& $' support.

    EXECUTING...

    Matching REx `.*' against `foo'
      Setting an EVAL scope, savestack=3
       0 <> <foo>    |  1:  STAR
                        REG_ANY can match 3 times out of 32767...
      Setting an EVAL scope, savestack=3
       3 <foo> <>    |  3:    END
    Match successful!

That is all printed before waiting for the input. It actually specifies that it's omitting $&, etc. support, yet when you do enter $& it still gives the expected answer:

    Freeing REx: `.*'
    foo

If you use a plain text match (like tilly suggested with /ri/ in 'string'), you don't get this result at all, as perl doesn't handle the match in the same way. It "guesses" the result, presumably using a more index()-like way of making the match:

gives the output:

    $ perl reg
    Compiling REx `o'
    size 3 first at 1
    rarest char o at 0
       1: EXACT <o>(3)
       3: END(0)
    anchored `o' at 0 (checking anchored isall) minlen 1
    Omitting $` $& $' support.

    EXECUTING...

    Guessing start of match, REx `o' against `foo'...
    Found anchored substr `o' at offset 1...
    Guessed: match at offset 1
    $&
    Segmentation fault (core dumped)

$` and $' don't have quite such drastic effects; they simply print blank.

The extra level of compilation that look(ahead|behind)s give the regex also allows $& to produce the required result:

Giving:

    $ perl reg
    Compiling REx `(?<=f)o(?=o)'
    size 15 first at 1
    rarest char o at 0
       1: IFMATCH[-1](7)
       3:   EXACT <f>(5)
       5:   SUCCEED(0)
       6:   TAIL(7)
       7: EXACT <o>(9)
       9: IFMATCH[-0](15)
      11:   EXACT <o>(13)
      13:   SUCCEED(0)
      14:   TAIL(15)
      15: END(0)
    anchored `o' at 0 (checking anchored) minlen 1
    Omitting $` $& $' support.

    EXECUTING...

    Guessing start of match, REx `(?<=f)o(?=o)' against `foo'...
    Found anchored substr `o' at offset 1...
    Guessed: match at offset 1
    Matching REx `(?<=f)o(?=o)' against `oo'
      Setting an EVAL scope, savestack=3
       1 <f> <oo>    |  1:  IFMATCH[-1]
       0 <> <foo>    |  3:    EXACT <f>
       1 <f> <oo>    |  5:    SUCCEED
                          could match...
       1 <f> <oo>    |  7:  EXACT <o>
       2 <fo> <o>    |  9:  IFMATCH[-0]
       2 <fo> <o>    | 11:    EXACT <o>
       3 <foo> <>    | 13:    SUCCEED
                          could match...
       2 <fo> <o>    | 15:  END
    Match successful!
    $&
    Freeing REx: `(?<=f)o(?=o)'
    o
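To make the "analogue of index()" guess concrete, here is a rough sketch of the extra work that guess implies. It is plain Perl, not the regex engine's actual code, and the variable names are invented for illustration. index() hands back only an offset, so the pre-match, match and post-match strings each need their own substr.

```perl
use strict;
use warnings;

my ($text, $needle) = ('foo', 'o');

# index() only reports where the substring starts (-1 if it is absent).
my $pos = index($text, $needle);

my ($pre, $match, $post);
if ($pos >= 0) {
    # Everything else has to be derived from that one offset.
    $pre   = substr($text, 0, $pos);
    $match = substr($text, $pos, length $needle);
    $post  = substr($text, $pos + length $needle);
    print "offset=$pos pre='$pre' match='$match' post='$post'\n";
}
```

The offset it finds is the same "match at offset 1" that the debug trace for /o/ against 'foo' reports.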
by boo_radley (Parson) on Mar 06, 2001 at 12:27 UTC
Does that sound at all plausible? If so, would that mean that evaling on $&, $' or $` would remove their associated penalties?
Re: Perl is psychic?!
by petral (Curate) on Mar 07, 2001 at 02:31 UTC
Seems as if there's a pointer squirreled away somewhere deep in perl that was never removed just because it was never accessed. When there's not supposed to be anything there (and isn't in the normal place), somehow this shows through.

Update: Could note, of course, that these needn't be considered bugs. Just because a program does something apparently semi-sensible for 'undefined behavior' doesn't mean one _has_ to beat on the poor thing till it stops. If this were any language but Perl one would expect the compiler/interpreter to simply throw up (or at least throw up its hands). In perl, it just gets tossed into the "Doctor, it hurts when I do this. -- Then don't do that" bin.

p
(jptxs)Re: Perl is psychic?!
by jptxs (Curate) on Mar 06, 2001 at 05:06 UTC
by MrNobo1024 (Hermit) on Mar 06, 2001 at 05:07 UTC
by tilly (Archbishop) on Mar 06, 2001 at 06:50 UTC
Re: Perl is psychic?!
by KM (Priest) on Mar 06, 2001 at 05:13 UTC

$&
The string matched by the last successful pattern match (not counting any matches hidden within a BLOCK or eval() enclosed by the current BLOCK). (Mnemonic: like & in some editors.) This variable is read-only and dynamically scoped to the current BLOCK.

So, since .* matched 'foo', $& is set when you use it. If you make your pattern /\d.*/ you will find you will get no output in your same test case.

(root@frodo):/tmp> # cat t.pl
'foo' =~ m/\d.*/; print eval <STDIN>;
(root@frodo):/tmp> # perl t.pl
$&
(root@frodo):/tmp> #

Cheers,
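The "last successful match" and "dynamically scoped to the current BLOCK" parts of that perlvar entry can be seen in a small sketch like the following. The strings and the @seen array are invented for illustration, and the script mentions $& directly, so on the older perls in this thread it would pay the match-variable penalty.

```perl
use strict;
use warnings;

my @seen;

if ('outer text' =~ /outer/) {
    push @seen, $&;            # match made for this block
    {
        'inner text' =~ /inner/;
        push @seen, $&;        # match made inside the nested block
    }
    push @seen, $&;            # nested match has gone out of scope

    'outer text' =~ /\d/;      # fails, so $& is left alone
    push @seen, $&;
}

print join(', ', @seen), "\n";
```

A failed match never resets $&, which is why a script whose only match fails, like t.pl above, has nothing to print.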
Re: Perl is psychic?!
by mkmcconn (Chaplain) on Mar 07, 2001 at 01:31 UTC
And, I think this is amusing:
mkmcconn edited after first posting, to simplify examples