MrNobo1024 has asked for the wisdom of the Perl Monks concerning the following question:
'foo' =~ m/.*/;
print eval <STDIN>;
I ran this program, and entered the text '$&'. It printed out 'foo'. If you don't use $`, $&, or $' in your program at all, they aren't set on a regex match. There was no way for Perl to know I was going to enter $& into STDIN, so why did it set it? Does this mean that Perl is psychic?
Re (tilly) 1: Perl is psychic?!
by tilly (Archbishop) on Mar 06, 2001 at 06:38 UTC
Excellent question!
Since everybody else seems to have missed your (subtle)
point by quoting irrelevant documentation that you clearly
understood in great detail, allow me to repeat your point.
Perl is supposed to have an important optimization. If
you never use $&, $`, and $' in your script, Perl is
not supposed to calculate them ever. This is important
because it makes matches against long strings an order of
magnitude faster. If you use them ever, they are
calculated from then on. Caveat programmer. (I don't use
them, ever. I wish I could make attempting to use them
optionally fatal just to smoke out people who use them,
but I can't.)
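For anyone who wants the three pieces without ever mentioning $&, $`, or $', here is a minimal sketch. It assumes Perl 5.6 or later, since it relies on the @- and @+ offset arrays, and match_parts is a made-up helper name, not anything from the core.

```perl
use strict;
use warnings;

# match_parts is a hypothetical helper for illustration.
# Returns (prematch, match, postmatch) for the first match of $re in $str,
# or an empty list if there is no match. It reads the match offsets from
# @- and @+ (Perl 5.6 and later) instead of using $`, $&, $'.
sub match_parts {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($str, 0, $start),              # what $` would hold
        substr($str, $start, $end - $start),  # what $& would hold
        substr($str, $end),                   # what $' would hold
    );
}

my @parts = match_parts('string', qr/ri/);
print join('|', @parts), "\n";   # st|ri|ng
```

Since the script never contains the three variables, it should never trigger the "seen" penalty described above.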
With this optimization there should be no way that the above
code will work since when you do the match, Perl is dealing
with a script that has no $&, $`, or $' in it. And so
when it goes to display the answer, the necessary data
should not exist yet. But you run it and it does.
For the record I ran it under 5.004, and got the output that
you describe. I ran it under 5.005 and got no output at
all as you would expect. I ran it under a slightly modified
5.6 and got a segmentation fault. (Not good, but in this
case understandable.) A slight modification of your code
to test $' and $` had similar results. With 5.005 when I
look at perldelta I see that there were a number of changes
to the RE engine including the following:
Changes in Perl code using RE engine:
More optimizations to s/longer/short/;
study() was not working;
/blah/ may be optimized to an analogue of index() if $& $` $'
not seen;
Unneeded copying of matched-against string removed;
Only matched part of the string is copying if $` $' were not
seen;
The last 2 items sound like the behaviour fix. I guess
that the optimization wasn't really being done in 5.004,
or it was done but not done as fully as it was done later.
For the record I was seriously impressed with Ruby's
optimization for this case. What they did is lazily
calculate $&, $', and $` as needed. You only pay on
the matches where you use those, or on cases where you try
to modify a string in place that you matched against before
you go to match again. Don't use it one place, pay no
price even if you use it elsewhere. I tried, but
couldn't find a way
to break it. I suspect that this approach (which is much
cleaner) would be harder to do in Perl. Still it was
a nice surprise...
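To make the "pay only where you use it" idea concrete, here is a rough sketch written in Perl. It is only an illustration of the approach, not how Ruby or Perl does it internally, and lazy_match is a made-up name. It assumes Perl 5.6 or later for @- and @+. The match records two offsets, and the substrings are built only when asked for.

```perl
use strict;
use warnings;

# lazy_match is a hypothetical helper for illustration. It takes a
# reference to the string, keeps only the match offsets, and returns
# closures that build each piece on demand.
sub lazy_match {
    my ($str_ref, $re) = @_;
    return unless $$str_ref =~ $re;
    my ($start, $end) = ($-[0], $+[0]);
    return {
        prematch  => sub { substr($$str_ref, 0, $start) },
        match     => sub { substr($$str_ref, $start, $end - $start) },
        postmatch => sub { substr($$str_ref, $end) },
    };
}

my $text = 'string';
my $m = lazy_match(\$text, qr/ri/);
print $m->{match}->(), "\n" if $m;   # ri
```

The catch is the one mentioned above: if $text is modified in place after the match, the saved offsets no longer describe the original string, so a real implementation would have to take a copy at that point.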
UPDATE
This seems to be very, very specific to the code. I
actually assumed I knew what should happen and wanted to
check $` and $' as well, so I changed the code to
'string' =~ /ri/;
print eval <STDIN>;
for my tests. As confirmed on several platforms in chatter,
the behaviour switches between versions of Perl. But the
original code snippet always seems to work, and I have not
a clue how or why.
|
Woo. This one's got me interested.
I've tested this on perl 5.004_04 for sun-solaris, perls 5.004_05 and 5.6 for i686-linux (redhat) and even ActiveState's 5.6.0 for Win32 and _all_ of them show the same behaviour.
What causes the difference between two variations on this bit of code is whether or not the pattern is plain text (as it says above /blah/ may be optimized to an analogue of index()). If there's no regex compilation then $& causes Segmentation faults.
Using use re 'debug'; shows that the regex isn't re-evaluated when the $& is entered on STDIN, but it does state explicitly Omitting $` $& $' support. Must say I'm at a bit of a loss as to where the value does come from.
If I were to go out on a limb, I'd guess that the penalty from using $&, etc. in your
code comes from perl linking them into plain text matches as well as compiled regexes. That is, $&, etc. are always
there for fully compiled regexes, but index() doesn't normally return the pre-match, match and post-match strings, so the "analogue of index()" requires a bit more work to produce them.
Where's japhy? I get the feeling he'll know :o)
There's a bunch of tests and re 'debug' output below if you're interested:
use re 'debug';
'foo' =~ m/.*/;
print eval <STDIN>;
This gives the following output:
Compiling REx `.*'
size 3 first at 2
1: STAR(3)
2: REG_ANY(0)
3: END(0)
anchored(MBOL) implicit minlen 0
Omitting $` $& $' support.
EXECUTING...
Matching REx `.*' against `foo'
Setting an EVAL scope, savestack=3
0 <> <foo> | 1: STAR
REG_ANY can match 3 times out of 32767...
Setting an EVAL scope, savestack=3
3 <foo> <> | 3: END
Match successful!
All of this is printed before the program waits for input. It explicitly says that it's omitting $&, etc. support, yet when you do enter $& it still gives the expected answer:
Freeing REx: `.*'
foo
If you use a plain text match (like tilly suggested with /ri/ in 'string'), you don't get this result at all, as perl
doesn't handle the match in the same way: it "guesses" the result, presumably using a more index()-like way of making the match:
use re 'debug';
'foo' =~ m/o/;
print eval <STDIN>;
gives the output:
$ perl reg
Compiling REx `o'
size 3 first at 1
rarest char o at 0
1: EXACT <o>(3)
3: END(0)
anchored `o' at 0 (checking anchored isall) minlen 1
Omitting $` $& $' support.
EXECUTING...
Guessing start of match, REx `o' against `foo'...
Found anchored substr `o' at offset 1...
Guessed: match at offset 1
$&
Segmentation fault (core dumped)
$` and $' don't have quite such drastic effects, they simply print blank.
The extra level of compilation that look(ahead|behind)s give the regex also
allows $& to produce the required result:
use re 'debug';
'foo' =~ m/(?<=f)o(?=o)/;
print eval <STDIN>;
Giving:
$ perl reg
Compiling REx `(?<=f)o(?=o)'
size 15 first at 1
rarest char o at 0
1: IFMATCH[-1](7)
3: EXACT <f>(5)
5: SUCCEED(0)
6: TAIL(7)
7: EXACT <o>(9)
9: IFMATCH[-0](15)
11: EXACT <o>(13)
13: SUCCEED(0)
14: TAIL(15)
15: END(0)
anchored `o' at 0 (checking anchored) minlen 1
Omitting $` $& $' support.
EXECUTING...
Guessing start of match, REx `(?<=f)o(?=o)' against `foo'...
Found anchored substr `o' at offset 1...
Guessed: match at offset 1
Matching REx `(?<=f)o(?=o)' against `oo'
Setting an EVAL scope, savestack=3
1 <f> <oo> | 1: IFMATCH[-1]
0 <> <foo> | 3: EXACT <f>
1 <f> <oo> | 5: SUCCEED
could match...
1 <f> <oo> | 7: EXACT <o>
2 <fo> <o> | 9: IFMATCH[-0]
2 <fo> <o> | 11: EXACT <o>
3 <foo> <> | 13: SUCCEED
could match...
2 <fo> <o> | 15: END
Match successful!
$&
Freeing REx: `(?<=f)o(?=o)'
o
|
I'm curious to know if perl would attempt to re-execute the last regexp inside the eval block to get $&?
Does that sound at all plausible?
If so, would that mean that evaling on $&, $' or $` would remove their associated penalties?
Re: Perl is psychic?!
by petral (Curate) on Mar 07, 2001 at 02:31 UTC
This is something like another bug someone posted recently (I guess in the cb since I can't find it anywhere). Combining them just for fun:
> perl -lwe '() = ($_ = "abc") =~ /(c)/; $_ = "def"; print eval <>'
$&
f
>
Seems as if there's a pointer squirreled away somewhere deep in perl that was never removed just because it was never accessed. When there's not supposed to be anything there (and isn't in the normal place), somehow this shows through.
update: Could note, of course, that these needn't be considered bugs. Just because a program does something apparently semi-sensible for 'undefined behavior' doesn't mean one _has_ to beat on the poor thing till it stops. If this were any language but Perl one would expect the compiler/interpreter to simply throw up (or at least throw up its hands). In perl, it just gets tossed into the "Doctor it hurts when I do this. -- Then don't do that" bin.
p
(jptxs)Re: Perl is psychic?!
by jptxs (Curate) on Mar 06, 2001 at 05:06 UTC
According to the Perl5 Pocket Ref, $& is the string matched by the last successful pattern match. Since your regex .* matches anything, $& is set by the first line, which matches foo, and that's what you print in your eval: it evals $& and finds 'foo' there. I think... =)
"A man's maturity -- consists in having found again the
seriousness one had as a child, at play." --Nietzsche
|
Yes, but Perl doesn't set $& if you don't use it, and it was impossible to know that it would be used...
|
Whoever voted down the above node completely missed the
point. MrNobo1024 is completely correct in saying that
if Perl worked as documented as far back as, say, Camel 2
then it should not have had enough information to
calculate $&.
Re: Perl is psychic?!
by KM (Priest) on Mar 06, 2001 at 05:13 UTC
Yes, Perl is psychic... but not in this case. If you look at perlvar you will see:
$& The string matched by the last successful pattern
match (not counting any matches hidden within a
BLOCK or eval() enclosed by the current BLOCK).
(Mnemonic: like & in some editors.) This variable
is read-only and dynamically scoped to the current
BLOCK.
So, since .* matched 'foo', $& is set when you use it. If you make your pattern /\d.*/ you will find you will get no output in your same test case.
(root@frodo):/tmp>
# cat t.pl
'foo' =~ m/\d.*/;
print eval <STDIN>;
(root@frodo):/tmp>
# perl t.pl
$&
(root@frodo):/tmp>
#
Cheers,
KM
Re: Perl is psychic?!
by mkmcconn (Chaplain) on Mar 07, 2001 at 01:31 UTC
This introduces several new ideas to me, so I played with
it for more than a quarter hour, at a console command-line. I tried in 5.005_003
and in 5.6, evaluating a second eval(), getting the same
behavior as for the first eval(). I guess it clarifies the behavior, and
hopefully it's contributory to an interesting thread.
> perl -wle '
q(foo) =~ m/.*/;
eval <>;'
print $&; q(snarf) =~ m/.*/ ; eval <>;
#prints 'foo', not 'snarf' and waits for input;
And, I think this is amusing:
> perl -le ' my $incr = 0;
q( print $incr++, $& and " stew" =~ /.*/ and eval $& until $incr > 10) =~ /.*/;
eval <>;'
eval $&; # prints '0 ( guesswhat) '..'10 stew' (versions differ on -w)
mkmcconn edited after first posting, to simplify examples