Hofmator has asked for the wisdom of the Perl Monks concerning the following question:
Playing around with regexes (abusing them :) on the weekend I came across the
following (on Perl 5.6.1, ActiveState Build 626). Executing the same regex on
the same string in a loop multiple times yields different results
for the first run and the remaining runs. I'm running on Win2K, but I don't think this plays a role here. The code:
#!/usr/bin/perl
use strict;
use warnings;
use re qw/eval /;
my $pattern = q/(.)(?{
print ++$counts[0];
})^/;
my $line = 'ab';
for (0..2) {
    my @counts = (0);
    print "$_: ";
    # $pattern .= '(?=.)';
    $line =~ /$pattern/;
    print "; \@counts = (", join(', ', @counts), ")\n";
}
print "\@main::counts = (", join(', ', @::counts), ")\n";
This prints - apart from the warning about the last line:
0: 12; @counts = (2)
1: 34; @counts = (0)
2: 56; @counts = (0)
@main::counts = ()
which means, it works the first time as expected but the next times my @counts
doesn't get modified by the regex. However, inside the regex the variable
seems to retain its value from execution to execution.
When using a package variable by changing my @counts to
our @counts the program works as expected and prints:
0: 12; @counts = (2)
1: 12; @counts = (2)
2: 12; @counts = (2)
@main::counts = (2)
When uncommenting the $pattern .= line (and going back to
my), which effectively changes the pattern on every iteration (remark:
this does not affect how the regex matches!), the code also works as
expected, printing:
0: 12; @counts = (2)
1: 12; @counts = (2)
2: 12; @counts = (2)
@main::counts = ()
My question - is this a known bug? Is it a bug at all or might I have
overlooked a (well) documented feature ;-) and how does this behave in other versions of perl?
-- Hofmator
Re: Perl Bug in Regex Code Block?
by japhy (Canon) on Sep 03, 2001 at 17:22 UTC
Your regex is only being compiled once, and in this compilation, it makes note of the variable you're using. Thus, it creates an "accidental" closure. Here is my proof:
### update: fixed
### thanks Hof -- I condensed working code poorly :(
use re 'eval';
my @r;
my $p = q/.(?{ ++$x[0] })^/;
for (0..2) {
    my @x = (0);
    "ab" =~ $p;
    push @r, \@x;
}
print "$_->[0]" for @r;
That code prints 600. If, however, you cause the regex to change, such that it requires recompilation, the binding to the previous @x is gone, and the new @x is bound.
If you were to use qr// instead, you'd be changing the global array.
You're doing some funny-looking scope-crufting. I'd stay away from it if I were you. This situation is the sort of thing I fear having to write about and explain in my book.
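For comparison, an ordinary closure that is created once and then reused shows the same 6-0-0 pattern. This is only an analogy, assuming the code block behaves like a sub compiled on the first pass; it is not a claim about how the regex engine stores the binding. The name $inc is made up for the illustration.

```perl
use strict;
use warnings;

my @r;
my $inc;    # hypothetical stand-in for the compiled (?{ ... }) block
for (0..2) {
    my @x = (0);
    # Created on the first pass only, so it captures the first @x.
    $inc ||= sub { ++$x[0] };
    # The regex above fires its code block twice per match.
    $inc->() for 1, 2;
    push @r, \@x;
}
my $result = join '', map { $_->[0] } @r;
print "$result\n";    # prints 600
```

Only the first @x is ever incremented, because the sub keeps pointing at the array it saw when it was created.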
_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
"12180" =~ m{
(?{ local @n = () })
(?: (\d) (?{ local @n = (@n, $1) }) )+
\d
(?{ @d = @n })
}x;
We make a local array that things happen to, and then we copy it to our real array at the end of the regex. In your case, you might want to do:
local @n;
/(.)(?{ ++$n[0] })^/;
@d = @n;
First thing last: regex compilation is an interesting thing. Here is code that compiles the regex twice:
$p = '\w+-\d+';
/$p/;
/$p/;
And here's code that only compiles it once:
$p = '\w+-\d+';
for $i (1,2) { /$p/ }
The secret is this (and it pertains to regexes with variables in them, for they're not compiled until run-time): for each compilation op-code in the syntax tree, Perl keeps a string representation of the regex. The next time that compilation op-code is reached, the NEW string representation is compared with the previous one. If they are the same, the regex doesn't need recompilation. If they are different, it does need to be recompiled.
Now, if you've heard "if you have a regex, and it has variables in it, and the variables change, the regex has to be recompiled", that's technically incorrect:
($x,$y) = ('a+', 'b');
for (1,2) {
    /$x$y/;
    ($x,$y) = ('a', '+b');
}
The two variables comprising our regex have changed, but the regex ends up being the same. Sneaky, eh?
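The string-comparison rule can be mimicked in plain Perl. This is a rough sketch of the idea only, not the interpreter's actual implementation; match_cached, $last_src, $last_re and $compiles are names invented for the illustration.

```perl
use strict;
use warnings;

my ($last_src, $last_re);
my $compiles = 0;

# Recompile only when the interpolated string differs from last time.
sub match_cached {
    my ($str, $src) = @_;
    if (!defined $last_re or $src ne $last_src) {
        $last_re  = qr/$src/;
        $last_src = $src;
        $compiles++;
    }
    return $str =~ $last_re;
}

my ($x, $y) = ('a+', 'b');
for (1, 2) {
    match_cached('aab', "$x$y");
    ($x, $y) = ('a', '+b');
}
print "compiled $compiles time(s)\n";    # prints: compiled 1 time(s)
```

Both passes interpolate to the string a+b, so the second pass reuses the compiled pattern even though $x and $y changed.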
I can't take credit for figuring this out on my own -- a couple months ago, Dominus gave me the hint about the string representation. Now I understand.
So that answers your question, I think.
_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: Perl Bug in Regex Code Block?
by stefan k (Curate) on Sep 03, 2001 at 16:53 UTC
Hi,
not that I fully understand what you're doing there, but it seems to me that you're fiddling with scopes in a very unintuitive way (at least to me). What is the scope of counts whenever you're referring to it? Within the regexp it should be the global one, shouldn't it? It is first used outside the for-loop; but OTOH it isn't declared using my, so how could it pass use strict??
Well then, running the same code under 5.6.0/Linux results in:
Name "main::counts" used only once: possible typo at ./re-code.pl line 20.
0: 12; @counts = (0)
1: 34; @counts = (0)
2: 56; @counts = (0)
@main::counts = (6)
Then uncommenting the pattern line (your third example) yields exactly the same results. Changing my to our I get the same result as you get.
You're simply throwing away the warning we get in line 20. Is this a clever thing to do?
blblblblblblblblblblb
You know what? I'm even more confused than before I started studying the code. At least I could present results from another Perl version, as you wished.
Regards...
Stefan
you begin bashing the string with a +42 regexp of confusion
{
    my $num = 0;
    $main::num = 5; # this instead of the regex
    print $num; # prints 0
}
print $num; # prints 5
# or under use strict
print $main::num; # prints 5 as well
Makes perfect sense. However, with 5.6.1 you seem to be able to use lexical variables from the enclosing scope, and this is where the bug comes in: it works the first time but not on the following iterations.
btw, the warning can be ignored in this case
-- Hofmator
Re: Perl Bug in Regex Code Block?
by demerphq (Chancellor) on Sep 03, 2001 at 17:38 UTC
Ok, well I am running 5.6.0 AS 623 and I get different output for your code:
0: 12; @counts = (0)
1: 34; @counts = (0)
2: 56; @counts = (0)
@main::counts = (6)
Which says to me that perl is using the dynamic variable inside the regex eval. (Incidentally, the docs do say that this is an experimental feature and may not work appropriately. They also mention localization, so I suspect this is maybe intended.) Also, if you change the my to a local it produces the desired results. I just ran it on AS 628 and it produces the results you said it did, although it worked as expected under our and local. My money says this is a bug. But this all gets weirder.
When I change the code slightly (I only did this under 623) to
#!/usr/bin/perl
use strict;
use warnings;
use re qw/eval /;
my $line = 'ab';
my $pattern = q/(.)(?{print ++$counts[0]})^/;
for (0..2) {
    my @counts = (0);
    print "$_: ";
    $line =~ /$pattern/;
    print "; my \@counts = (@counts)\n";
}
{
    no strict; no warnings;
    print "our \@counts = (".join(",",@counts).")\n";
    #print "our \@counts = (@counts)\n";
}
I get the same result again. Now uncomment the last print line and run it again. I get:
In string, @counts now must be written as \@counts at .\counts.pl line 22, near "our \@counts = (@counts"
Execution of .\counts.pl aborted due to compilation errors.
Which to me doesn't make any sense at all. It should die in both cases, because the dynamic @counts is not declared, or in neither, but not like this.
And I have another point of weirdness to note: in the regex you are using, you have placed a '^' caret at the END of the regex, which for some reason makes your print statement fire twice. If I remove the ^ it prints once. Either way I don't see what is going on here at all....
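On the double firing: a plausible reading, assuming ordinary backtracking, is that a '^' after (.) can never succeed on a single-line string, so the engine retries at every start position and the code block runs once per attempt. The small check below is a sketch with made-up names ($fired, @tried, $matched); on 'ab' it should report one firing after 'a' and one after 'b', with no overall match.

```perl
use strict;
use warnings;

my $fired = 0;
my @tried;
# Literal pattern, so "use re 'eval'" is not needed here.
my $matched = 'ab' =~ /(.)(?{ $fired++; push @tried, $1 })^/;
print "matched: ", ($matched ? "yes" : "no"), "\n";
print "code block fired $fired time(s), after consuming: @tried\n";
```

Without the trailing ^ the first attempt succeeds at once, which would explain why the block then fires only once.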
Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)
Re: Perl Bug in Regex Code Block?
by MZSanford (Curate) on Sep 03, 2001 at 16:38 UTC
I may be confused, but I think the problem is with the:
my @counts = (0);
... specifically, the my. When you use my, you are creating a variable which will disappear when it goes out of scope. Since the for (0..2) {} loop is the current scope, when it completes, @counts is destroyed. This is fixed by not using a my inside of the for loop (as you have seen), and is a very perlish thing.
can't sleep clowns will eat me
-- MZSanford
When you use my, you are creating a variable which will disappear when it goes out of scope.
I'm well aware of that ... but the regex is taking place inside this scope and so the lexical variables should be accessible inside the regex. This works the first time as expected but it doesn't work on the second and third iteration of the loop.
Maybe you have misunderstood my question. I'm not confused that the last line of my code doesn't print anything; it was only included for the (working) run with our instead of my. I want to know why it's changing its behaviour inside the loop.
I hope this clarifies my problem ...
-- Hofmator