PerlMonks  

Re: Perl Bug in Regex Code Block?

by japhy (Canon)
on Sep 03, 2001 at 17:22 UTC


in reply to Perl Bug in Regex Code Block?

Your regex is compiled only once, and during that compilation it makes note of the variable you're using. Thus, it creates an "accidental" closure. Here is my proof:
    ### update: fixed
    ### thanks Hof -- I condensed working code poorly :(
    use re 'eval';
    my @r;
    my $p = q/.(?{ ++$x[0] })^/;
    for (0..2) {
        my @x = (0);
        "ab" =~ $p;
        push @r, \@x;
    }
    print "$_->[0]" for @r;
That code prints 600: all six increments (two per match, three matches) land on the first @x, and the other two arrays stay at 0. If, however, you change the regex so that it requires recompilation, the binding to the previous @x is gone, and the new @x is bound.

If you were to use qr// instead, you'd be changing the global array.

You're doing some funny-looking scope-crufting. I'd stay away from it if I were you. This situation is the sort of thing I fear having to write about and explain in my book.

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Replies are listed 'Best First'.
Re2: Perl Bug in Regex Code Block?
by Hofmator (Curate) on Sep 03, 2001 at 19:41 UTC

    Thanks for the explanation, japhy++! Playing around with your code, I think I understand it now, and how I accidentally created a closure. I still have some questions, though.

    • Why is the regex not recompiled? I'm not using the /o modifier. I thought perl recompiles a regex /$p/ that contains a variable interpolation every time. And is there a way to force a recompile?
    • I was not trying to do anything funny with the different scopes. What I want is to execute some code in a regex that manipulates a variable. And I'd like to use a lexical variable so that I don't pollute the global namespace. Is there a way to do that? Taking the my @x declaration out of the loop like this
      my @x;
      for (0..2) {
          @x = (0);
          "ab" =~ $p;
          push @r, \@x;
      }
      fixes it here, but what if the whole thing is in a subroutine? Then I can't call it more than once, can I?
    • I think I'm not wrong in saying that this is slightly underdocumented ... especially since 5.6.0 seems to behave differently as others have posted here.

    -- Hofmator

      Last thing first: it's not documented because code evaluation is experimental. It's a very iffy thing, and it changes quickly and silently.

      Second thing second: use a local array, and copy its contents to a lexical one. I know you don't want to use a global array, but I'm telling you that you should. This is an example from my book:

      "12180" =~ m{
          (?{ local @n = () })
          (?: (\d) (?{ local @n = (@n, $1) }) )+
          \d
          (?{ @d = @n })
      }x;
      We make a local array that things happen to, and then we copy it to our real array at the end of the regex. In your case, you might want to do:
      local @n;
      /(.)(?{ ++$n[0] })^/;
      @d = @n;
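      To see the local-then-copy idea in isolation, here is a minimal sketch that uses a scalar counter instead of an array, modeled on the backtracking example in perlre. The names count_before_tail, $cnt and $res are made up for illustration, and it assumes a perl that supports (?{ ... }) blocks in a literal pattern.

      ```perl
      use strict;
      use warnings;

      # Package variables, because local() does not work on lexicals.
      our ($cnt, $res);

      # Hypothetical helper: reports how many 'a's the starred group still
      # holds once the regex has backtracked far enough for 'aaaa' to match.
      # Each increment is localized, so backtracking undoes it; the final
      # code block copies the surviving count out when the match succeeds.
      sub count_before_tail {
          my ($str) = @_;
          ($cnt, $res) = (0, undef);
          $str =~ m{
              (?{ $cnt = 0 })
              (?: a (?{ local $cnt = $cnt + 1 }) )*
              aaaa
              (?{ $res = $cnt })
          }x;
          return $res;
      }

      print count_before_tail('a' x 8), "\n";
      ```

      With eight a's, the star first grabs all eight, then gives back four so the tail can match. The localized increments are undone along the way, so this prints 4.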
      First thing last: regex compilation is an interesting thing. Here is code that compiles the regex twice:
      $p = '\w+-\d+';
      /$p/;
      /$p/;
      And here's code that only compiles it once:
      $p = '\w+-\d+';
      for $i (1,2) {
          /$p/
      }
      The secret is this (and it pertains to regexes with variables in them, since they aren't compiled until run-time): for each compilation op-code in the syntax tree, Perl keeps a string representation of the regex. The next time that op-code is reached, the new string representation is compared with the previous one. If they are the same, the regex doesn't need recompilation. If they are different, it is recompiled.

      Now, if you've heard "if you have a regex, and it has variables in it, and the variables change, the regex has to be recompiled", that's technically incorrect:

      ($x,$y) = ('a+', 'b');
      for (1,2) {
          /$x$y/;
          ($x,$y) = ('a', '+b');
      }
      The two variables comprising our regex have changed, but the interpolated string is a+b both times, so the regex ends up being the same. Sneaky, eh?
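
      A rough model of that check in plain Perl, not the interpreter's actual implementation: the helper rx_for and the counter $compiles are hypothetical names, and qr// stands in for whatever the compilation op does internally.

      ```perl
      use strict;
      use warnings;

      my ($last_src, $last_rx);
      my $compiles = 0;

      # Hypothetical stand-in for one compilation op: recompile only when
      # the interpolated string differs from the one seen last time.
      sub rx_for {
          my ($src) = @_;
          if (!defined $last_src or $src ne $last_src) {
              $last_rx  = qr/$src/;
              $last_src = $src;
              $compiles++;
          }
          return $last_rx;
      }

      my ($x, $y) = ('a+', 'b');
      for (1, 2) {
          # The variables differ between passes, but "$x$y" is "a+b" both times.
          my $rx = rx_for("$x$y");
          print "aab" =~ $rx ? "match\n" : "no match\n";
          ($x, $y) = ('a', '+b');
      }
      print "compiles: $compiles\n";
      ```

      Both passes match, and the counter ends at 1, because only the resulting string is compared, never the variables that produced it.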

      I can't take credit for figuring this out on my own -- a couple months ago, Dominus gave me the hint about the string representation. Now I understand.

      So that answers your question, I think.

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
