PerlMonks  

'Dynamic scoping' of capture variables ($1, $2, etc.)

by AnomalousMonk (Monsignor)
on Dec 16, 2012 at 20:03 UTC (#1009091)
AnomalousMonk has asked for the wisdom of the Perl Monks concerning the following question:

This question is prompted by the discussion of $1 not "freezing" in an addition.

perlre 5.14.2 (in the sub-section 'Capture groups') sez (emphases added):

Capture group contents are dynamically scoped and available to you
outside the pattern until the end of the enclosing block or until the
next successful match, whichever comes first. (See "Compound Statements"
in perlsyn.) You can refer to them by absolute number (using "$1" ...

The reference in the quote above to the discussion in "Compound Statements" does not seem to shed any light on the particular question of this post.
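A minimal sketch of the two rules in that quote, using only plain string matches (no recursion, no subroutines): a failed match leaves $1 alone, the next successful match replaces it, and leaving the enclosing block brings back whatever $1 held before the block. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $before = defined $1 ? $1 : 'undef';   # whatever $1 holds on entry

my @seen;
{
    'abc123' =~ /(\d+)/;          # successful match: $1 is '123'
    push @seen, $1;
    'no digits' =~ /(\d+)/;       # failed match: $1 is left unchanged
    push @seen, $1;
    'x9' =~ /(\d)/;               # next successful match: $1 is '9'
    push @seen, $1;
}

# The block has ended, so $1 is back to its value from before the block
my $after = defined $1 ? $1 : 'undef';

print join(',', @seen), "\n";     # 123,123,9
print $after eq $before ? "restored\n" : "not restored\n";
```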

In the code example below, some dynamic scoping is clearly taking place since $1 begins and ends undefined. However, $1, set to '1' by the last successful match at the lowest-but-one level of recursion, is propagated upward unchanged through several levels of subroutine 'blocks' (as I understand them) until it exits the topmost. (local-izing $1 within the subroutine has no effect on this behavior.)

Might this behavior have something to do with the recursive nature of the subroutine? Could the compiler be rewriting the recursive call as a branch to a point within the same block, so that $1 is only restored once because there is only one real block exit?

Can anyone throw any light on this? In particular, any links to documentation?

    >perl -wMstrict -le
    "$_ = 'x55x666x7777x1x';
     ;;
     print 'before: $1 is ', defined($1) ? qq{'$1'} : 'undefined';
     print R();
     print 'after: $1 is ', defined($1) ? qq{'$1'} : 'undefined';
     ;;
     sub R {
       printf qq{ \$_ is '$_'};
       printf qq{ \$1 is %s \n}, defined($1) ? qq{'$1'} : 'undefined';
       return s/(\d+)// ? $1 + R() : 0;
       }
    "
    before: $1 is undefined
     $_ is 'x55x666x7777x1x' $1 is undefined
     $_ is 'xx666x7777x1x' $1 is '55'
     $_ is 'xxx7777x1x' $1 is '666'
     $_ is 'xxxx1x' $1 is '7777'
     $_ is 'xxxxx' $1 is '1'
    4
    after: $1 is undefined

Updates:

  1. Just in case the behavior above was an artifact of $1 being undefined initially, I tried setting $1 to a defined value via a successful match prior to the print $1/print R()/print $1 sequence. The result is no different: the value $1 starts out with at the 'top' level is the one it winds up with.
  2. I should mention I am running all my example code in this and other postings in this thread under Strawberry 5.14.2.1.

Re: 'Dynamic scoping' of capture variables ($1, $2, etc.)
by moritz (Cardinal) on Dec 16, 2012 at 20:42 UTC

    What you are observing is exactly what dynamic scoping is about: Code called from where the variable is defined can see it, even though it's not in the same lexical scope.

    This can be demonstrated without recursion:

        use 5.010;
        use strict;
        use warnings;

        sub sayit {
            say $1 // 'undef';
        }

        do {
            '42' =~ /(\d+)/ and sayit();
        };
        sayit();
        __END__
        42
        undef

    Subroutine sayit reads $1 outside of the lexical scope of the block where $1 is set. But since it's in the dynamic scope, it can still see the value.

    The sayit call outside the block prints undef\n, which demonstrates that $1 isn't merely a global variable.
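For comparison, here is the same shape of experiment with an ordinary package variable and local, which is Perl's general mechanism for dynamic scoping. This is only an analogy to what the docs say about captures, not a claim about how $1 is implemented internally; the names $level and peek are made up for the sketch.

```perl
use strict;
use warnings;

our $level = 'outer';            # package variable, visible everywhere
sub peek { return $level }       # has no $level of its own: reads the package var

my @log;
{
    local $level = 'inner';      # temporary value until this block exits
    push @log, peek();           # the called sub sees 'inner'
}
push @log, peek();               # old value restored on block exit: 'outer'

print "@log\n";                  # inner outer
```

As with sayit above, the called sub sees the value set in the caller's block, and the value disappears once that block is left.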

    As to your actual question:

    However, $1, set to '1' by the last successful match at the lowest-but-one level of recursion, is propagated upward unchanged through several levels of subroutine 'blocks'

    I see no evidence for that. After $1 is set to '1', exactly one more recursive call happens, and there it is printed out. Then the recursion ends, and you don't print $1 anymore.

    Update: after experimenting a bit, I can provide evidence of the phenomenon you mentioned:

        sub R {
            printf qq{before: \$_ is '$_'};
            printf qq{ \$1 is %s \n}, defined($1) ? qq{'$1'} : 'undefined';
            s/(\d+)// ? $1 + R() : 0;
            printf qq{after: \$_ is '$_'};
            printf qq{ \$1 is %s \n}, defined($1) ? qq{'$1'} : 'undefined';
        }
        $_ = 'x55x666x7777x1x';
        R();
        __END__
        before: $_ is 'x55x666x7777x1x' $1 is undefined
        before: $_ is 'xx666x7777x1x' $1 is '55'
        before: $_ is 'xxx7777x1x' $1 is '666'
        before: $_ is 'xxxx1x' $1 is '7777'
        before: $_ is 'xxxxx' $1 is '1'
        after: $_ is 'xxxxx' $1 is '1'
        after: $_ is 'xxxxx' $1 is '1'
        after: $_ is 'xxxxx' $1 is '1'
        after: $_ is 'xxxxx' $1 is '1'
        after: $_ is 'xxxxx' $1 is '1'

    This is because there is only one variable $1. It is dynamically scoped, so once it is set in an inner scope, an outer scope sees the modification too.

      [Emphases added.]
      As to your actual question:
      However, $1, set to '1' by the last successful match at the lowest-but-one level of recursion, is propagated upward unchanged through several levels of subroutine 'blocks'
      I see no evidence for that. After $1 is set to '1', exactly one more recursive call happens, and there it is printed out. Then the recursion ends, and you don't print $1 anymore.

      I don't print $1, but it must be '1' at all higher recursion levels because only that value will result in a sum total of 4.

      However, I have not had (and will not immediately have) a chance to ponder your link and other responses.

      I suppose my chief confusion stems from the fact that $1 starts out undefined, takes on a bunch of other values, then winds up as it started. If it had finished up as '1', my mind would rest a bit easier, but at some point, and only one point mind you, it's leaving some kind of scope and being restored to the value it had upon entry. I could understand all scopes, I could understand none, but I can't (yet) understand just one!
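To confirm the arithmetic argument, here is a sketch of R() rewritten to copy $1 into a lexical right after the match, before recursing. R_copy is a hypothetical name; it still works on $_ like the original. If each level keeps its own number, the total is 55 + 666 + 7777 + 1 = 8499 instead of 4.

```perl
use strict;
use warnings;

# Hypothetical rewrite of R(): copy $1 into a lexical before recursing,
# so each call keeps its own number.
sub R_copy {
    return 0 unless s/(\d+)//;     # operates on $_, like the original R()
    my $n = $1;                    # private copy for this level
    return $n + R_copy();
}

$_ = 'x55x666x7777x1x';
my $total = R_copy();
print "$total\n";                  # 8499
```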

        You are right to be worried; your example code is just a bit too complex to make it evident at first glance.

        There is no logical reason why nested calls of the same function (i.e. recursion) should act differently from nested calls of different functions.

        See the updated code, especially the second paragraph, which contrasts the bug.

        Cheers Rolf

        UPDATE:

        To rule out any side effects from eval within the debugger, here is a standalone file for testing:

            use warnings;
            use strict;
            use 5.10.0;

            my $x;

            sub delchar { $x =~ s/(\w)// ? $1 . delchar() . $1 : "x" }
            $x = 'abc';
            say delchar();    # => "cccxccc"

            sub del1 { $x =~ s/(\w)// ? $1 . del2() . $1 : "x" }
            sub del2 { $x =~ s/(\w)// ? $1 . del3() . $1 : "x" }
            sub del3 { $x =~ s/(\w)// ? $1 . del4() . $1 : "x" }
            # del5() is never reached for 'abc': nothing is left for del4() to match
            sub del4 { $x =~ s/(\w)// ? $1 . del5() . $1 : "x" }
            $x = 'abc';
            say del1();       # => "abcxcba"

        UPDATE:

        Best practice is to copy captures like $1 ASAP! (not only in recursions)

            DB<105> sub delchar { my $m; $m = $1 if $x =~ s/(\w)//; $m ? $m . delchar() . $m : "-" }
            => 0
            DB<106> $x='abc'; delchar()
            => "abc-cba"
            DB<107> sub delchar { local $m; $m = $1 if $x =~ s/(\w)//; $m ? $m . delchar() . $m : "-" }
            => 0
            DB<108> $x='abc'; delchar()
            => "abc-cba"
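Another way to "copy ASAP" is to let list assignment do it: a match in list context returns the captured strings themselves, so they land in lexicals as part of the match and $1 is never consulted after the recursive call. This is a sketch of a variant, not LanX's code: mirror is a hypothetical name, and it takes the string as an argument instead of consuming a shared $x.

```perl
use strict;
use warnings;

# Hypothetical variant of delchar(): the match runs in list context,
# so its return value is the list of captures, copied straight into
# lexicals. List assignment in boolean context is the number of
# captures returned, i.e. 0 when the match fails.
sub mirror {
    my ($str) = @_;
    my ($first, $rest) = $str =~ /\A(\w)(.*)\z/s
        or return '-';             # no leading word character left
    return $first . mirror($rest) . $first;
}

my $out = mirror('abc');
print "$out\n";                    # abc-cba
```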

      moritz:
      Really gotta go now, but just one more experiment: without recursion (actually, along the lines LanX has already posted).

          >perl -wMstrict -le
          "$_ = 'x77x888x1x';
           ;;
           'zonk' =~ m{(\w+)}xms;
           ;;
           print 'before: $1 is ', defined($1) ? qq{'$1'} : 'undefined';
           print X1();
           print 'after: $1 is ', defined($1) ? qq{'$1'} : 'undefined';
           ;;
           sub X1 { s/(\d+)//; print qq{ '$1'}; return $1 + X2(); }
           sub X2 { s/(\d+)//; print qq{ '$1'}; return $1 + X3(); }
           sub X3 { s/(\d+)//; print qq{ '$1'}; return $1; }
          "
          before: $1 is 'zonk'
           '77'
           '888'
           '1'
          966
          after: $1 is 'zonk'

      Without recursion, the value of $1 seems to be scoped to the subroutine 'block'.
      There is no difference in results for Xn() + $1 versus $1 + Xn(), or for $1 being undefined initially.

      To put it plainly: $_ changes on every recursive call of R(), so $1 changes as well, and when each recursive call returns, only the last value assigned to $1 is available. That is why '1' is printed five times. To make this easier to follow, here is how the recursive calls and the values of $_ and $1 unfold (for the version of R() above that prints both before and after the recursive call):

      Initial (first) call of R():
      $_='x55x666x7777x1x'; $1=undefined;
      Inside the code, evaluating s/(\d+)// ? $1 + R() : 0;
      $1=55; # immediately after executing the s/// command
       Second call of R():
       $_='xx666x7777x1x'; $1=55;
       Inside the code, evaluating s/(\d+)// ? $1 + R() : 0;
       $1=666; # immediately after executing the s/// command
        Third call of R():
        $_='xxx7777x1x'; $1=666;
        Inside the code, evaluating s/(\d+)// ? $1 + R() : 0;
        $1=7777; # immediately after executing the s/// command
         Fourth call of R():
         $_='xxxx1x'; $1=7777;
         Inside the code, evaluating s/(\d+)// ? $1 + R() : 0;
         $1=1; # immediately after executing the s/// command
          Fifth call of R():
          $_='xxxxx'; $1=1;
          Inside the code, evaluating s/(\d+)// ? $1 + R() : 0;
          # no further call is made: the pattern does not match (there are no digits left)
          $1=1; # the match failed, so $1 keeps its previous value
          # 'after' prints 'xxxxx' and '1' for the fifth call
         # 'after' prints 'xxxxx' and '1' for the fourth call
        # 'after' prints 'xxxxx' and '1' for the third call
       # 'after' prints 'xxxxx' and '1' for the second call
      # 'after' prints 'xxxxx' and '1' for the first call

      Please excuse the formatting; I was unable to lay out the exact stack of recursive calls any better.

        It's a good guess, because $1 does point into the original string, and can change when the original string changes. But it's not the source of the confusion here.

        You can see that by changing the original code to use m/(\d+)/g instead of s/(\d+)//, thus not modifying the original string at all:

            sub R {
                printf qq{before: \$_ is '$_'};
                printf qq{ \$1 is %s \n}, defined($1) ? qq{'$1'} : 'undefined';
                m/(\d+)/g ? $1 + R() : 0;
                printf qq{after: \$_ is '$_'};
                printf qq{ \$1 is %s \n}, defined($1) ? qq{'$1'} : 'undefined';
            }
            $_ = 'a81b2d34c1';
            R();
            __END__
            before: $_ is 'a81b2d34c1' $1 is undefined
            before: $_ is 'a81b2d34c1' $1 is '81'
            before: $_ is 'a81b2d34c1' $1 is '2'
            before: $_ is 'a81b2d34c1' $1 is '34'
            before: $_ is 'a81b2d34c1' $1 is '1'
            after: $_ is 'a81b2d34c1' $1 is '1'
            after: $_ is 'a81b2d34c1' $1 is '1'
            after: $_ is 'a81b2d34c1' $1 is '1'
            after: $_ is 'a81b2d34c1' $1 is '1'
            after: $_ is 'a81b2d34c1' $1 is '1'

        Let me repeat what I wrote earlier, but hopefully a bit clearer this time: There is only one variable $1. The first call to m/()/ or s/()// creates the dynamic variable $1, and all subsequent calls modify the existing variable $1. Since there is no mechanism for resetting $1 to a previous value, you can see the last value of $1 in all stack frames that have access to it.
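If the goal is only to total the digit runs, a non-recursive sketch sidesteps the question entirely: each capture is read in the same loop iteration that produced it, before any later match can replace it. The string is the one from the m//g example above; the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = 'a81b2d34c1';

# Scalar-context //g resumes at pos($s) on each iteration,
# and $1 is used before the next match runs.
my $total = 0;
while ($s =~ /(\d+)/g) {
    $total += $1;
}
print "$total\n";                 # 118

# List-context //g returns every capture at once.
my @runs = $s =~ /(\d+)/g;
print "@runs\n";                  # 81 2 34 1
```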

Re: 'Dynamic scoping' of capture variables ($1, $2, etc.) (localization of captures in recursions is buggy!)
by LanX (Abbot) on Dec 16, 2012 at 21:03 UTC
    I understand your problem. To my understanding, "dynamically scoped" should mean localized, such that the sum at the end would not be 4=1+1+1+1 but 1+7777+666+55.

    but if you read the docs differently

    Capture group contents are dynamically scoped and available to you
    outside the pattern until the end of the enclosing block or until the
    next successful match, whichever comes first.
    

    you can see that the next match within the recursion sets $1 to another value.

    At least that's ATM my understanding of the designers' intention.

    > a branch to a point within the same block

    no, there is no tail-call optimization in Perl!
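One way to check that each recursive call gets a real stack frame is to count frames with the built-in caller at the bottom of the recursion. This is a sketch with made-up names (depth, rec); it compares two runs so that any frames outside the recursion cancel out. Perl only replaces the current frame when asked to explicitly, via goto &NAME.

```perl
use strict;
use warnings;

# Count the subroutine frames currently on the call stack.
sub depth {
    my $d = 0;
    $d++ while caller($d);         # caller(N) returns nothing once past the top
    return $d;
}

# Recurse $n times, then report the stack depth at the bottom.
sub rec {
    my ($n) = @_;
    return $n ? rec($n - 1) : depth();
}

my $extra = rec(3) - rec(0);
print "$extra\n";                  # 3: one extra frame per recursive call
```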

    I will try to run some experiments tomorrow.

    UPDATE1:

    hmm, w/o recursion and with different strings it works like localized:

        DB<112> sub tst2 { 'a' =~ /(\w)/; $1. tst(). $1 }
        => 0
        DB<113> sub tst { 'b' =~ /(\w)/; $1}
        => 0
        DB<114> 'c' =~ /(\w)/; print $1. tst2(). $1
        => 1
        cabac

    UPDATE2

    you're right, there is a bug in how recursive functions are handled:

        DB<123> sub delchar { $x =~ s/(\w)// ? $1 . delchar() . $1 : "x" }
        => 0
        DB<124> $x='abc'
        => "abc"
        DB<125> delchar()
        => "cccxccc"

    in contrast explicitly different functions:

        DB<138> sub del1 { $x =~ s/(\w)// ? $1 . del2() . $1 : "x" }
        => 0
        DB<139> sub del2 { $x =~ s/(\w)// ? $1 . del3() . $1 : "x" }
        => 0
        DB<140> sub del3 { $x =~ s/(\w)// ? $1 . del4() . $1 : "x" }
        => 0
        DB<141> sub del4 { $x =~ s/(\w)// ? $1 . del5() . $1 : "x" }
        => 0
        DB<142> $x='abc'; del1()
        => "abcxcba"

    Perl 5.10

    UPDATE3

    the following shows that $1 is static per function; maybe captures were implemented like closures.

        DB<111> sub del2 { $x =~ s/(\w)// ? $1 . del1() . $1 : "-" }
        => 0
        DB<112> sub del1 { $x =~ s/(\w)// ? $1 . del2() . $1 : "-" }
        => 0
        DB<113> $x='abcd'; del1()
        => "cdcd-dcdc"

    Cheers Rolf

Node Type: perlquestion [id://1009091]
Approved by moritz
Front-paged by LanX