http://www.perlmonks.org?node_id=428220

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

The following is a gross simplification of a piece of code that almost works--but not quite.

The routine (recurse()) receives a reference to a scalar, inspects it for a condition and, if that condition is met, recurses, passing a reference to a substring of its own parameter.

When the recursion unwinds, it adds its modifications to the referent of its parameter (one or more substrings of which may have been modified earlier in the recursion).

The trace shown below shows that at each level of recursion the desired modifications are made. However, despite everything being done through aliases and references, the changes are being 'undone' as the recursion unwinds, but not completely. The last change made persists!?

Of particular interest is the value (that should be) returned from the second-to-last level of recursion, which appears to be truncated somehow?

Update: ikegami points out that I did not make clear what output I was expecting.

That would be the accumulated changes to the underlying referent, i.e., the last line of output should be:

( a ( a ( a ( a ( aa ) a ) a ) a ) a )
#! perl -slw
use strict;

sub recurse {
    print ">> '${ $_[ 0 ] }'";
    if( length ${ $_[ 0 ] } ) {
        recurse( \ substr( ${ $_[ 0 ] }, 1, -1 ) );
        ${ $_[ 0 ] } = " ( ${ $_[ 0 ] } ) ";
    }
    print "<< '${ $_[ 0 ] }'";
    return;
}

my $str = 'aaaaaaaaaa';
recurse \$str;
print $str;

__END__
P:\test>junk
>> 'aaaaaaaaaa'
>> 'aaaaaaaa'
>> 'aaaaaa'
>> 'aaaa'
>> 'aa'
>> ''
<< ''
<< ' ( aa ) '
<< ' ( aaaa ) '
<< ' ( aaaaaa ) '
<< ' ( aaaaa'
<< ' ( a ( aaaaaaaa ) a ) '
 ( a ( aaaaaaaa ) a )

Can anyone explain what is going on here--or better--make it work?


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Re: Scalar refs, aliasing, and recursion weirdness.
by MarkusLaker (Beadle) on Feb 04, 2005 at 23:49 UTC
    perlfunc's entry on substr says:

    If the lvalue returned by substr is used after the EXPR is changed in any way, the behaviour may not be as expected and is subject to change. This caveat includes code such as print(substr($foo,$a,$b)=$bar) or (substr($foo,$a,$b)=$bar)=$fud (where $foo is changed via the substring assignment, and then the substr is used again), or where a substr() is aliased via a foreach loop or passed as a parameter or a reference to it is taken and then the alias, parameter, or deref'd reference either is used after the original EXPR has been changed or is assigned to and then used a second time.

    I had to read that last sentence several times to understand it, but I think it covers what your code does: you've taken a reference to a substr, changed the referent string, taken a reference to a substring of that referent, and changed the referent of that. Don't do that.
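A minimal sketch of the distinction the caveat draws. The string and variable names are made up for illustration. An lvalue ref to a substr is used for exactly one assignment. Any further change goes through a fresh substr whose offsets reflect the string as it is now.

```perl
use strict;
use warnings;

my $s = 'abcdef';

# Take an lvalue ref to the 3-character substring 'bcd' and assign
# through it exactly once; a single use is the case the caveat allows.
my $ref = \ substr( $s, 1, 3 );
$$ref = 'WXYZ';
print "$s\n";    # prints "aWXYZef"

# For a second change, do not reuse $ref. Take a fresh substr whose
# length matches the current contents (the replacement was 4 chars).
substr( $s, 1, 4 ) = 'Q';
print "$s\n";    # prints "aQef"
```

This pattern avoids reusing an lvalue after its underlying string has changed length. That reuse is exactly what the perlfunc text warns may "not be as expected".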

    Here's some code that does what you want:

    #!/usr/bin/perl -l
    use warnings;
    use strict;

    sub recurse($);

    sub recurse($) {
        my $a = $_[0];
        print ">> '$a'";
        $a =~ s/ ^ (.) (.+) (.) $ / join '', $1, recurse $2, $3 /ex;
        print "<< '$a'";
        " ( $a ) ";
    }

    print "Result: ", recurse 'aaaaaaaaaa';

    Markus

    Update: diotalevi pointed out that my description of the problem with the OP's code was wrong. I've fixed it.

      Thanks. I think you are right regarding the re-use of a modified lvalue ref.

      Whilst your code achieves the notionally "desired" output of the testcode I posted, as I said, this is a simplification.

      The real application could be recursing into multiple substrings of the parameter at each level. Where and when the recursion occurs is dependent upon the content of the (sub)string passed and cannot easily be codified into a regex. For various reasons I also wish to avoid using the regex engine.

      That said, I may not have those choices now.


        Still probably not what you are looking for (it uses no refs), but maybe closer? At least there's no regex.

        #!/usr/bin/perl -slw
        use strict;

        my $str = 'aaaaaaaaaa';
        print recur($str);

        sub recur {
            my $s = shift;
            return "" if length($s) == 0;
            substr($s, 1, -1) = recur(substr($s, 1, -1));
            return "($s)";
        }


        -- All code is 100% tested and functional unless otherwise noted.

      Markus, you are incorrect when you summarize the documentation as noting that modifying a reference taken to a substring is forbidden. That's wrong. You can do that to a reference once, but not two or more times.

      $_ = 'aaaaaaaaaa';
      $_ = \ $_;
      $$_ =~ s/ ... / ... /;  # This is ok.
      $$_ =~ s/ ... / ... /;  # This is not.

      Modifying substring lvalues is typically safer if you don't persist the lvalue beyond the statement that creates it: you're less likely to find yourself accidentally modifying it more than once. The following expression is a highly useful form of lvalue substring, and one that people should be more aware of. It would not be available if lvalue substrings did not exist.

      substr( ... ) =~ s/ ... / ... /g
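A small illustration of that idiom. The string and pattern are made up for the example. substr() yields an lvalue for part of the string, and a substitution edits just that part in place. The substitution changes the length, which is fine because the lvalue is used only once, within one statement.

```perl
use strict;
use warnings;

my $s = 'aaa bbb aaa';

# Substitute only within the part of $s from offset 4 onward.
# The lvalue lives for this one statement, so the length change is safe.
substr( $s, 4 ) =~ s/a/XY/g;
print "$s\n";    # prints "aaa bbb XYXYXY"
```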
        diotalevi writes:

        Markus, you are incorrect when you summarize the documentation as noting that modifying a reference taken to a substring is forbidden. That's wrong.

        You're right. I expressed myself so badly that what I wrote was factually wrong. I'll update my original response. Thanks for pointing out my mistake.

        Incidentally, though, as long as no substr is involved, you can run as many substitutions as you wish on an indirected reference. Hence:

        [~/perl/monks]$ ./test
        abc
        [~/perl/monks]$ cat test
        #!/usr/bin/perl -l
        use warnings;
        use strict;
        my $a = 'aaa';
        $_ = \$a;
        $$_ =~ s/aa/ab/;
        $$_ =~ s/ba/bc/;
        print $$_;
        [~/perl/monks]$

        That $_ = \$_ construction is interesting. It yields a variable for which $_ == $$_ == $$$_, etc. No matter how many times you indirect it, the type and value of $_ don't change. There must be some code that breaks when you do that!
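A quick way to see the self-reference described here, sketched with a hypothetical lexical `$x` standing in for `$_`. The variable holds a reference to itself, so every dereference yields the same reference. It also forms a reference cycle that reference counting alone never frees.

```perl
use strict;
use warnings;

# $x ends up holding a reference to itself (a lexical stand-in for
# the $_ = \$_ construction).
my $x;
$x = \$x;

print ref($x), "\n";    # prints "REF": the referent holds a reference

# Numeric comparison of references compares addresses.
my $same = ( $x == $$x && $$x == $$$x );
my $msg  = $same ? 'same' : 'different';
print "$msg\n";         # prints "same"
```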

        Markus

        This proved to be the final piece in my puzzle. Instead of passing an lvalue ref to a selected substring into deeper levels of recursion, I have to pass the (aliased) target string and the start/end pair of the selected substring. The deeper level can then use substr to modify the appropriate part of the target without falling into the trap of reusing a modified lvalue ref.

        Passing the three salient pieces of information around separately is less convenient than having them neatly encapsulated in the lvalue ref. It also forces me to do the math of combining the start/end pair passed to a given level with the start/end pair selected within it before passing them deeper.

        It also forces me, within my caller, to adjust the start/end pair I received to account for any shrinkage or growth of the selection caused by this level or by any deeper level called from it.
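A minimal sketch of that offset-passing approach. It assumes a hypothetical `recurse( $target, $offset, $length )` signature and the same one-character-per-end trim as the original test case. Each call returns its selection's new length so the caller can do the adjustment described above. This is an illustration, not BrowserUk's actual code.

```perl
use strict;
use warnings;

# Hypothetical sketch: recurse() gets the target string (aliased via
# $_[0]) plus the offset and length of the selection this level owns,
# instead of an lvalue ref. It returns the selection's new length.
sub recurse {
    my ( undef, $off, $len ) = @_;    # $_[0] stays an alias to the target
    return 0 unless $len;             # empty selection: nothing to wrap

    # Inner selection: drop one character from each end.
    my $inner_len = $len >= 2 ? $len - 2 : 0;
    my $new_inner = recurse( $_[0], $off + 1, $inner_len );

    # Deeper levels may have grown or shrunk their part of the target,
    # so adjust this level's length by the same amount.
    $len += $new_inner - $inner_len;

    # Wrap this level's (updated) selection with 4-arg substr, which
    # rewrites the target directly rather than via a stored lvalue.
    my $wrapped = ' ( ' . substr( $_[0], $off, $len ) . ' ) ';
    substr( $_[0], $off, $len, $wrapped );
    return length $wrapped;
}

my $str = 'aaaaaaaaaa';
recurse( $str, 0, length $str );
print "$str\n";    # prints " ( a ( a ( a ( a ( aa ) a ) a ) a ) a ) "
```

Returning the new length, rather than having the caller re-measure the target, keeps each level responsible only for its own selection. The same bookkeeping extends to several selections per level by applying each returned delta to the offsets that follow it.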

        Perfectly doable at this level, but it would also be perfectly doable--and more efficient and convenient--if Perl did that for me. I've no doubt that it could be done by Perl given the interest of someone with sufficient tuits at that level.

        Perhaps the most disconcerting thing about this whole thread is that Perl silently converts an lvalue ref to a normal scalar when a second modification through it is attempted, and so discards the changes made by that second modification!

        That ought to be a red-flag. Maybe it should be the subject of a perlbug?



        It seems that someone did already offer a patch that would perpetuate the lvalueness of an LVALUE ref when it is modified. Indeed, it was done in response to a perlbug diotalevi raised following an earlier exposition of mine on the subject here.

        However, it would appear that patch was rejected in favour of the quick fix of (silently) converting a modified LVALUE to a mortal SV if a further attempt to modify it was encountered.


Re: Scalar refs, aliasing, and recursion weirdness.
by !1 (Hermit) on Feb 04, 2005 at 23:18 UTC

    Rather interesting behavior. If you merely use @_'s magic aliasing, the results seem even stranger.

    #! perl -slw
    use strict;

    sub recurse {
        print \$_[0];
        print ">> '$_[ 0 ]'";
        if( length $_[ 0 ] ) {
            recurse( substr( $_[ 0 ] , 1, -1 ) );
            $_[0] = " ( $_[0] ) ";
        }
        print "<< '$_[0]'";
        return;
    }

    my $str = 'abcdefghij';
    recurse $str;
    print $str;

    __END__
    SCALAR(0x8126a70)
    >> 'abcdefghij'
    LVALUE(0x8126bc0)
    >> 'bcdefghi'
    SCALAR(0x812a454)
    >> 'cdefgh'
    SCALAR(0x812a520)
    >> 'defg'
    SCALAR(0x812a5ec)
    >> 'ef'
    SCALAR(0x812a6b8)
    >> ''
    << ''
    << ' ('
    << ' ( d'
    << ' ( c ('
    << ' ( b ( c'
    << ' ( a ( b ( c ( ) d ) ( ) ef ) ghij ) '
     ( a ( b ( c ( ) d ) ( ) ef ) ghij )

    Perhaps this is a bug in the magic lvalue-ness of substr? It doesn't seem to re-check the length of an lvalue taken on another lvalue, hence the shifting of the terms that appear on the right.

      I think you may have hit the nail on the head.

      It would appear that if you modify an lvalue ref, it gets converted to a normal scalar before the modification.

      Which is annoying and counter-intuitive (to me, anyway), but maybe the work involved in allowing the underlying scalar to be modified through a substring of an lvalue ref is too complicated?

      Maybe I will have to code my own lvalue-ref-on-steroids to achieve my purpose?



        Even better example:

        #! perl -slw
        use strict;

        sub recurse {
            print ">> '${ $_[ 0 ] }'";
            (my $depth=$_[1])--;
            if ($depth) {
                recurse( \ substr( ${ $_[ 0 ] }, 1, -1 ), $depth );
            }
            else {
                ${$_[0]} = "XX";
            }
            print "<< '${ $_[ 0 ] }'";
            return;
        }

        my $str = 'abcdefghi';
        recurse \$str, 3;
        print $str;
        recurse \$str, 2;
        print $str;

        __END__
        >> 'abcdefghi'
        >> 'bcdefgh'
        >> 'cdefg'
        << 'XX'
        << 'bcdefgh'
        << 'abcdefghi'
        abcdefghi
        >> 'abcdefghi'
        >> 'bcdefgh'
        << 'XXi'
        << 'aXXi'
        aXXi
Re: Scalar refs, aliasing, and recursion weirdness.
by Zaxo (Archbishop) on Feb 04, 2005 at 23:27 UTC

    I think this is because the position and length arguments of lvalue substr are not re-evaluated after an assignment that changes the length. That probably caused no trouble as long as it was working in place, but it means only the old length's worth of characters gets copied when it does copy.

    We've seen this before when talking about lvalue subs! ;-)

    After Compline,
    Zaxo

      We've seen this before when talking about lvalue subs! ;-)

      Yes, we have, haven't we? It has been a recurrent theme of ours. :)

      Though they did fix the original problem of each source-code statement having only a single lvalue ref that got reused every time that statement was re-executed.

      I thought that had fixed the problems, but maybe my notion of what an lvalue ref should do is different from what the authors envisaged?


Re: Scalar refs, aliasing, and recursion weirdness.
by ikegami (Pope) on Feb 04, 2005 at 22:43 UTC

    I don't know why, but on a hunch, I tried this change:

    #! perl -slw
    use strict;

    sub recurse {
        print ">> '${ $_[ 0 ] }'";
        if( length ${ $_[ 0 ] } ) {
            #recurse( \ substr( ${ $_[ 0 ] }, 1, -1 ) );              #old
            recurse( \( my $boo = substr( ${ $_[ 0 ] }, 1, -1 ) ) );  #new
            ${ $_[ 0 ] } = " ( ${ $_[ 0 ] } ) ";
        }
        print "<< '${ $_[ 0 ] }'";
        return;
    }

    my $str = 'aaaaaaaaaa';
    recurse \$str;
    print $str;

    and it worked. I leave it up to you to explain it (cause I'm going home!)

    I don't think this is the output you want, because it would be a very convoluted way of getting it. But you didn't specify what you wanted.

      and it worked.

      Actually, it didn't, unless your results are different from mine?

      It only retains the last change and I want to retain the accumulated changes.

      P:\test>junk
      >> 'aaaaaaaaaa'
      >> 'aaaaaaaa'
      >> 'aaaaaa'
      >> 'aaaa'
      >> 'aa'
      >> ''
      << ''
      << ' ( aa ) '
      << ' ( aaaa ) '
      << ' ( aaaaaa ) '
      << ' ( aaaaaaaa ) '
      << ' ( aaaaaaaaaa ) '
       ( aaaaaaaaaa )

        Maybe you want
        ${ $_[ 0 ] } = " ( $boo ) ";
        or

        if( length ${ $_[ 0 ] } > 2 ) {
            # match against the referent (not $_) and pass a ref, as recurse() expects
            my ($pre, $mid, $post) = ${ $_[ 0 ] } =~ /^(.)(.*)(.)$/;
            recurse( \$mid );
            ${ $_[ 0 ] } = "$pre ( $mid ) $post";
        }

        Just guessing, cause you didn't specify what output you want.
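For comparison, here is a sketch that keeps ikegami's copy (`$boo`) but adds the step it was missing. The modified copy is written back into the caller's string with 4-arg substr, so the outer characters survive. It assumes the same one-character trim per level as the original test case. The parameter is still a plain scalar ref, and no lvalue ref is ever stored.

```perl
use strict;
use warnings;

# Recurse on an ordinary copy of the inner substring, then splice the
# modified copy back into the referent before wrapping this level.
sub recurse {
    my ($sref) = @_;
    return unless length $$sref;

    my $inner = substr( $$sref, 1, -1 );    # plain copy, not an lvalue
    recurse( \$inner );                     # deeper levels edit the copy
    substr( $$sref, 1, -1, $inner );        # write the result back once
    $$sref = " ( $$sref ) ";
    return;
}

my $str = 'aaaaaaaaaa';
recurse( \$str );
print "$str\n";    # prints " ( a ( a ( a ( a ( aa ) a ) a ) a ) a ) "
```

Copying costs some memory per level. In exchange, every modification happens on a fresh substr call, so nothing depends on how a reused LVALUE behaves.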