http://www.perlmonks.org?node_id=1076954

smls has asked for the wisdom of the Perl Monks concerning the following question:

Just when I thought nothing could surprise me anymore in Perl, I came across a case in my code that defies the order in which I thought expressions were evaluated. It is demonstrated by the following self-contained program:

use feature qw(say state);

my @widths = (2, 6, 5, 7);
my @partitions = map { state $c = 0; [$c, $c += $_] } @widths;

say '[', join(', ', @$_), ']' for @partitions;

I would have expected that to print:

[0, 2]
[2, 8]
[8, 13]
[13, 20]

But it actually prints:

[2, 2]
[8, 8]
[13, 13]
[20, 20]

... i.e. even though the first $c in the final map statement comes logically (i.e. evaluation-order-wise) before the $c += $_ operation, it is substituted with the value of $c after that operation.

Can someone explain to me why this happens? What am I missing?

Re: Why does the first $c evaluate to the incremented value in [$c, $c += $_] ?
by Eily (Monsignor) on Mar 04, 2014 at 21:23 UTC

    Edit: ikegami is right, I mistook precedence for evaluation order, and just kept being blinded by that mistake. Please read his correct explanation.

    If you look at the precedence list in perlop, += is of higher precedence than ,, so it will be run first, just like the multiplication will be run before the addition in $c+$c*$_

    In $c+0, $c+=1; + is the highest precedence operation, so addition is indeed run first. In $tmp = $c, $c+=1; = and += are of equivalent precedences, and so are run from left to right (and that's because of the , in between).

    Edit: added a ; here and a ; there, because their absence made my post look confusing.

      Hm, no, I don't think that explains it.

      Operator precedence requires, of course, that the += operator is evaluated before the , operator is evaluated, but it does not explain why the += operator is evaluated before the first argument to the , operator is evaluated.

      And indeed, with other operators (i.e. other than ++ or += and friends), this does not happen. For example, . also has higher precedence than , but it does not cause the second decorate() call to happen before the first in the following example:

      use feature qw(say state);

      sub decorate {
          state $counter = 0;
          return ++$counter . ':' . shift
      }

      my @a = ( decorate("foo"), decorate("bar") . "!" );
      say "@a"; # prints "1:foo 2:bar!" and not "2:foo 1:bar!"

      If I correctly understand the perlop paragraph quoted by Eily below, it appears that auto-increment operators only participate in the normal evaluation order as far as their return value is concerned, but their side-effect (modifying the variable) happens at an undefined time.

      I really wonder why that is the case, though. Just as a function's side-effects happen when the function call is evaluated (from the point of view of the larger expression), I would have expected the side-effect of += to happen when it's the operator's turn to be evaluated in the evaluation order of the larger expression.

        That's because decorate has an even higher precedence (on the left of any operator except commas, and unless parentheses are involved) so what happens is actually:

        (decorate("foo"), decorate("bar").'!')
        ((return $decorated_foo), (return $decorated_bar.'!'))
        ((return $decorated_foo), (return $decorated_bar_with_exclamation_mark))
        ($decorated_foo, (return $decorated_bar_with_exclamation_mark))
        ($decorated_foo, $decorated_bar_with_exclamation_mark)
        (This is of course, not actual code, but just a representation)
        So first the calls to decorate are resolved, then the concatenation, and at last the values are added to the list. But the concatenation does not happen last.

        Edit: "removed" a bit about precedence being higher on the left of some operators, because it's late, and I'm not sure about what I'm saying.

      Yep, you're absolutely right Eily, thanks for clarifying this! (ironically, I know that precedence table by heart, and have for some 15 odd years :-)

      That's completely wrong.

      Precedence indicates where implied parentheses are located. Precedence merely dictates that $c, $c += 1 is short for $c, ($c += 1) rather than ($c, $c) += 1. Precedence is not relevant.

      Operand evaluation order dictates which of $c and $c += 1 is evaluated first. Since operand evaluation order is left-to-right for the list/comma operator, $c is evaluated before $c += 1, so you got the order wrong too.

      You can see the LTR order in the compiled code:

      >perl -MO=Concise,-exec -e"my @a = ( $c, $c += 1 );"
      1  <0> enter
      2  <;> nextstate(main 1 -e:1) v:{
      3  <0> pushmark s
      4  <#> gvsv[*c] s                    # $c
      5  <#> gvsv[*c] s                    # $c + 1
      6  <$> const[IV 1] s                 #
      7  <2> add[t4] sKS/2                 #
      8  <0> pushmark s
      9  <0> padav[@a:1,2] lRM*/LVINTRO
      a  <2> aassign[t5] vKS/COMMON
      b  <@> leave[1 ref] vKP/REFC
      -e syntax OK

      You can see the LTR order in this example:

      my $c = 4;
      sub c :lvalue { print("$_[0]\n"); $c }
      my @a = ( c("a"), c("b") += 1 );
      print("@a\n");

      outputs

      a
      b
      5 5
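A small sketch (not from the thread) of the distinction drawn above: precedence decides how the expression is grouped, while the operands are still evaluated left to right. The helper `operand` and the array `@order` are hypothetical names used only for illustration. As far as I know, perlop does not promise this order for arithmetic operators, so treat the observed order as implementation behaviour rather than a guarantee.

```perl
use strict;
use warnings;

my @order;    # records the order in which operands are evaluated

# Hypothetical helper: logs its label, then returns the given value.
sub operand {
    my ($label, $value) = @_;
    push @order, $label;
    return $value;
}

# Precedence groups this as a + (b * c) ...
my $result = operand(a => 2) + operand(b => 3) * operand(c => 4);

# ... yet the operands are still evaluated left to right.
print "order: @order\n";      # order: a b c
print "result: $result\n";    # result: 14
```

The multiplication is performed before the addition, but `operand(a => 2)` is called first all the same.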

        Yes you're right. I wish I hadn't been so blind about that confusion. I updated my post, quite late I'm afraid.

Re: Why does the first $c evaluate to the incremented value in [$c, $c += $_] ?
by Laurent_R (Canon) on Mar 04, 2014 at 21:25 UTC
    Even simpler:
    my $c = 2; my @a = ($c, $c+=1); print "@a"; # prints 3 3
    Similarly:
    my @a = ($c, ++$c); # @a is now (3, 3)
    But:
    @a = ($c, $c++) # @a is now (3, 2) !!!
    Do you get it? In the first two cases, the value of $c is modified before the array gets populated. In the last case, because of the use of the post-increment operator, $c is incremented only after the second value is stored into the array, but before the first value is stored into the array.
      Using the same variables multiple times on the same line that you use pre and post increment on is fun to play with, but never depend on the order of operations.

      $c is incremented only after the second value is stored into the array,

      That's not true. The order in which things happen:

      1. $c
      2. $c
      3. post-increment
      4. my @a
      5. list assign

      @a doesn't even exist at the point you say values are assigned to it.

      >perl -MO=Concise,-exec -e"my @a = ( $c, $c++ );"
      1  <0> enter
      2  <;> nextstate(main 1 -e:1) v:{
      3  <0> pushmark s
      4  <#> gvsv[*c] s                    # $c
      5  <#> gvsv[*c] s                    # $c
      6  <1> postinc[t4] sK/1              # post-increment
      7  <0> pushmark s
      8  <0> padav[@a:1,2] lRM*/LVINTRO    # my @a
      9  <2> aassign[t5] vKS/COMMON        # list assignment
      a  <@> leave[1 ref] vKP/REFC
      -e syntax OK
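One way to observe the same thing without reading opcodes is to check which list elements are the scalar $c itself and which are separate copies. The sketch below relies on @_ aliasing the arguments of a sub call; the helper name `kinds` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: for each argument after the first, report whether
# it is the same scalar as the one $target refers to, or a separate copy.
sub kinds {
    my $target = shift;
    my @kinds;
    for my $i (0 .. $#_) {
        push @kinds, (\$_[$i] == $target) ? 'alias' : 'copy';
    }
    return @kinds;
}

my $c = 2;
my @post = kinds(\$c, $c, $c++);    # $c++ yields a copy of the old value
my @pre  = kinds(\$c, $c, ++$c);    # ++$c yields $c itself

$c = 2;
my @a = ($c, $c++);

print "@post | @pre | @a\n";        # alias copy | alias alias | 3 2
```

The first element is $c itself, so it shows the incremented value once the list is finally copied into the array, while the post-increment already handed back a copy of the old value.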
      Well, but what about this? It does not look like your third example worked...
      $ perl -wlE 'my $x=1; my @A=($x, $x++); say join $/, @A;'
      2
      1

        I believe this falls into the undefined behaviour of auto increment operator :

        Note that just as in C, Perl doesn't define when the variable is incremented or decremented. You just know it will be done sometime before or after the value is returned. This also means that modifying a variable twice in the same statement will lead to undefined behavior.
        So because of the precedence list, you know the right part will be run first, and the left part after that, but when exactly the variable will be incremented is left to Perl's implementation.

        Sorry, you are right, I originally made a mistake when copying the values, my third example gives (3, 2), not (2, 3) as I originally typed by mistake. I almost immediately corrected it, but it appears you had time to see it before I corrected my error, even though I corrected within 2 minutes of the original post.
Re: Why does the first $c evaluate to the incremented value in [$c, $c += $_] ?
by ikegami (Patriarch) on Mar 06, 2014 at 15:32 UTC
    The relevant code is my @a = ( $c, $c += 1 );. It does the following:

    1. $c is put on the stack. (The scalar, not its value.)
    2. $c is incremented.
    3. $c is put on the stack.
    4. An array is created.
    5. For each of the elements on the stack,
      1. A copy is made and placed at the end of the array.

    As you can see, $c has already been incremented by the time you assign it to the array (twice). It's unwise to modify a variable in the same expression as you read it.

    You'll get the same result from

    use Data::Alias qw( alias );
    my @stack;
    alias push @stack, $c;
    alias push @stack, $c += 1;
    my @a = splice(@stack);

    Now consider my @a = ( $c + 0, $c += 1 );. It does the following:

    1. A new scalar is created from the result of the addition of the value of $c and zero.
    2. It's placed on the stack.
    3. $c is incremented.
    4. $c is put on the stack.
    5. An array is created.
    6. For each of the elements on the stack,
      1. A copy is made and placed at the end of the array.

    use Data::Alias qw( alias );
    my @stack;
    alias push @stack, $c + 0;
    alias push @stack, $c += 1;
    my @a = splice(@stack);

    This places the original and the new value of $c in the array.
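For readers without Data::Alias installed, the two step lists above can be approximated with plain references standing in for the stack entries. This is only a model of the behaviour described, not how perl implements it; `@stack`, `$sum`, `@a` and `@b` are illustrative names.

```perl
use strict;
use warnings;

# Case 1: ( $c, $c += 1 )
my $c = 2;
my @stack;
push @stack, \$c;               # $c itself goes on the "stack"
$c += 1;                        # $c is incremented ...
push @stack, \$c;               # ... and $c itself goes on again
my @a = map { $$_ } @stack;     # values are copied only now
print "@a\n";                   # 3 3

# Case 2: ( $c + 0, $c += 1 )
$c = 2;
@stack = ();
my $sum = $c + 0;               # a new scalar holding the old value
push @stack, \$sum;
$c += 1;
push @stack, \$c;
my @b = map { $$_ } @stack;
print "@b\n";                   # 2 3
```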

Re: Why does the first $c evaluate to the incremented value in [$c, $c += $_] ?
by hazylife (Monk) on Mar 04, 2014 at 20:58 UTC

    Very strange indeed. And it doesn't even have to be a state variable, i.e.

    my $c = 0; my @partitions = map { [$c, $c += $_] } @widths;

    gives the same result.

    Whereas [$c+0, $c += $_] produces

    [0, 2]
    [2, 8]
    [8, 13]
    [13, 20]
    and so does [my $tmp = $c, $c += $_]

      Wild guess here but since the anonymous array is on the right side of a list operator, then the comma has really low precedence -- lower than assignment. By adding the +0, you give both terms equal precedence so it goes back to leftward precedence? Well ... that's my guess.

      -derby

      update: what Eily says makes better sense given the simple example of

      use Data::Dumper;
      my $c = 0;
      my $d = [ $c, $c += 2 ];
      print Dumper( $d );

Re: Why does the first $c evaluate to the incremented value in [$c, $c += $_] ?
by Eily (Monsignor) on Mar 04, 2014 at 22:27 UTC

    This raises another question though: this is fairly simple and understandable (and in this case, Perl doesn't DWIM), how should it be written instead?

    $c_old = $c; $c+=$_; [$c_old, $c]; is quite cumbersome. ($c = $c_old = $c) += $_; is just messing with people's heads. So I thought "Well, the unary + has a high precedence, and it's actually just the identity operator, this should work just fine":

    perl -MData::Dumper -E 'say join ", ", +$c,$c+=1 for 1..3'

    1, 1
    2, 2
    3, 3
    I thought I had missed something, but because I did have an idea of what was happening I thought "well, I'll just try it with two - instead"

    perl -MData::Dumper -E 'say join ", ", --$c,$c+=1 for 1..3'

    0, 0
    0, 0
    0, 0
    Yup, stupid, -- is not two unary -, but one auto-decrement operator. So:

    perl -MData::Dumper -E 'say join ", ", - -$c,$c+=1 for 1..3'

    0, 1
    1, 2
    2, 3
    Talking about clear code ... So I guess hazylife's version is actually the best: $c+0,$c+=$_; with a comment at the end so that the next person reading the code does not think "Oh, +0, now that's silly, let's just remove it!"

    But still, +$c should work, shouldn't it? What happens is obvious: perl just "optimises" it away as soon as it sees it. That's still an inconsistency, because you can't just replace a unary - with a unary + and expect things to happen in the same order. I could only try that in perl v5.14, but I suppose it's still the case in later versions; could someone try it?

      Looks the same here, with Perl 5.18.1 on Linux:

      perl -MData::Dumper -E 'say join ", ", +$c,$c+=1 for 1..3'

      1, 1
      2, 2
      3, 3

      perl -MData::Dumper -E 'say join ", ", --$c,$c+=1 for 1..3'

      0, 0
      0, 0
      0, 0

      perl -MData::Dumper -E 'say join ", ", - -$c,$c+=1 for 1..3'

      0, 1
      1, 2
      2, 3
Re: Why does the first $c evaluate to the incremented value in [$c, $c += $_] ?
by Jenda (Abbot) on Mar 05, 2014 at 10:01 UTC

    1. Modifying a variable you use several times within an expression is begging for problems. Don't!

    2. You can't use a state variable for something like this! As soon as that line gets evaluated twice, you end up in deep sh^B^Bproblems:

    use feature qw(say state);

    my @widths = (2, 6, 5, 7);
    foreach (1 .. 2) {
        my @partitions = map { state $c = 0; [$c, $c += $_] } @widths;
        say '[', join(', ', @$_), ']' for @partitions;
        print "\n";
    }

    State variables are too global. The simplest solution I can think of is:

    use feature qw(say);

    my @widths = (2, 6, 5, 7);
    foreach (1 .. 2) {
        my @partitions = do { my $c = 0; map { $c += $_; [$c - $_, $c] } @widths };
        say '[', join(', ', @$_), ']' for @partitions;
        print "\n";
    }
    or
    use feature qw(say);

    my @widths = (2, 6, 5, 7);
    foreach (1 .. 2) {
        my @partitions = do { my $c = 0; map { my $old = $c; [$old, $c+=$_] } @widths };
        say '[', join(', ', @$_), ']' for @partitions;
        print "\n";
    }
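If the one-liner is not a requirement, the same idea can be spelled out as a loop in which reading the old value and writing the new one are separate statements. A sketch only; the sub name `partitions` and its variables are hypothetical and do not appear elsewhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turns a list of widths into [start, end] pairs.
sub partitions {
    my @widths = @_;
    my $start  = 0;
    my @pairs;
    for my $width (@widths) {
        my $end = $start + $width;    # read and write are separate steps
        push @pairs, [$start, $end];
        $start = $end;
    }
    return @pairs;
}

my @partitions = partitions(2, 6, 5, 7);
print '[', join(', ', @$_), "]\n" for @partitions;
# [0, 2]
# [2, 8]
# [8, 13]
# [13, 20]
```

Because the running total lives in a lexical inside the sub, calling it twice starts from zero each time, which avoids the state-variable problem described above.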

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re: Why does the first $c evaluate to the incremented value ... (bug)
by oiskuu (Hermit) on Mar 05, 2014 at 00:23 UTC

    In C, a comma operator is a "sequence point", meaning a statement such as y = (++x, ++x, ++x); is perfectly valid. (But (++x + ++x); is undefined behavior.) Comma in declaration list, initializers, parameter list, ..., is not an operator but a punctuator.

    perlop says a comma in list context is "just the list argument separator". This is not a case with operator precedence. Indeed, it's a nasty bug!

    perl -le '$c = 0; print @$_ for map { [$c, $c += $_] } (2,6,5,7);'
    perl -le '$c = 0; print @$_ for map { ["$c", $c += $_] } (2,6,5,7);'
Re: Why does the first $c evaluate to the incremented value in [$c, $c += $_] ?
by vsespb (Chaplain) on Mar 05, 2014 at 11:47 UTC
    I think it's not about precedence.
    my $c = 2; my @a = ("$c", $c+=1); print "@a";
    prints 2, 3

    and
    my $c = 2; my @a = ($c, $c+=1); print "@a";
    prints 3, 3

    Expressions in a list separated by commas are guaranteed to be evaluated in the right order, otherwise
    my ($a, $b) = (shift, shift)
    would not work
    I think the problem is that some aliasing happens before the assignment to @a

      Yes. Exactly.

      It is entertaining when this question comes up, again and again (to see all of the same wrong guesses and conflating Perl and C).

      - tye        

        I understand why aliasing is involved when calling a function
        mysub($c, $c+=1)
        But why is it also involved when composing a list?
        @a = ($c, $c+=1)
        For consistency with function calls? Where is that documented?
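Not an answer to where the list case is documented, but a related sketch: a list of variables hands out the scalars themselves in other places too. perlsyn documents that the foreach loop variable is an alias for each list element, which makes the difference between $x and $x + 0 visible. The variables `$x` and `$y` are illustrative.

```perl
use strict;
use warnings;

my ($x, $y) = (1, 2);

# The loop variable is an alias for each list element, so this
# modifies $x and $y themselves.
$_ *= 10 for ($x, $y);
print "$x $y\n";    # 10 20

# An expression such as $x + 0 yields a new temporary instead,
# so only $y is changed here.
$_ *= 10 for ($x + 0, $y);
print "$x $y\n";    # 10 200
```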

      Well, quote-like operators have higher precedence than +=, so your example can still be explained by the "sub-expressions with higher-precedence operators are evaluated first" rule.

      And in the last example, both shift's have the same precedence, so it goes from left to right "as a last resort".

      Although, come to think of it, terms are technically supposed to have highest precedence. Yet for some reason, term subexpressions are evaluated last (see my demonstration using tied scalars above).

      All things taken together, it seems the rules for expression evaluation order in Perl can be described like so:

      1. An expression is viewed as a tree whose branch nodes are operators and whose leaf nodes are terms, constructed in accordance with any grouping parentheses and the precedence of the involved operators.
      2. It follows that sub-expressions which appear as "sibling nodes" in that tree are independent of each other, and if they are all side-effect-free, the order of their evaluation relative to each other is irrelevant. However, if they do have side-effects, it becomes significant. That's where rule 3 comes in:
      3. Sibling sub-expressions are evaluated in decreasing order of operator precedence (if they are themselves operators), or last (if they are terms). Siblings of the same precedence group (and only those!) are evaluated left-to-right.

      Which still does not answer the question of "Why on earth would it be defined like that?", though. I really don't think it contributes to DWIM. I think it would be more intuitive if the third rule were simply:

      3. Sibling sub-expressions are evaluated from left to right.

      Maybe it's a performance optimization though, because it allows the compiler to only fetch the values of term once it actually needs to evaluate their parent expression?

        Well, quote-like operators have higher precedence than +=, so your example can still be explained by the "sub-expressions with higher-precedence operators are evaluated first" rule.

        What about +0 then ?
        my $c = 2; my @a = ($c+0, $c+=1); print "@a";
        __END__
        2 3
        and here, too, the result is 3 3:
        my $c = 2; my @a = ($c+=1, "$c"); print "@a";
        __END__
        3 3

        the two examples below prove the aliasing:
        my $c = 2; my @a = ($c+=2, $c+=1); print "@a";
        __END__
        5 5
        my $c = 2; my @a = ($c+=1, $c+=2); print "@a";
        __END__
        5 5

        also, what about explicit parens:
        my $c = 2; my @a = (($c), ($c+=1)); print "@a";
        __END__
        3 3
        from perlop (about comma):
        In list context, it's just the list argument separator, and inserts both its arguments into the list. These arguments are also evaluated from left to right.
Re: Why does the first $c evaluate to the incremented value in [$c, $c += $_] ?
by linuxer (Curate) on Mar 04, 2014 at 21:26 UTC

    Missed precedence. Eily's explanation reads much better than mine.

    $c points to a memory address with a stored value. Your assignment changes that value. So at the end, you have an array reference containing the same address twice (with the same incremented value), and the same value is printed twice.