Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Quantum Weirdness and the Increment Operator

by barrachois (Pilgrim)
on Jun 24, 2004 at 05:34 UTC ( [id://369247]=perlmeditation: print w/replies, xml ) Need Help??

In quantum physics, watching something happen can change its behavior. In particular, a single particle going through two slits interferes with itself - unless you measure which slit it goes through, which destroys the interference.

Perl's ++ operator seems to behave like this, when the "watching" is done with a subroutine call.

Consider the output of

$m = 20; print ++$m + $m++;
and
$m = 20; print noop(++$m) + $m++; sub noop{ return shift }
What's going on here?

If you'd like to see how arcane your knowledge of Perl really is, try guessing what this prints without reading ahead or evaluating the code.

For those with way too much time on their hands, this document and the code samples it refers to can be found at at http://cs.marlboro.edu/talks/increment_weirdness/.


Disclaimer

Before going into any detail, I'd like to make it clear that this is an academic exercise only.

Steve Oualline has a nice description of the use of increment operators and their side effects in his book "Practical C++ Programming", pg 79. Essentially he says that if you want to understand how these tricky increment expressions work, the right answer should really be

"If you don't write code like this, then you don't have to worry about these sorts of questions."

The problem

Even though clearly none of *us* would write code like this (ahem), a friend (Mark Francillon) gave me some expressions like (++$m + $m++) as a puzzlers, and I was curious enough to look into it.

To really understand the pre and post increment ops, I wrote my own preInc and postInc subroutines, doing what I *thought* these operators were supposed to do.

# This is an attempt at emulating ++$m with preInc($m) sub preInc { $_[0] = $_[0] + 1; # Increment input argument (the side effec +t), return $_[0]; # and return the new, incremented value. } # And this is an attempt at emulating $m++ with postInc($m) sub postInc { my $temp = shift; # Remember original value, $_[0] = $_[0] + 1; # increment input argument (the side effec +t), return $temp; # and return the old, un-incremented value +. } my $m = 20; print preInc($m) + $postInc($m); # This prints 42. # The final value is $m is 22.
This all makes perfect sense to me. What's going on here is
  1. The preInc($m) increments $m to 21, and returns 21.
  2. The postInc($m) returns 21 ($m's current value), then
  3. increments $m a second time to its final value of 22, leaving
  4. the value of the sum as 21+21=42.
Both C and Java give this same value 42 for similar expressions, by the way; see the files run, Inc.c, Inc.java, Inc.pl, and their outputs in run_output.txt

If this was the whole story then I wouldn't be writing all this down. However, the value returned by the Perl interpreter is *not* 42, but 43. If you don't believe me, try it for yourself.

my $m=20; print ++$m + $m++; # This prints 43 ! # The final value is $m is still 22.
And then I started pulling out my hair.
look()-ing at intermediate results

My first attempt at understanding what was going on was to write a subroutine that would examine the intermediate results. You can find all the gory minutia in increment_detail.pl.

# Print values and addresses of passed argument and $m. sub look { print "look was passed '" . $_[0] . "' at . \$_[0] . ".\n"; print "while \$m is '" . $m . "' at " . \$m . ".\n"; return $_[0]; } my $m = 20; my $p = look(++$m) + look($m++); print $p;
But that's where the quantum weirdness popped up.

After a variety of attempts it became clear that any subroutine call wrapped around (++$m) changes the result of the calculation to 42.

sub noop { # do nothing return shift; } my $m=20; print noop(++$m) + $m++; # This prints 42 !
So if I tried to watch it do this weird thing, it wouldn't do it.

By now this felt like a conspiracy.


overload-ing to look without touching

Another friend (Brandt Kurowski) suggested using operator overloading to watch the intermediate steps, without disturbing the calculation. This works, and has helped me understand what's going on, but hasn't quite answered all my questions.

See IncrementOverload.pm for that analysis.

==== increment weirdness: ++$m + $m++ ========== m = 20 at 0x80ab23c p = ++m + m++ *** inc 0x80ab23c : 20 --> 21 m is 21 at 0x80ab23c *** copy 0x80ab23c --> 0x804c120 m is 21 at 0x80ab23c *** inc 0x804c120 : 21 --> 22 m is 22 at 0x804c120 *** add : 22 at 0x804c120 + 21 at 0x80ab23c = 43 at 0x80ab11c m is 22 at 0x804c120 p = 43 at 0x80ab11c
The discussion on pg 357 and thereabouts of the Camel describes some of the inner workings of the increment operators; in particular, if there's more than one pointer to something then it makes a copy first and increments the copy. Thus to overload ++, you must also overload the copy operator. In the listing above, the "inc", "copy" and "add" lines are printed out by overloaded subroutines, all invoked while evaluating ++$m + $m++.

Here's a blow by blow account.

The calculation starts out as I'd expect, with ++$m incrementing $m (..23c) in place.

Since the left term of the sum is going to be needed later, I'm guessing that another name (pointer), say left_sum, is also now given to that (..23c) value.

The second increment, $m++, now sees something with several names, and so makes a copy (..120) before proceeding. The original address, namely (..23c) which now contains 21, is given another name, something like right_sum, to be used later when the terms are added together.

The copy is then incremented to 22 (..120), which is the final value of $m.

In the last addition step, the left_sum pointer has been apparently been carried along with the renaming of $m; that is, by the time the sum is evaluated its value is 22.

On the other hand, when ++$m is wrapped in a subroutine call, left_sum isn't "carried along" in this way; left_sum remains 21 after $m gets to 22.

How exactly the addition operation keeps track of what it's going to be adding isn't entirely clear to me; the devil seems to be in the details of what the thing I'm calling left_sum ends up referring to.

Staring at all this long enough gives the gist of *how* Perl gets 43 : ++$m evaluates to $m itself, which is incremented *again* by $m++. So by the time the addition operation actually needs a value for the term on the left, its value is 22.

There are more comments at the end of Increment.pm, along with printouts for some other (perhaps illuminating) variations.


Why?

After all this analysis on *how* this works, I'm still left with one big question.

Why is 43 the right answer?
Clearly Perl's answer must be the right one, and so I'm sure there's some really good reason why the operators *should* behave this way in this context - I just can't quite get my head around what that reason might be. :)

Anyway, it was fun trying to look a bit under the hood.

  - barrachois

Replies are listed 'Best First'.
Re: Quantum Weirdness and the Increment Operator
by BrowserUk (Patriarch) on Jun 24, 2004 at 06:08 UTC

    Fun.

    The reason that using your preInc() sub changes the result is because the value it returns is a copy of $m's value at the end of the call to the sub.

    By contrast, the value of $m used by the + operator in the non-sub statement is the value of $m at the point after both the subexpressions involved in the + operation have been evaluated.

    I assume that the order of the evaluation is a result of converting the expressions to their reverse polish form. Clear as mud, but this is (roughly) equivalent

    $m=20; $r = \$m; print ${ $$r += 1; $r } + ${ my $t=$$r; $$r += 1; \$t }; 43

    At least, it produces the same result and is close enough for my mental processes to get a notional handle on the mechanism.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
Re: Quantum Weirdness and the Increment Operator
by hv (Prior) on Jun 24, 2004 at 13:32 UTC

    I think the reason it works this way is simpler than you make out. The important thing is the implementation of the preincrement operator: do { $m = 20; ++$m } does not return the value 20, rather it returns the variable $m.

    Therefore the sequence of operations reduces to:

    $m = 20; # $m == 20 ++$m; # $m == 21 $temp1 = $m++ # $m == 22, $temp1 == 21 $temp2 = $m + $temp1 # 22 + 21 = 43 print $temp2

    Hope this helps,

    Hugo

Re: Quantum Weirdness and the Increment Operator
by Abigail-II (Bishop) on Jun 24, 2004 at 08:53 UTC
    Consider the output of
    $m = 20; print ++$m + $m++;
    and
    $m = 20; print noop(++$m) + $m++; sub noop{ return shift }
    What's going on here?

    If you'd like to see how arcane your knowledge of Perl really is, try guessing what this prints without reading ahead or evaluating the code.

    This is trivially explained, and I'm amazed there are still people giving this any serious consideration. Let me spell it out once more, in easy to read letters:

    Modifying the same variable using auto-increment/decrement more than once in the same statement gives undefined behaviour.

    The first sentence in perlop.pod regarding auto-increment and auto-decrement reads:

    "++" and "--" work as in C.
    And there you have it. Works as in C. And in C, using it twice on the same variable in the same statement gives undefined behaviour.

    Abigail

      I'm sorry that I'm the one that has to tell you this, but people start programming the year 2004 and I believe we can expect more people to start programming in the future. I bet someone somewhere just recently wrote his first hello world program. So I'm not surprised that issues like this is brought up again. Most of us has different stages we go through as a (Perl) programmer. Giving this any serious consideration may be one of them. This isn't a trivial issue for many, and trying to understand how it currently works is something that should be encouraged IMHO as it'll give greater understanding for how Perl works, even though the behaviour isn't defined. All kinds of enlightenment should be encouraged and not ridiculed.

      And there you have it. Works as in C.

      Unfortunately, not everyone knows C.

      You have apparently spelled this out several times, so I recommend you to send a documentation patch for perlop that clarifies what "as in C" really means if this upsets you so much. Then you've done some real good. Frankly, I don't see what there is to get upset about. He is just a guy that wants to explore Perl.

      ihb

        Let me rephrase Abigail-II's answer, in easy-to-follow steps.
        1. The pre- and post-increment are documented somewhere.
        2. That documentation is in perlop.
        3. Perlop says "If you do X in Perl, it is the same as doing X in Foo".
        4. The documentation for Foo (which is as easily obtainable as perlop) is that doing X has (deliberately?) undefined behavior.

        The documentation is complete. The problem isn't a problem. That the documentation refers to other documentation may be annoying, but it is complete. And, whether you like it or not, every single person who worked/works on the Perl sourcecode and nearly everyone who worked/works on the Perl design is fluent in C and has no problem with statements like that. In other words, any single person who would have ever worked on the documentation is fluent in C and has no problem with statements like that.

        . . . so I recommend you to send a documentation patch for perlop that clarifies what "as in C" really means if this upsets you so much.

        Ah, but he's not the one who's upset by the documentation. You are. He (and I) is upset by your unwillingness to read said documentation.

        An analogy - if you were to pick up a textbook on calculus, it would assume you knew how to do algebra. Would this bother you? What if it said "The edge cases of this transformation are similar to this transformation as done in Set theory."? Because, frankly, that's what perlop is saying. "The edge cases of the pre- and post-increment operators are the same as in C."

        Let's look at something else - the reason why it's the same as in C. Because Perl is implemented in C! Do you really think the Perl developers reinvented the wheel here? No! They delegated to the source language, because that's the behavior they were looking for. And, if the ANSI C standard ever changes how this edge case behaves, then Perl will have that change as a matter of course. Hence, the reason to refer to the C documentation. It's very OO in its thinking. We inherit from C and delegate back to C.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        I shouldn't have to say this, but any code, unless otherwise stated, is untested

          A reply falls below the community's threshold of quality. You may see it by logging in.
        And so we are expected to tolerate everybody who starts programming in 2004 not referring to anything written before 2004? We are supposed to accept that manuals don't exist? We are supposed to accept that new suers will ask questions without even considering that they are not the first to come across some thing or another?

        Well, like it or not, there is a body of knowledge out there.

        When the manual says that the behaviour of something is udefined, then it also implies " and it may come back to bite you if you try to use it".

        Enlightenment is welcomed, but the answer to the question is clear, it is documented. It is written in plain English for all to see.

        jdtoronto

      A reply falls below the community's threshold of quality. You may see it by logging in.
      undefined but currently consistent.

      And understanding how it is working currently is valuable in forming an in-your-head model of what perl actually does with your code.

      (Notwithstanding that, constructs with undefined behavior should not be used in real code.)

      Good morning Abigail.

      Looking up the word "behaviour", I find that it relates to "actions" and "reactions". Things that have happened. Actions that have taken place. Past tense.

      Saying that behaviour is undefined, makes no sense. Until something has happened it isn't behaviour.

      If you predict that a certain thing will (future tense) display a certain behaviour, and you get it wrong. It didn't display the "wrong behaviour", or "no behaviour". The action the the thing displayed (past tense) was the behaviour. You were simply wrong in your prediction.

      Therefore, you can't say a "behaviour is undefined", because an action or reaction doesn't become behaviour until it has happened, and when it has happened, it can be measured, and is therefore not "undefined".

      It may not be predictable--though in this case, I (rashly) bet the outcome has been the same since Perl 1; maybe you would confirm that?--which kinda makes it a bit predictable.

      It may also be unexplainable. Which may explain why the explainations are so lacking.

      It is, however, OBSERVABLE! Huge red font omitted.

      If a phenomena can be observed, it can be questioned. The strongest defining impulse of the human being over our non-sentient co-inhabitants, is the ability and need to ask the question WHY?.

      You may well be correct in reaching your conclusion that asking why is not "worth your effort" or "likely to result in a you reaching a satisfactory conclusion", but that is your conclusion. It satisfies your criteria.

      Is it your assertion that because you have reached a conclusion that satisfies you, that noone else in the entre world is allowed to think about this, or discuss it any place where you might observe that discussion?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
        "Results are undefined" in Computer Science means, "the documentation is not going to explain what you'll get in Tuesday's version of the compiler on John's favorite machine."

        In the case of keys %foo, the results are defined: you will get all keys. The order they're returned is defined but poorly so: it's a mistake for documentation to say they are returned in "random order," because the order has nothing to do with a PRNG. It would be better to say the order is not defined, as that is more accurate. Tuesday's compiler on John's machine will produce something inconsistent with your own experiments.

        In the case of ++$m + $m++, the results are undefined: no documentation claims to give an authoritative answer on such expressions, and some documentaion will specifically disavow any predictable results.

        For those who do not program in C, here's the gist of the action. The do { my $x = $m; $m++; $x } mechanism is not a sufficient postincrement operator, because real post-incrementation happens after the rest of the expression. However, the specific moment that is defined to be after the rest of the expression is not clearly defined; various versions of a C compiler will differ, and various hardware platforms may also have something to say about it. It's undefined.

        --
        [ e d @ h a l l e y . c c ]

        By 'undefined', I think the traditional comp-sci meaning of "you may get anything or nothing" was meant. There's no telling what will happen, and whatever does happen may change over time as new compilers and libraries are implemented.
      My optician says you will hear from my lawyer if the dammage is permanant.;) a.

      That is not entirely the case here though. See the following code and output.

      my $f = 1; print '$f = ', $f, ' and $f + $f++ = ', $f + $f++, "\n"; my $g = 1; print '$g = ', $g; print ' and $g + $g++ = ', $g + $g++, "\n"; __END__ $f = 2 and $f + $f++ = 3 $g = 1 and $g + $g++ = 3

      This is not the expected behaviour in C and i'm only auto-increment/decrement a single time.


      ___________
      Eric Hodges
        What is undefined is the order of operation. Since ($f + $g++) == ($g++ + $f) this is normally a reasonable assumption, but of course ($f++ + $f) != ($f + $f++).
        -- gam3
        A picture is worth a thousand words, but takes 200K.
Re: Quantum Weirdness and the Increment Operator
by Zaxo (Archbishop) on Jun 24, 2004 at 20:53 UTC

    You can follow the results by realizing that post-increment returns a value, while pre-increment returns an alias to the variable. That makes subsequent operations of high enough precedence turn up in pieces that have already been evaluated.

    There is no reason that perl must do it that way. Perl could do away with the C restriction, which is there for good C reasons, and return values for each increment operation. That would make perl do what you expected. But for forseeable perl, as has been said (and shouted), "Don't do that!"

    After Compline,
    Zaxo

Re: Quantum Weirdness and the Increment Operator
by ysth (Canon) on Jun 24, 2004 at 18:33 UTC
    This may prove useful to study:
    my $x; # unless overloaded, numeric values of two refs to the same thing are +equal {package X; use Carp; sub is_x { carp \$x == \$_[0] ? "ok" : "not ok" +}} X::is_x($x); X::is_x(++$x); X::is_x($x++); X::is_x($x+0);
      Yes.

      Part of what surprised me was that "return" gave me back a copy, so my "noop" subroutine should actually be called "copy".

      sub noop { X::is_x( $_[0] ); # ok return $_[0]; } X::is_x( noop($x) ); # not_ok
      Update

      The plot thickens.

      I see over on Aliasing and return, how does return work? a discussion of just this issue, with a suggestion by japhy to use lvalue subs.

      A quick test shows that one can indeed return $x itself this way - but the increment expression is still altered by a call to this kind of subroutine, too.

      sub noop_lval :lvalue { $_[0] } X:is_x( noop_lval($x) ); # ok ! $m=20; print noop_lval(++$m) + $m++; # 42 (still) $m=20; (++$m) + $m++; # 43
      So I guess the copy that "return" makes isn't the whole issue.
Re: Quantum Weirdness and the Increment Operator
by andyf (Pilgrim) on Jun 28, 2004 at 19:34 UTC
    Fun indeed! Interesting philosophy. Undefined but consistent is oxymoronic, if its consistent (not just constrained) then its defined. The subtlety is that we _really_ mean not defined across platforms and implementations. The platform, processor architecture, C compiler used and a host of other things add variables (that may as well be random) into the equation that defines the statements behaviour.

    I applaud the determination and curiosity of the OP, its a fascinating tale, and only misguided in retrospect of knowing that Perl itself does not know the answer. If nothing else a great amount is learned about Perls treatment of variables. On another note I see no reason for Perl to maintain this. A language is what you DEFINE it to be. Perhaps Perl 6 will be bold enough to say 'this is the order of evaluation, this is the behaviour, anything else is wrong' and have done with it. IANA compiler writer btw, dragon books scare me, maybe I'm missing a really good reason, but in my mind its just a question of definition and sticking to it. On another note, Barrachois approach is typical empirical programming, good stuff, tweak it and see, try and break it and find out how it works. If you never encountered a problem like that before you wouldn't know what 'Behaviour is undefined' even means in the documentation.

      Fun indeed! Interesting philosophy. Undefined but consistent is oxymoronic, if its consistent (not just constrained) then its defined. The subtlety is that we _really_ mean not defined across platforms and implementations. The platform, processor architecture, C compiler used and a host of other things add variables (that may as well be random) into the equation that defines the statements behaviour.

      While this is a semantic argument, it is not as straightforward as you might think. The term 'undefined' in comp-sci lingo comes from what electrical engineers designing digital circuits call 'undefined', the so called "don't cares" in a truth table. It's not that these "don't cares" will not take on a value if the circuit happens to somehow land in that state -- it's just that the designer doesn't care, and will not gurantee at all the value of that state.

      So when you're reading about a language and it says "undefined", it is with a capital 'U' and can be read as "not guaranteed", as in "this is the part of the contract of the API".

      The designers of said language would probably slap you upside the head if you said "but it's consistent, therefore it's defined", mostly because in their lingo the indeterminate variables that you yourself mention are all taken seriously when incorporated under the umbrella of a strict definition such as "undefined". And you are correct that, on the hardware level, the influences can be random. These are generally avoided via the use of guard circuits -- see here for a discussion regarding guard clauses.

      Cheers,
      Matt

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://369247]
Approved by BrowserUk
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (5)
As of 2024-03-19 03:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found