Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

'state' variables and unit testing

by vsespb (Chaplain)
on Feb 01, 2014 at 17:43 UTC ( #1072981=perlmeditation: print w/replies, xml ) Need Help??

There is common pattern to cache result of heavy calculations.
use strict; use warnings; use v5.10; our $_cached; sub somefunc { $_cached //= do { say "Heavy calculations"; 42+($ARGV[0]||0); }; } say somefunc(); say somefunc(); __END__ Heavy calculations 42 42
since 5.10 there is "state" statement, making this pattern even more easy.
use strict; use warnings; use v5.10; sub somefunc { state $var = do { say "Heavy calculations"; 42+($ARGV[0]||0); }; } say somefunc(); say somefunc(); __END__ Heavy calculations 42 42

However I see one problem here: unit testing.

With first example it's easy to test somefunc with several possible inputs.

{ local $_cached = undef; local @ARGV = (3); say somefunc(); } { local $_cached = undef; local @ARGV = (5); say somefunc(); }

and with "state" it looks impossible? Once you call somefunc() from your test, it's result cached forever.

Looks like a missing important feature, which can encourage people not using unit testing? What do you think?

Replies are listed 'Best First'.
Re: 'state' variables and unit testing
by moritz (Cardinal) on Feb 01, 2014 at 19:59 UTC

    Note that the equivalent to state vars is my $_cached = ..., not our $_cached = ..., so you are really comparing apples and oranges.

    But the problem is really something else: caching a value that depends on another value without taking this dependency into account is a bug, and unit testing simply exposes this bug. If you do the caching via state variables or not is really secondary. (And if the caching does not depend on a different value, then I don't see the problem for unit testing).

    Update: I should say that state holds global state (but hey, its name even suggests that it holds state :-), even though access to it isn't global. So it should be used with care.

      Delaying initialization until first use is a very common technique. And it is an important technique when it comes to making unit testing easier. Initializing too early is a great source of serious, hard-to-work-around problems when it comes to unit testing. (See Devel::Init for a lot of discussion about why and how to avoid problematic premature initialization.)

      The specific example given above is obviously contrived. So let me pick a different example:

      package MyConfig; ... our %Config = ( ... ); sub getConfig { my( $key ) = @_; our $Host ||= hostname(); our %HostConfig; return $HostConfig{$key} //= _getConfigForHost( $Host, $key ); }

      Clearly the value from getConfig($key) depends on which host it was called on. That is the design. It is not a bug. It would be a huge pain and introduce much room for mistakes all over the place if we required everybody who wanted to get a configuration variable to do their own lookup of what their hostname is and pass that in to the API that gets host-specific configuration information.

      We could try for both easy-to-use and "the getConfig() API is not buggy according to moritz" with something like:

      sub getConfig { my( $key, $host ) = @_; $host ||= hostname(); our %HostConfig; return $HostConfig{$key,$host} //= _getConfigForHost( $host, $key +); }

      But there are a lot of ways that that is worse. Getting the configuration for the wrong host is not something that we want people to be doing in Production code. It is very much something we want people to be doing in test code.

      So we use the first approach where getting configuration information for any host is easily possible but only by doing things that all of our programmers know constitute ugly peeking under the covers and so are not things that they would do in Production code, but are things that make sense to do in unit test code.

      The point is even more important when we are testing code that gets configuration information somewhere under its covers. I may want to be able to test that code from the perspective of some other host without having to plumb the "optionally pass in a hostname parameter" at every single layer between the entry point that I'm testing and any number of places that might be looking up configuration information.

      So I want my test code to be able to do:

      $MyConfig::Host = $test_host;

      And it is convenient to be able to have a single unit test script that can perform tests pretending to be on any number of other hosts with:

      for my $host ( ... ) { $MyConfig::Host = $host; %MyConfig::HostConfig = (); ... }

      This is an approach that does make it possible to do stupid things that get you strange results (for example, if you leave off the resetting of %MyConfig::HostConfig above). But the only way to get screwed up in Production code is by doing things that are obviously not appropriate in Production code (in our group, anyway). And the ways to get screwed up in test code are just not very likely.

      We could add a bunch of code to try catch such unlikely mistakes, but that code would be in the Production MyConfig package. And more code means more complexity which means more chances for bugs. I don't want code in Production modules that is there only to try to prevent unlikely mistakes that are only possible outside of Production.

      It is an official best practice in our group to use "our $Var //= ..." over "state" or even a more complex construct with "my". And that is because such persistent variables are very likely to be things that are convenient to be able to override when writing unit tests (for lots of different reasons). And there are lots of cases of things that make perfect sense to cache just one version of in Production because there is only one valid way to get it in Production while it also makes sense to use different and more than one value for those same things in unit tests.

      - tye        

      Note that the equivalent to state vars is my $_cached = ..., not our $_cached = ..., so you are really comparing apples and oranges.

      Yes 'state' is more like 'my', but I disagree about apples and oranges.

      I am comparing two different caching approaches:

      1) caching without 'state'. old way.

      2) caching with 'state'. new way.

      I am comparing (2) with (1) because both about caching and (2) is much easier to write/clearer than (1). Not because 'state' (not)similar to our/my.

      and my $_cached isn't much easier/clearer than our $_cached = ... (except "our" var is visible in global namespace, but I use underscore in name to indicate that it's private).

      But the problem is really something else: caching a value that depends on another value without taking this dependency into account is a bug

      Not always.

      In my example I've used $ARGV[0] to indicate something "global" (perhaps it's not clear in that particular example that @ARGV is not something that should be passed across call stack).

      Other use of global things include:

      %Config variables, $^O, $$, $ENV, locale, Versions of modules used, perl version, other application specific global flags and constants, which are immutable during process execution

      More than that, those global var, actually, not necessary affect the result

      For example I might have a code which calculates value on 64bit machine, and a slower code which calculates it on 32bit machine. Result should be same on both. And I can assume that 32bit version will work on 64bit as well and wish to test both in unit test running on 64bit machine

      $cached_value{ is_64bit_machine() } ||= somelongfunction()
      does not make sense in this case.

      Also, even if there is no implicit global parameter, I might want to run function twice in test with other functions mocked different ways.

      There are other uses of 'state' not related to caching

      Something like this untestable too

      sub global_sum { state $sum = 0; $sum += $_[0]; $sum; }
      EDIT Anyway, if you suggest that function result should depend only on it's arguments, than mocking/stubbing is not needed at all. However that's not true, because we see mocking libraries exist for Perl.
        Something like this untestable too

        To me it looks like a bug, not just untestable. Tests just reveal the bug.

        Let me rephrase my earlier point: APIs are testable or not testable; state variables are just a way to implement an API. state vars are a tool, and neither good nor evil; it's only misuse of state vars (and thus broken APIs) that makes stuff hard to test.

        If you use a global for the caching you are doing it wrong. You allow anyone, anywhere to change the cached value. The right way before state was

        { my $_cached; sub whatever { $_cached //= ... } }

        Enoch was right!
        Enjoy the last years of Rome.

Re: 'state' variables and unit testing (our)
by tye (Sage) on Feb 01, 2014 at 17:50 UTC

    We just use:

    sub somefunc { our $Var //= do { say "Heavy calculations"; 42+($ARGV[0]||0); }; }

    Best of both.

    - tye        

Re: 'state' variables and unit testing
by Laurent_R (Canon) on Feb 02, 2014 at 23:52 UTC
    Hmm, state variables are in my view a useful addition to Perl, but not quite as useful as closure variables. They are also a bit counter-intuitive, as in the following example provided by the OP:
    sub global_sum { state $sum = 0; $sum += $_[0]; $sum; }
    If I don't know what state is doing, I might think that $sum is reset to 0 each time through the sub, and I get a totally wrong idea of what the sub is really doing. It is not really obvious that the first line in that sub is executed only the first time through the sub and not in the next calls. Also, consider this example:
    $ perl -e ' > use strict; > use warnings; > use v5.10; > > sub somefunc > { > state $var = do { > say "Heavy calculations"; > 42+$_[0]; > }; > } > > say somefunc(0); > say somefunc(10); > ' Heavy calculations 42 42
    Although it is quite easy to explain what is going on, I would submit that this can be unexpected. The following code, using a closure, is more along the line of the behavior you would probably want:
    $ perl -e ' > use strict; > use warnings; > use v5.10; > { my $var = 42; > sub somefunc > { > $var + shift; > } > } > say somefunc(5); > say somefunc(10); > ' 47 52

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://1072981]
Front-paged by Corion
and a moth chases the moon...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (3)
As of 2018-06-24 13:23 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (126 votes). Check out past polls.