http://www.perlmonks.org?node_id=1072990


in reply to 'state' variables and unit testing

Note that the equivalent to state vars is my $_cached = ..., not our $_cached = ..., so you are really comparing apples and oranges.

But the problem is really something else: caching a value that depends on another value without taking this dependency into account is a bug, and unit testing simply exposes this bug. If you do the caching via state variables or not is really secondary. (And if the caching does not depend on a different value, then I don't see the problem for unit testing).

Update: I should say that state holds global state (but hey, its name even suggests that it holds state :-), even though access to it isn't global. So it should be used with care.

Replies are listed 'Best First'.
Re^2: 'state' variables and unit testing (Devel::Init)
by tye (Sage) on Feb 01, 2014 at 22:36 UTC

    Delaying initialization until first use is a very common technique. And it is an important technique when it comes to making unit testing easier. Initializing too early is a great source of serious, hard-to-work-around problems when it comes to unit testing. (See Devel::Init for a lot of discussion about why and how to avoid problematic premature initialization.)

    The specific example given above is obviously contrived. So let me pick a different example:

    package MyConfig; ... our %Config = ( ... ); sub getConfig { my( $key ) = @_; our $Host ||= hostname(); our %HostConfig; return $HostConfig{$key} //= _getConfigForHost( $Host, $key ); }

    Clearly the value from getConfig($key) depends on which host it was called on. That is the design. It is not a bug. It would be a huge pain and introduce much room for mistakes all over the place if we required everybody who wanted to get a configuration variable to do their own lookup of what their hostname is and pass that in to the API that gets host-specific configuration information.

    We could try for both easy-to-use and "the getConfig() API is not buggy according to moritz" with something like:

    sub getConfig { my( $key, $host ) = @_; $host ||= hostname(); our %HostConfig; return $HostConfig{$key,$host} //= _getConfigForHost( $host, $key +); }

    But there are a lot of ways that that is worse. Getting the configuration for the wrong host is not something that we want people to be doing in Production code. It is very much something we want people to be doing in test code.

    So we use the first approach where getting configuration information for any host is easily possible but only by doing things that all of our programmers know constitute ugly peeking under the covers and so are not things that they would do in Production code, but are things that make sense to do in unit test code.

    The point is even more important when we are testing code that gets configuration information somewhere under its covers. I may want to be able to test that code from the perspective of some other host without having to plumb the "optionally pass in a hostname parameter" at every single layer between the entry point that I'm testing and any number of places that might be looking up configuration information.

    So I want my test code to be able to do:

    $MyConfig::Host = $test_host;

    And it is convenient to be able to have a single unit test script that can perform tests pretending to be on any number of other hosts with:

    for my $host ( ... ) { $MyConfig::Host = $host; %MyConfig::HostConfig = (); ... }

    This is an approach that does make it possible to do stupid things that get you strange results (for example, if you leave off the resetting of %MyConfig::HostConfig above). But the only way to get screwed up in Production code is by doing things that are obviously not appropriate in Production code (in our group, anyway). And the ways to get screwed up in test code are just not very likely.

    We could add a bunch of code to try catch such unlikely mistakes, but that code would be in the Production MyConfig package. And more code means more complexity which means more chances for bugs. I don't want code in Production modules that is there only to try to prevent unlikely mistakes that are only possible outside of Production.

    It is an official best practice in our group to use "our $Var //= ..." over "state" or even a more complex construct with "my". And that is because such persistent variables are very likely to be things that are convenient to be able to override when writing unit tests (for lots of different reasons). And there are lots of cases of things that make perfect sense to cache just one version of in Production because there is only one valid way to get it in Production while it also makes sense to use different and more than one value for those same things in unit tests.

    - tye        

Re^2: 'state' variables and unit testing
by vsespb (Chaplain) on Feb 01, 2014 at 20:44 UTC
    Note that the equivalent to state vars is my $_cached = ..., not our $_cached = ..., so you are really comparing apples and oranges.

    Yes 'state' is more like 'my', but I disagree about apples and oranges.

    I am comparing two different caching approaches:

    1) caching without 'state'. old way.

    2) caching with 'state'. new way.

    I am comparing (2) with (1) because both about caching and (2) is much easier to write/clearer than (1). Not because 'state' (not)similar to our/my.

    and my $_cached isn't much easier/clearer than our $_cached = ... (except "our" var is visible in global namespace, but I use underscore in name to indicate that it's private).

    But the problem is really something else: caching a value that depends on another value without taking this dependency into account is a bug

    Not always.

    In my example I've used $ARGV[0] to indicate something "global" (perhaps it's not clear in that particular example that @ARGV is not something that should be passed across call stack).

    Other use of global things include:

    %Config variables, $^O, $$, $ENV, locale, Versions of modules used, perl version, other application specific global flags and constants, which are immutable during process execution

    More than that, those global var, actually, not necessary affect the result

    For example I might have a code which calculates value on 64bit machine, and a slower code which calculates it on 32bit machine. Result should be same on both. And I can assume that 32bit version will work on 64bit as well and wish to test both in unit test running on 64bit machine

    $cached_value{ is_64bit_machine() } ||= somelongfunction()
    does not make sense in this case.

    Also, even if there is no implicit global parameter, I might want to run function twice in test with other functions mocked different ways.

    There are other uses of 'state' not related to caching

    Something like this untestable too

    sub global_sum { state $sum = 0; $sum += $_[0]; $sum; }
    EDIT Anyway, if you suggest that function result should depend only on it's arguments, than mocking/stubbing is not needed at all. However that's not true, because we see mocking libraries exist for Perl.
      Something like this untestable too

      To me it looks like a bug, not just untestable. Tests just reveal the bug.

      Let me rephrase my earlier point: APIs are testable or not testable; state variables are just a way to implement an API. state vars are a tool, and neither good nor evil; it's only misuse of state vars (and thus broken APIs) that makes stuff hard to test.

      If you use a global for the caching you are doing it wrong. You allow anyone, anywhere to change the cached value. The right way before state was

      { my $_cached; sub whatever { $_cached //= ... } }

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

        It is very often useful to prevent people from accidentally violating encapsulation. It is usually a mistake to try to prevent people from intentionally violating encapsulation.

        Caching to a global variable is very often extremely handy when writing unit tests. Nobody accidentally modifies a global variables in another package. Nobody on my team even does that intentionally in Production code.

        And I've seen lots of really stupid things done to try to work around such attempts to prevent intentional violations of encapsulation. They are usually much worse than the behavior that was trying to be prevented.

        - tye