PerlMonks  

Testing by Contract

by Ovid (Cardinal)
on Jun 30, 2003 at 18:28 UTC ( #270259=perlquestion )
Ovid has asked for the wisdom of the Perl Monks concerning the following question:

Test contracts -- shoring up some testing problems

This is a long node dealing with various thoughts about building robust software, and an idea I am working on to get around some annoying problems with testing. Specifically, unit tests don't check that different components work together, but integration tests often miss where an actual problem lies. What follows are some thoughts I've had about how to get around this problem.

Part 1: Argument handling

In Perl, all subroutines are inherently variadic. That is to say, they take a variable number of arguments:

sub foo {
    my ($foo, $bar, @baz) = @_;
    ...
}

The nice thing about that is that it makes argument handling easy. The not nice thing about that is that it makes argument handling too simplistic. Anyone who's longed for the ability to override methods based on signatures knows what I'm talking about. For example, in Java, you can do this:

public void set_foo(int bar) { ... }
public void set_foo(int bar, int length) { ... }

The system knows to call the correct method based upon the number and types of arguments that I supply. If I want to do that in Perl, I frequently have to set up complicated conditionals in my subs to dispatch properly based upon my arguments (or use another module like Class::Multimethods). The name, return type, and argument types of a method are referred to as its signature. However, it's fair to ask what those method signatures really buy us. In reality, when I declare that the first argument to foo() is an integer, I am really doing nothing more than adding a very simplistic test. To a large extent, statically typed languages are all about sprinkling simple tests throughout our code, but the types that are declared are a bit arbitrary. What if, in reality, the domain of the first argument to foo() is not all integers, but all even integers greater than zero? Having it declared as an integer is barely acceptable.
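A minimal sketch of the conditional dispatch Perl forces on you (the set_foo behavior here is invented for illustration): a single sub that branches on how many arguments it received, emulating the two Java overloads above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Emulating Java-style overloading by hand: one sub that
# dispatches on the number of arguments it received.
sub set_foo {
    my @args = @_;
    if (@args == 1) {
        my ($bar) = @args;
        return "set foo to $bar";
    }
    elsif (@args == 2) {
        my ($bar, $length) = @args;
        return "set foo to $bar with length $length";
    }
    else {
        die "set_foo: expected 1 or 2 arguments, got " . scalar @args;
    }
}

print set_foo(42), "\n";       # one-argument form
print set_foo(42, 10), "\n";   # two-argument form
```

Every new "overload" means another branch in the conditional, which is exactly the bookkeeping that signature-based dispatch does for you in Java.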

One way around this is to exhaustively test every argument to every subroutine or method.

sub foo {
    my ($bar, $baz, $quux) = @_;
    croak "bad bar" unless $bar > 0 && $bar % 2 == 0;
    # now test $baz
    # now test $quux
    # do our stuff
}

Um, sure. We all do this, right? No, we don't. There are a variety of problems with this. The first is obvious: we end up with so much validation code in every subroutine or method that we start to obscure the intent of the code. Second, programmers often think that since a subroutine is buried deep within the system and never gets called directly by the user, all they have to do is ensure that it never gets passed bad data. This is a common strategy and actually isn't all that bad if you have a good test suite.

Another problem is that even when we know that what we received was good, that tells us nothing about the state of what we return, or about whether the state of the class in which we're operating has been left in good shape (i.e., a class or package variable has not been changed to an inconsistent value).

That's where Design by Contract (DBC) gets involved. With contracts (implemented in Perl with Class::Contract), we can carefully specify the domain of what we accept, the domain of what we emit, and invariants which must remain within a certain domain when we're done (like the class and package variables mentioned above). However, Class::Contract is specifically tied to the concept of classes, objects and methods. If you're writing a functional module, it doesn't seem like a proper conceptual fit. Further, it's not exactly intuitive for most to use. We want this stuff to be easy.
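For plain functions, the same precondition/postcondition/invariant idea can be hand-rolled without Class::Contract. A minimal sketch (the function and its domains are invented for illustration), using the "even integers greater than zero" domain from earlier:

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $call_count = 0;   # package state that the invariant guards

# A hand-rolled contract: check the input domain, the output
# domain, and a package-level invariant, all in one place.
sub contracted_double {
    my ($n) = @_;

    # precondition: domain is even integers greater than zero
    die "precondition failed: want even integer > 0"
        unless $n > 0 && $n % 2 == 0;

    $call_count++;
    my $result = $n * 2;

    # postcondition: doubling an even number must yield a multiple of 4
    die "postcondition failed: result not divisible by 4"
        unless $result % 4 == 0;

    # invariant: package state left consistent
    die "invariant failed: call count went negative"
        unless $call_count >= 0;

    return $result;
}

print contracted_double(6), "\n";   # 12
```

The obvious drawback is exactly the one raised above: the contract machinery swamps the one line of real work.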

Part 2: Argument handling and testing

I've been thinking about a problem that sometimes crops up in testing. There are many different types of testing, but I'm thinking specifically about unit and integration testing. Let's say that I have five components, A, B, C, D, and E. I unit test the heck out of those components and they pass.

Now I do integration testing. Let's say that I test A and it calls B, which calls C, etc., all the way down to E. In the unit testing, I merely mocked up the interface to B, so A doesn't actually call it. In the integration tests, A actually calls B and even if all of our unit tests pass, the integration tests sometimes fail because of weird API problems. We can think of the call chain like this:

+-----+      +-----+      +-----+      +-----+      +-----+
|     |  AB  |     |  BC  |     |  CD  |     |  DE  |     |
|  A  |----->|  B  |----->|  C  |----->|  D  |----->|  E  |
|     |      |     |      |     |      |     |      |     |
+-----+      +-----+      +-----+      +-----+      +-----+

In other words, the unit tests ignore AB, BC, CD, and DE. However, the integration tests also tend to ignore those. To properly test that chain and every step in it, we might consider testing DE, then CDE, then BCDE, etc. In reality, what I see happening in most test suites is that A gets tested with integration testing and the unit tests are skipped, or done very poorly. Then, when A gets a bad result, we're not always sure where it happened. Or worse, we see that E dies and we don't always know where the bad data came from.

Personally, I think this reflects a very real-world problem. I need to get my product out the door and the client is willing to accept a certain minimum level of bugs if this keeps the costs down. It's not possible to build a test suite that tests every possible combination of what can go wrong, so people write a bunch of unit tests and skip integration tests, or they write the integration tests and skip the unit tests, or they do a little of both (or just skip the tests).

Let's say in our testing that C produces a fatal error when arguments meet certain conditions. Why didn't the programmer write code to trap it in C? Because we realize that C is never called directly by the end user, but instead is fed a carefully massaged set of data which ensures that C can only receive safe data. Well, that's the theory, anyway. The reality is that C still sometimes gets bad data and we don't throw validation into every single function because we'll have so much validation code that our lumbering beast of a system is a bear to maintain. We don't know if C was passed bad data by B, or if C perhaps called D which called E which generated the bad data that gets returned. We have to stop and debug all of that to figure out where the problem lies.

Part 3: Test::Contract -- DBC for tests

Imagine a "design by contract" with testing. This combines a couple of ideas. First, I took many of the ideas from the Parameter Object thread. I'm also thinking about some of the work from Class::Contract, but making it more general to fit regular subroutines and not just methods. Some pseudo-code for the concept looks like this:

sub assign_contract {
    my ($function_name, %contract) = @_;
    no strict 'refs';
    my $original_function = \&$function_name;
    *{$function_name} = sub {
        my @arguments = @_;
        # run the precondition tests on @arguments
        my @results;
        if (wantarray) {
            @results = $original_function->(@arguments);
        }
        else {
            my $results = $original_function->(@arguments);
            @results = ($results);
        }
        # run the post-condition tests on @results
        return wantarray ? @results : $results[0];
    };
}

The idea is that the programmer sets up a "Contract" for each of A, B, C, D, and E and runs the tests with the contracts in place. These contracts are tests, and passing or failing is noted in the test suite, but we don't have to write extra tests. If I test A, the contract tests for B, C, D, and E automatically get run. If F calls B and follows the same call chain, then I write tests for F, and the tests for B, C, D, and E still automatically get run without the programmer needing to write any extra tests! In other words, I wind up with tests that specifically trace the flow of data through the system. Tests no longer ignore AB, BC, CD, DE, etc.
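A self-contained sketch of how such a wrapper might be used. Here assign_contract is a simplified stand-in for the proposed Test::Contract (it records violations rather than reporting through a test harness), and halve and its contract are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @violations;   # where contract failures get recorded

# Simplified stand-in for the proposed contract mechanism:
# replace a named sub with a version that runs pre/post checks.
sub assign_contract {
    my ($function_name, %contract) = @_;
    no strict 'refs';
    no warnings 'redefine';
    my $original = \&{$function_name};
    *{$function_name} = sub {
        push @violations, "$function_name: bad input"
            unless $contract{pre}->(@_);
        my @results = $original->(@_);
        push @violations, "$function_name: bad output"
            unless $contract{post}->(@results);
        return wantarray ? @results : $results[0];
    };
}

sub halve { my ($n) = @_; return $n / 2 }

assign_contract('main::halve',
    pre  => sub { $_[0] % 2 == 0 },   # only even inputs are in-domain
    post => sub { $_[0] > 0 },        # output must stay positive
);

halve(6);   # satisfies the contract
halve(7);   # violates the precondition
print scalar @violations, " violation(s): @violations\n";
```

The point of the exercise: the call to halve(7) is flagged at the B/C-style boundary where the bad data crossed, not three frames later when something finally blows up.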

This has the benefit that we can focus our tests on integration testing and still not lose the benefits of unit testing. If C fails and we have properly defined contracts, we simply read our test output to find which of our contract tests have failed, and we have a pretty good idea of what caused the failure in C without potentially tedious debugging. Further, while this imposes a significant performance hit, we don't have to pay it in the actual production system.

Once I started working on the idea, I saw some significant implementation issues, but I think they can be worked around. I can't just add the contracts to a test script because if two test scripts use the same object, I don't want to duplicate the contract. That means putting the contracts in their own file.

use Test::More tests => 32;
use Test::Contract 'contract_file';

The contract file might be a Perl script that points to a directory holding contracts for all namespaces. The problem I see with that is obvious: if I load Foo::Bar, how do I wrap the methods in contracts? I could try tying symbol tables, but tie happens at runtime and symbol table entries are often loaded at compile time. I see many problems there.
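One way around the compile-time worry: by the time the contract file runs, the target package's subs are already in its symbol table, so a plain runtime walk of the stash can wrap whatever it finds. A sketch (Foo::Bar and the bookkeeping are illustrative; real contract checks would go where the comment is):

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Foo::Bar;
sub greet { return "hello" }
sub shout { return "HELLO" }

package main;

my @wrapped_calls;

# Walk Foo::Bar's symbol table at runtime and wrap every sub found.
{
    no strict 'refs';
    no warnings 'redefine';
    for my $name (keys %{'Foo::Bar::'}) {
        my $full = "Foo::Bar::$name";
        next unless defined &{$full};        # skip non-subs in the stash
        my $original = \&{$full};
        *{$full} = sub {
            push @wrapped_calls, $name;      # contract checks would go here
            return $original->(@_);
        };
    }
}

Foo::Bar::greet();
Foo::Bar::shout();
print "@{[ sort @wrapped_calls ]}\n";   # greet shout
```

Since the wrapping happens at ordinary runtime, no games with tie or compile-time hooks are needed; the only requirement is that the target module is loaded first.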

Another possible approach is to see if I can override use and require. I've never tried it, though, and I suspect it's not possible.

Finally, I could potentially have every package specify where its contract is loaded:

package Foo::Bar;
use Contract::File 'Foo::Bar::Contract'; # I don't quite like this

With that, we could have the packages responsible for their own contracts, and the contract file would (perhaps) check to see if $ENV{TEST_CONTRACT} is set. If it is, it sets up the test contracts. If it's not, it simply returns with minimal overhead.
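The contract file's import could then be a cheap no-op outside of testing. A sketch of the idea (Contract::File here is the hypothetical module from above, reduced to just the environment gate):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of a hypothetical Contract::File: only pay for contracts
# when the test suite asks for them via $ENV{TEST_CONTRACT}.
package Contract::File;

sub import {
    my ($class, $contract_package) = @_;
    return unless $ENV{TEST_CONTRACT};   # production: near-zero overhead
    # Under testing we would load the contracts and wrap the
    # caller's subs, e.g. eval "require $contract_package".
    our $loaded = $contract_package;
}

package main;

# Simulate what "use Contract::File 'Foo::Bar::Contract'" would do
# when the test suite has switched contracts on:
$ENV{TEST_CONTRACT} = 1;
Contract::File->import('Foo::Bar::Contract');
print $Contract::File::loaded, "\n";   # Foo::Bar::Contract
```

In production the same import call returns immediately, so the only cost left in shipped code is one hash lookup per use statement.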

Are there other strategies for implementing this that I might be missing? Are there any holes in this idea?

Cheers,
Ovid

Looking for work. Here's my resume. Will work for food (plus salary).
New address of my CGI Course.

Re: Testing by Contract
by diotalevi (Canon) on Jun 30, 2003 at 19:05 UTC

    This is so obvious that I must have missed something. (caller)[0] has the caller's package, and you could specify a protocol where the contract exists in __PACKAGE__ . "::Contract". So from your test code you'd say use Contract::File; and in Contract::File your import() would fetch the caller's package.

    So now that I've publicly misunderstood your question, what did you mean again?
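diotalevi's suggestion in code form, roughly (a sketch; the ::Contract naming protocol is as he describes, and the require/wrap step is only indicated in comments):

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Contract::File;

# import() runs when a module says "use Contract::File", so
# (caller)[0] is that module's package, and by convention its
# contracts live in __PACKAGE__ . "::Contract".
sub import {
    my ($class) = @_;
    my $caller = (caller)[0];
    my $contract_package = $caller . '::Contract';
    # Here we would: eval "require $contract_package"
    # and attach its contracts to $caller's subs.
    our $last_contract = $contract_package;
}

package My::Module;
Contract::File->import;   # what "use Contract::File" boils down to

package main;
print $Contract::File::last_contract, "\n";   # My::Module::Contract
```

The using module never has to name its own contract file, which removes the "I don't quite like this" duplication from the explicit-name approach.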

      I think what you're saying is that I have Contract::File use or require everything that should be tested, and then it can handle attaching the contracts because it knows the methods are in the symbol table? That seems pretty clear and I should have thought of it sooner. I was so focused on my test contract idea that I managed to make this look harder than it is :)

      That seems like a good strategy. Any thoughts on the validity of the actual concept of test contracts?

      Cheers,
      Ovid


        I'd like there to be a cleaner conceptual break between 'testing' and contracts. I could understand the contracts being incorporated into testing, but I don't mentally file them that way, and it'd be a shame to have a general-purpose module that acts on functions with unrelated bits stuck in.

        The overall idea, I like. Contract::File::Test might have your testing-specific code in it. My assumption is that in using this sort of thing, the module user would be using things like Params::Validate and throwing exceptions somehow. Maybe Exception::Class as well. Or how else would you signal contract failure?

Re: Testing by Contract
by adrianh (Chancellor) on Jun 30, 2003 at 20:21 UTC
    To a large extent, statically typed languages are all about sprinkling simple tests throughout our code

    I'm going to have to take you to task for that oversimplification later :-)

    Separating a contract from its class is an interesting idea, and one that's been on my "think about" pile for a couple of years now. For example, it allows you to retrofit contracts onto an existing codebase - something that's non-trivial with Class::Contract.

    Some food for thought:

    • For any wrapping of functionality around subroutines Hook::Lexwrap is your friend since you can easily scope the change.
    • For another perspective, consider the contract as an aspect (in the AOP sense) and apply it to classes with the Aspect module.
    • A big problem for DBC in "normal" perl is that there are so many ways to break encapsulation - so you can violate the class invariants in code external to the package. See Fun with Hook::LexWrap and code instrumentation for one possible idea on how to approach this.
    • If you've not done it already, go and read Meyer's Object-Oriented Software Construction, which goes into the whole DBC deal in depth.

    I need to go finish my comments on the evil that is SWEBOK before midnight - maybe some more constructive comments later :-)

Re: Testing by Contract
by mvc (Scribe) on Jun 30, 2003 at 21:15 UTC
    1. ...Class::Contract is specifically tied to the concept of classes...

      This is also mentioned in the Class::Contract POD under "Offering better facilities for retrofitting contracts".

      Yep. This is a cross-cutting concern that should not be tangled with a class definition framework. Such concerns are best untangled using the Aspect module. Of course, it may be difficult to provide all the DBC features Class::Contract provides, like old() and inheritance support.

    2. See "A simple and practical approach to unit testing: The JML and JUnit way" for some work on going from assertions to tests, and a bibliography.
Re: Testing by Contract
by chunlou (Curate) on Jun 30, 2003 at 22:10 UTC
    A naive probabilistic approach to the problem of testing all possible combinations:

    Instead of testing all possible combinations, we could try to estimate the total number of bugs by a probabilistic approach. Given a list of all possible combinations, we randomly select two subsets of combinations for Tester One and Two to test.

    Let T be the total unknown number of possible bugs associated with all combinations.
    Let A be the number of bugs found by Tester One.
    Let B be the number of bugs found by Tester Two.
    Let C be the number of bugs found by both Tester One and Two.
    
    Hence (let P(X) be the probability of X):

    P(A and B) = P(C)       (by definition)
    P(A) * P(B) = P(C)      (independence assumption)

    (A/T) * (B/T) = C/T

    T = (A*B)/C
    That means the fewer bugs Tester One and Tester Two find in common, the more likely it is that a large number of unknown bugs remain to be found.

    Conversely, the more bugs the two testers find in common, the more likely it is that they have found most of the bugs.
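    A worked example of the estimate (the numbers are invented for illustration): if Tester One finds 20 bugs, Tester Two finds 15, and 6 of those are common to both, then T = 20*15/6 = 50, so roughly 50 - (20 + 15 - 6) = 21 bugs remain unfound.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Capture-recapture style estimate of total bugs: T = A*B/C,
# where C is the overlap between the two testers' findings.
sub estimate_total_bugs {
    my ($a, $b, $c) = @_;
    die "no overlap: estimate undefined" if $c == 0;
    return $a * $b / $c;
}

my $total = estimate_total_bugs(20, 15, 6);   # 50
my $found = 20 + 15 - 6;                      # 29 distinct bugs found
printf "estimated total: %d, still hidden: %d\n", $total, $total - $found;
```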


    _________
    If someone wanna see the Venn diagram:
       +----------------------------------+
       |                                  |
       |    +------------+                |
       |    |            |        T       |
       |    |    A       |                |
       |    |            |                |
       |    |     +------|-------+        |
       |    |     |  C   |       |        |
       |    +-----|------+       |        |
       |          |          B   |        |
       |          |              |        |
       |          +--------------+        |
       |                                  |
       +----------------------------------+
    

      Stats ain't my forte, but wouldn't this only be true if testers A & B were both capable of detecting all the possible unknown bugs?

      If this is the case, then all you need to make this work is a sure-fire way of designing tests that are guaranteed to be capable of detecting all possible bugs. Actually, you would probably need a way of designing two independent tests capable of detecting all possible bugs, as I doubt it would work if the two tests were the same :)


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


        Since A and B are only random subsets of all possible combinations (in the case of integration testing), they are not going to detect all possible unknown bugs.

        The key lies in C, the common area. If you look at the Venn diagram and imagine squeezing the superset T smaller and smaller, it becomes less likely for C to be small--A and B must tend to overlap.

        The whole point is to estimate the total number of bugs without having to go through exhaustive testing.

        Of course, that simple estimate probably won't be statistically very valid, since bugs are not independent. But it still gives a good conceptual insight--if a bunch of independent testers tend not to find common bugs, there are probably still plenty of bugs out there.

Node Type: perlquestion [id://270259]
Approved by defyance
Front-paged by Tomte