RFC: Tutorial on Testing
by tmoertel (Chaplain) on Sep 14, 2004 at 04:32 UTC
I recently wrote an automatic, specification-based testing system for Perl called LectroTest. Because specification-based testing takes a while to grok, I'm trying to write some helpful introductory documentation. One bit o' docs I just finished is a tutorial-style introduction to testing that works through Test::More and ends up at LectroTest.
But I have a problem. I wrote the tutorial and, worse, I wrote LectroTest. I know how the darn thing works. If my tutorial has holes in it (or gaping, thorn-filled chasms for that matter), I won't notice because I'll subconsciously fill in the blanks.
That's where I need your help. Would you mind taking a critical read of my tutorial and giving me some feedback? It might even be fun. (If you're into that kind of thing – or if you just want to mock my writing.)
To all those who dare to venture within, many thanks!
In this tutorial we'll take a look at testing our Perl code. First, we'll quickly review why it's a good idea to test. Then, we'll create a small program to test and test it with Test::More. Next, we'll test our program more extensively with the relatively new LectroTest system. Finally, after a quick review, we'll grab an espresso and return to coding with renewed vigor.
Let's say we are writing a program. Being hackers of good character, we naturally want our program to be correct. In other words, we want to be confident that our program will behave as we expect it to behave. One way of gaining that confidence is through testing.
Testing is a practice in which we compare our program's behavior to its expected behavior on a case-by-case basis. We, the humans who know what our program is expected to do, create the test cases. Each test case says, "For this particular set of conditions, the expected behavior of our program is that." We then run our program under the given set of conditions, observe its behavior, and compare what we observed to the expected behavior for the test case. If the observed and expected behaviors match, the program is said to have passed the test. Otherwise, it fails, and we ought to examine our code for errors.
It's easy to test in Perl with Test::More
It's easy to create test cases in Perl. To see how the process works, let's first write some code to test. Below is an implementation of angdiff, a subroutine that computes the difference between two input angles, a and b, given in degrees. The expected result is the smallest non-negative angle between the two inputs (so two equal angles are 0 degrees apart). For convenience, we'll put angdiff in its own module called AngularDifference:
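Here is a minimal sketch of such a module. It assumes the first version takes the absolute difference of the inputs modulo 360, as described next. The parameter names $x and $y are our own choice; they avoid shadowing Perl's special sort variables $a and $b.

```perl
package AngularDifference;

use strict;
use warnings;

use Exporter 'import';
our @EXPORT_OK = qw(angdiff);

# First attempt (a sketch): absolute difference, "clamped" modulo 360.
# This version turns out to be buggy.
sub angdiff {
    my ($x, $y) = @_;            # the two input angles, in degrees
    return abs($x - $y) % 360;
}

1;
```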
Take a look at the code for angdiff. It takes the absolute value of the difference between $a and $b and then "clamps" the result modulo 360 because two angles can never be more than a full circle -- 360 degrees -- apart.
It seems straightforward enough, but how confident are we that there's not a subtle error lurking in the code? Let's create some test cases to raise our confidence.
The Test::More module was created to make this kind of testing easy:
Here we have created a suite of six test cases. The suite itself is nothing more than a small Perl program that uses Test::More.
First, we imported the code we wanted to test, which is in the AngularDifference package. Then we loaded Test::More and told it that we have six test cases.
Next come the test cases. Each takes the form of an is statement that compares the observed output of angdiff with the expected, correct result for the given set of conditions. The first test case is commented to show each part clearly. Each case also has a name. Always provide good names for your test cases because it makes your life easier in the long run, especially when your test suites become large.
Note that our small suite includes cases designed to flush out common programming errors that might be lurking around zeroes and argument ordering. We also have a few domain-specific cases that deal with properties of angles and circles. The idea is to exercise our code and make it sweat.
To run the tests, just run the program:
Looks like we have a problem! Our implementation of angdiff failed the 5th test case. We asked it to compute the difference between 0 and 270 degrees, and it returned 270. However, the correct result is the smallest angle between 0 and 270, which is 90 degrees.
It looks like our intuition to use modulo-360 truncation was wrong. Now that we think about it, we can't ever have a difference in angles greater than 180 degrees because, if we did, we could always find a shorter difference by going the other way around the circle from a to b. So let's bring the truncation threshold down to 180 instead:
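The file below is a self-contained sketch. It packages the revised subroutine (absolute difference modulo 180) with a six-case Test::More suite. Only the second case (90 and 90) and the fifth (0 and 270) come from the discussion. The other four are hypothetical examples of the zero, argument-order, and full-circle cases mentioned earlier. In a real project, the package would live in AngularDifference.pm and the suite in its own .t file.

```perl
use strict;
use warnings;

# --- Module under test (normally AngularDifference.pm) ---
package AngularDifference;

# Revised attempt: clamp modulo 180 instead of 360.
sub angdiff {
    my ($x, $y) = @_;
    return abs($x - $y) % 180;
}

# --- Test suite (normally its own .t file) ---
package main;

use Test::More tests => 6;

# Let the suite call angdiff without the package prefix.
BEGIN { *angdiff = \&AngularDifference::angdiff }

#   observed           expected  name
is( angdiff(0, 0),     0,        "zero and zero" );
is( angdiff(90, 90),   0,        "equal angles" );
is( angdiff(0, 90),    90,       "zero vs. right angle" );
is( angdiff(90, 0),    90,       "argument order doesn't matter" );
is( angdiff(0, 270),   90,       "go the short way around" );
is( angdiff(0, 360),   0,        "full circle is no difference" );
```

All six cases pass with this revision. Note, though, that none of them probes a difference of exactly 180 degrees.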
And now, let's re-run our tests:
Ah, that looks better. Our revised implementation passes all six test cases.
Automatic, specification-based testing with LectroTest
With a small investment of six test cases, we were able to find a problem in our implementation of angdiff. What if we considered more test cases? What if we considered all possible test cases? If we could show that the actual and expected behaviors of our implementation were identical for all cases, we would actually prove that our implementation was correct. We would earn the ultimate kind of confidence!
Unfortunately, using testing to prove a program's correctness is impractical for almost all real-world programs. The typical program's behavior represents a surprisingly vast space to test, and there is no reasonable way to test all of the possible cases within it.
But we can test samples from that vast space to gain some degree of confidence in our program's correctness. Many common programming mistakes pollute a large portion of the "test space" with detectable deviations from expected behavior. The more samples we take, the more likely we are to find these deviations and the more reason we have to be confident in our program when we don't find them.
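We can put a rough number on that intuition. As a simplifying assumption, suppose a bug corrupts a fraction p of the test space and each sample is drawn independently and uniformly. Then the chance that N samples all miss the bug is (1 - p)^N. The sketch below uses p = 0.01, a value chosen purely for illustration.

```perl
use strict;
use warnings;

# Probability that $n independent, uniformly drawn samples all miss
# a bug affecting a fraction $p of the test space (assumed model).
sub miss_probability {
    my ($p, $n) = @_;
    return (1 - $p) ** $n;
}

for my $n (6, 100, 1000) {
    printf "%5d samples: P(miss) = %.4f\n", $n, miss_probability(0.01, $n);
}
```

Under this model, six random samples would usually miss such a bug, while a thousand would almost never miss it.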
Right now, we take six samples from the test space of angdiff's behavior. That's not many and probably shouldn't give us a strong sense of confidence in our implementation. If we want more confidence, we will need more cases.
How many is enough? That's hard to say. Maybe we could analyze our code, figure out exactly how it works, and create just the right amount of just the right kinds of cases to give the code a good workout. But that would be tricky, especially if our code were more complicated. And what if we made a mistake and missed some important cases?
Let's consider another approach. Instead of creating just enough of just the right kind of cases, what if we created hundreds of mediocre cases? If we took that approach, we would have so many cases that each wouldn't need to be "just right." But where would we get all of those tests?
It sure would be nice if we could "delegate" this job to our computers. Hmmm...
Let's think about that idea. Consider our second case from earlier:
We can interpret it as saying, "For $a=90 and $b=90, we assert that angdiff($a,$b) must be 0." Wouldn't it be great if we could generalize that claim? We would like to be able to say, "For all $a and $b, we assert that angdiff($a,$b) must be X."
But there's a rub. It's that pesky placeholder for the correct result, X. If we were given some random angles $a and $b to check as a test case, we could certainly determine the actual result of angdiff($a,$b) – just call the function. But how could we determine the expected, correct result? That's tricky. If we could do that, we wouldn't need to test angdiff in the first place: Our code for determining the correct result would be the correct implementation of angdiff!
But not all is lost. What if we could change the way we look at the "test space" so that we could construct any random test case and the correct result for that case at the same time?
If we picked any angles $a and $b, we would have the problem of determining the correct difference between them – back to square one. But, if we instead picked $a and the correct, expected difference first, we could then determine a corresponding $b by working backward. Then we would know $a, $b, and the correct difference. That's everything we need for a test case!
Let's formalize this plan as a recipe:
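Stated as a recipe: for any angle a and any difference diff between -180 and 180 degrees, let b = a + diff; then angdiff(a, b) must equal abs(diff). The plain-Perl sketch below encodes that recipe. The names make_case and property_holds are hypothetical helpers, and the implementation under test is the revised modulo-180 version.

```perl
use strict;
use warnings;

# Implementation under test: the revised, modulo-180 version.
sub angdiff {
    my ($x, $y) = @_;
    return abs($x - $y) % 180;
}

# Hypothetical helper: build a test case "backward" from the answer.
# Pick an angle and the correct difference first, then derive b.
sub make_case {
    my ($x, $diff) = @_;
    my $y = $x + $diff;
    return ($x, $y, abs($diff));    # both inputs plus the expected result
}

# Hypothetical helper: does the property hold at this point?
sub property_holds {
    my ($x, $diff) = @_;
    my ($in1, $in2, $expected) = make_case($x, $diff);
    return angdiff($in1, $in2) == $expected;
}

for my $diff (90, -120, 180) {
    printf "a=0, diff=%4d: %s\n", $diff,
        property_holds(0, $diff) ? 'holds' : 'FAILS';
}
```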
This is where LectroTest comes in. It is an automatic, specification-based testing system. It is designed to take property specifications (like the one we just created) and check them by running large numbers of random trials against the software we're testing. Each trial is an attempt to "break" one of our property assertions at a particular point in the test space. Because LectroTest is automated, it can quickly and painlessly check thousands of test cases for us, giving us a much higher degree of confidence than is practical with manual test cases.
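The core loop that LectroTest automates can be hand-rolled for illustration. The sketch below is not LectroTest's API: find_counterexample and rand_int are hypothetical names. The range for a (-10,000 to 10,000) and the trial count are also arbitrary choices. The sketch draws random integers, checks the property, and returns the first counterexample it finds.

```perl
use strict;
use warnings;

# Implementation under test: the revised, modulo-180 version.
sub angdiff {
    my ($x, $y) = @_;
    return abs($x - $y) % 180;
}

# Uniform random integer in [$lo, $hi].
sub rand_int {
    my ($lo, $hi) = @_;
    return $lo + int(rand($hi - $lo + 1));
}

# Run $trials random trials of the property
#   $impl->(a, a + diff) == abs(diff)
# and return the first counterexample as a hash ref, or undef if none.
sub find_counterexample {
    my ($impl, $trials) = @_;
    $trials //= 1000;
    for (1 .. $trials) {
        my $x    = rand_int(-10_000, 10_000);
        my $diff = rand_int(-180, 180);
        my $got  = $impl->($x, $x + $diff);
        return { a => $x, diff => $diff, got => $got }
            if $got != abs($diff);
    }
    return;
}

my $cx = find_counterexample(\&angdiff, 1000);
if ($cx) {
    print "Counterexample: a=$cx->{a}, diff=$cx->{diff}, got $cx->{got}\n";
}
else {
    print "OK, passed 1000 trials\n";
}
```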
To see how LectroTest works, let's convert our recipe into a real, live LectroTest property specification that we can check. Like before with Test::More, we create a simple Perl program to hold our properties. This time, however, we use Test::LectroTest and declare a property instead of individual test cases:
The first part of the property specification is a generator binding. It tells LectroTest to automatically set up variables for you to use in a behavior test that comes later:
It reads, "For all integers a and for all integers diff in the range -180 to 180." Sound familiar? The only twist is that we are representing angles as integers for convenience.
The second part of the property specification is a behavior test. It uses the variables we bound earlier to test whether angdiff has the expected behavior at a particular instance of a and diff in the test space:
Note that the behavior test is just a block of code that has a true or false result. True means that angdiff had the expected behavior and thus passed the test for this particular case of a and diff. False means that it failed the test.
Finally, like before, we provide a meaningful name. Because our property is general and derived from the mathematical definition of angular difference, we name it "angdiff holds to defn of angular difference".
Now let's run our property check. Again, we just run the program:
Oops! LectroTest was able to falsify our property claim. That means it was able to find a point in the test space where our claim didn't hold for our revised angdiff implementation. It also emitted a counterexample, which shows exactly where that point is.
We can plug the counterexample into our code to debug the problem. After we fix the problem, we can add the counterexample to a list of test cases for regression testing to make sure that future modifications to angdiff don't reintroduce the same erroneous behavior.
By examining the counterexample, we see that angdiff "broke" when $diff was set at 180 degrees. Looking back at our new angdiff code, we can see the problem: our modulo-180 truncation wraps sharply back to 0 when the difference reaches 180 degrees, but it shouldn't. Let's compute a small table by hand that shows the correct differences for various values of $b when $a is fixed at 0 degrees:
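The same comparison can be scripted. In the sketch below, the expected column is hand-computed for a = 0, and the particular values of b are our own picks. The script prints those expected values next to what the modulo-180 version returns.

```perl
use strict;
use warnings;

# The buggy, modulo-180 version.
sub angdiff_mod180 {
    my ($x, $y) = @_;
    return abs($x - $y) % 180;
}

# [b, hand-computed correct difference from a = 0]
my @rows = ( [0, 0], [90, 90], [170, 170], [180, 180],
             [190, 170], [270, 90], [360, 0] );

printf "%5s  %8s  %8s\n", 'b', 'expected', 'angdiff';
for my $row (@rows) {
    my ($y, $expected) = @$row;
    printf "%5d  %8d  %8d\n", $y, $expected, angdiff_mod180(0, $y);
}
```

The angdiff column reads 0, 90, 170, 0, 10, 90, 0. It drops to 0 at b = 180, where the expected column peaks.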
See how the expected result climbs up toward 180 and then starts down again? See how angdiff wraps around sharply? That's the problem. With this knowledge, we can fix the bug:
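Here is a minimal sketch of one possible fix, assuming integer degrees (Perl's % truncates non-integers). It reduces the absolute difference modulo 360, and if the result exceeds 180, it goes the other way around the circle.

```perl
package AngularDifference;

use strict;
use warnings;

use Exporter 'import';
our @EXPORT_OK = qw(angdiff);

# Smallest non-negative angle between $x and $y, in integer degrees.
sub angdiff {
    my ($x, $y) = @_;
    my $delta = abs($x - $y) % 360;                 # now in 0 .. 359
    return $delta > 180 ? 360 - $delta : $delta;    # take the shorter way
}

1;
```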
Let's repeat our property check by re-running the program:
Ah, now that is more like it!
Still, having been burned once before by overconfidence in our testing, we should be cautious. Yes, LectroTest was able to find a problem that our manual test cases didn't, but do we have reason to believe there aren't more errors in hiding?
Maybe we should try to quantify the kinds of test cases that LectroTest is creating for us behind the scenes. In our property specification, we can make use of the magic object $tcon, which LectroTest provides to let us interact with the test controller. One of the things we can ask the test controller to do is attach labels to individual trials. At the end of our property check, LectroTest will tabulate the trials based on the labels we have attached and provide us with summary statistics.
One thing we might want to examine is how far apart the input angles $a and $b are. (Remember, we're letting $b = $a + $diff, so $diff tells us how far apart $a and $b are.) Here's one way to categorize them:
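In LectroTest, labeling goes through $tcon. The hand-rolled sketch below only imitates the idea: it labels each random trial by how far apart the inputs are and prints the share of trials per label. The function spread_label is a hypothetical name, and its bucket boundaries are our own guess at a reasonable categorization.

```perl
use strict;
use warnings;

# Hypothetical categories for how far apart the two inputs are.
sub spread_label {
    my ($dist) = @_;
    return $dist == 0   ? '= 0'
         : $dist <= 90  ? '<= 90'
         : $dist <= 180 ? '<= 180'
         : $dist <= 360 ? '<= 360'
         :                '> 360';
}

# Tally labels over random trials where b = a + diff, diff in -180..180.
my %count;
my $trials = 1000;
for (1 .. $trials) {
    my $x    = int(rand(20_001)) - 10_000;
    my $diff = int(rand(361)) - 180;
    my $y    = $x + $diff;
    $count{ spread_label(abs($y - $x)) }++;
}

# Report the most frequent labels first.
for my $label (sort { $count{$b} <=> $count{$a} } keys %count) {
    printf "%4.0f%% %s\n", 100 * $count{$label} / $trials, $label;
}
```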
Re-running our check reveals the statistics:
Looking at the statistics tells us that we aren't checking any cases where the input angles are farther than 180 degrees apart. This seems like a hole in our testing strategy because, in the real world, angles can be farther apart than that. The problem is that we constrain $diff to a magnitude of at most 180 degrees, so that's as far apart as our input angles will ever be.
To introduce a greater spread, we can add random multiples of 360 degrees to our calculation for the second input angle $b. Such multiples won't affect our assertion that the expected result is abs($diff), which is what makes our testing strategy work. Here's the rewritten property:
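Here is a plain-Perl sketch of the widened recipe: b = a + diff + 360 * n for a random integer n. A whole number of turns doesn't change the angle, so the expected result is still abs(diff). The range for n (-1,000 to 1,000) is an arbitrary choice, and widened_trial is a hypothetical name.

```perl
use strict;
use warnings;

# Fixed implementation (integer degrees).
sub angdiff {
    my ($x, $y) = @_;
    my $delta = abs($x - $y) % 360;
    return $delta > 180 ? 360 - $delta : $delta;
}

sub rand_int { my ($lo, $hi) = @_; return $lo + int(rand($hi - $lo + 1)) }

# One trial of the widened property; returns true if it holds.
sub widened_trial {
    my $x    = rand_int(-10_000, 10_000);
    my $diff = rand_int(-180, 180);
    my $n    = rand_int(-1000, 1000);     # whole turns to add
    my $y    = $x + $diff + 360 * $n;     # same angle as $x + $diff
    return angdiff($x, $y) == abs($diff);
}

my $failures = grep { !widened_trial() } 1 .. 1000;
print "failures: $failures\n";
```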
Now, let's run our check yet again and examine the frequencies:
Now it seems that we have the opposite problem. We're testing the really large differences most of the time, but not the smaller ones. Nevertheless, we are testing the lower ranges, if just a little: even though their frequencies round down to 0%, the fact that they show up in the list at all means they were tested at least once.
Still, let's do the right thing and try to rebalance the distribution. What unbalanced it was the introduction of $n, which is multiplied by 360. Any time the magnitude of $n is greater than one, we'll be in the "> 360" case. Why not, then, make $n pick a small integer half of the time and a large integer the other half? That way, we should get a more even distribution among our categories.
Believe it or not, this kind of thing is easy to do in LectroTest. We can use a special generator combinator OneOf to combine simple generators into a more complex one that does what we want:
As you might expect from its name, OneOf chooses one of the generators we've given it at random, and uses that generator to generate its final result. So we're choosing between a small-range generator and an unconstrained generator. Putting it all together, we get the following, revised property:
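LectroTest supplies OneOf ready-made. The sketch below only mirrors the idea outside LectroTest and is not its API. Each generator is a closure; int_gen and one_of are hypothetical names, and the small range -2..2 for n is an assumption.

```perl
use strict;
use warnings;

# A generator here is just a closure returning a fresh random integer.
sub int_gen {
    my ($lo, $hi) = @_;
    return sub { $lo + int(rand($hi - $lo + 1)) };
}

# Combine generators: each call picks one of them uniformly at random.
sub one_of {
    my @gens = @_;
    return sub { $gens[ int(rand(@gens)) ]->() };
}

# n: a small number of turns half of the time, a large one otherwise.
my $n_gen = one_of( int_gen(-2, 2), int_gen(-1000, 1000) );

my @sample = map { $n_gen->() } 1 .. 10;
print "@sample\n";
```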
Running the property check:
That's better, but we're still placing too much emphasis on the large differences. Well, we can take care of that, too. Let's replace the OneOf combinator with Frequency, which lets us hand-tune the frequencies with which its sub-generators are chosen. We'll give the small-range generator a 20-to-1 advantage:
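Frequency, likewise, is built into LectroTest. The hand-rolled weighted choice below works in the same spirit: it takes [weight, generator] pairs and picks a generator with probability proportional to its weight. The 20-to-1 split comes from the text. The names frequency and int_gen, and both ranges, are assumptions, not LectroTest's API.

```perl
use strict;
use warnings;

sub int_gen {
    my ($lo, $hi) = @_;
    return sub { $lo + int(rand($hi - $lo + 1)) };
}

# Weighted choice: frequency([$w1, $gen1], [$w2, $gen2], ...).
sub frequency {
    my @pairs = @_;
    my $total = 0;
    $total += $_->[0] for @pairs;
    return sub {
        my $pick = rand($total);          # uniform in [0, $total)
        for my $pair (@pairs) {
            my ($weight, $gen) = @$pair;
            return $gen->() if $pick < $weight;
            $pick -= $weight;
        }
        return $pairs[-1][1]->();         # guard against rounding
    };
}

# Favor the small range 20 to 1 over the wide one.
my $n_gen = frequency( [20, int_gen(-2, 2)], [1, int_gen(-1000, 1000)] );
```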
With this change in place, we get more-agreeable coverage:
With this result, we have good reason to be confident that our implementation of angdiff is correct. We created a property that specified the expected behavior of our implementation for all possible inputs. We tested the implementation against our specification by running thousands of test cases that were distributed randomly throughout the overall test space. Further, we quantified the distribution of cases that we were testing to make sure there were no obvious holes in our coverage.
Mission accomplished! (Now would be a good time to enjoy a celebratory espresso.)
Let's review
Testing is an effective technique for improving our confidence in the software we write. Lucky for us, Perl makes it easy to create good test suites.
Test::More gives us a simple, low-overhead way to test how our software behaves in specific cases that we define by hand. In many cases, this is all we will need.
When we do need more, one option is to use LectroTest's specification-based tests. LectroTest lets us specify behaviors that ought to hold across large test spaces, and then it randomly samples those spaces to see whether the properties actually hold. It can't prove that our properties hold, but for many kinds of properties it can quickly and easily give us good reason to be confident in them.
To ensure good coverage when we use a tool like LectroTest, it's a good idea to understand how the tool is sampling the test spaces we give it. Labeling is an easy way to quantify the sampling distribution. If it turns out that we need to adjust the distribution, generator combinators like OneOf and Frequency provide the tuning knobs we need.
Sources for more information
If you're interested in checking out any of these tools – and you ought to, because we only scratched the surface here – they're easy to find. Test::More is probably included in your Perl installation already, out of the box. LectroTest is available as Test::LectroTest on CPAN, and you can find more information about it at the LectroTest Home. Both tools provide much more depth than we covered here.
20040914 Edit by castaway: Changed title from 'Care you lend me your widsom? And your eyes, too?'