PerlMonks  

When test-driven development just won't do

by Ovid (Cardinal)
on Aug 04, 2005 at 01:58 UTC ( #480680=perlmeditation )

Today at a testing talk at OSCON, dominus raised an important point. How do you write a test when you don't know what you're going to be testing? This is actually more common than you would think. I've discovered this happening in two different cases. The first case is when I've got a neat idea but I don't know how to implement it. When that occurs, I often just play around with code and see what happens. Sometimes I'm playing around with the code for a long time, running it by hand, seeing what it does, etc.

The second case, though less common, is when I'm porting code from another language. Usually that code lacks tests. Further, sometimes I'm either unfamiliar with the language or, more commonly, the code is very dense (or I am) and I'm not sure what it's doing. However, by porting all of the code I can compare end results and see that they're the same.

So how do I do test-driven development here? wickline mentioned that he'll then comment out the code and then start writing tests, adding back in code as he has the tests. I'd like to expand on that for a bit.

When I comment out the code and write the tests, there are three cases that can occur for each test.

  1. The test fails the way I expect it to.
  2. The test fails in an unexpected manner.
  3. The test passes.

The first case is the desirable one. Theoretically, I can then add back in the code I've written and cases 2 or 3 subsequently apply.

However, the code will sometimes fail in an unexpected way. If I'm writing a test that a particular line read from a file matches a regex and the test tells me that I'm trying to read from an unopened filehandle, it means I have a poor understanding of my code and I need to find out what's going on.

Sometimes, though, the test will pass. Once I had a test pass and it was because I had forgotten about some previous input validation I had in place. As a result, this suggested that I was on the verge of writing code I didn't need.

A common case where tests pass is with the can_ok $CLASS, $method; test. This gets tricky. Did I already write a method with that name? Did some naughty module import that method into my namespace? Did I inherit the method and am about to override it? Once again, I find myself in the position of having a poor understanding of my already written code and backing out the code and then writing a test protects me once again.
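The inheritance case above can be sketched in a few lines. This is a hypothetical example (the Parent/Child classes and greet() method are mine, not from the original post): can_ok passes for a method the class never defines itself.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Hypothetical classes: Child never defines greet() itself,
# yet can_ok passes because the method is inherited from Parent.
package Parent;
sub greet { return "hello" }

package Child;
our @ISA = ('Parent');

package main;

# Passes even though no greet() was ever written in Child.
can_ok( 'Child', 'greet' );
```

If you had commented out a half-written Child::greet and this test still passed, it would be exactly the "poor understanding of my already written code" situation described above.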

Cheers,
Ovid

New address of my CGI Course.

Re: When test-driven development just won't do
by jimX11 (Friar) on Aug 04, 2005 at 05:07 UTC

    I agree, sometimes testing can be left out. I wonder how many monks write tests first.

    In both cases above, it seems to me, that no development is going on.

    The first case seems like brainstorming with code.

    The second seems like refactoring.

Re: When test-driven development just won't do
by xdg (Monsignor) on Aug 04, 2005 at 12:17 UTC

    Minor suggestion on the can_ok, based on some code I was recently writing:

    no strict 'refs';
    ok( defined *{ $CLASS . "::" . $method }{CODE}, "$CLASS has $method" );

    At issue for testing is always whether a test really tests what you want. In this case, for example, can_ok $CLASS, $method isn't what you want if you want to know whether a method exists in a particular class. I find that test-driven development works for me because it makes me specify more clearly what behavior I want before I go write the code.

    For the first general case you raise, my view is that one writes the test as soon as one knows what the expected behavior is. Until that's defined, playing around is OK. For the second case, while I've not had to tackle that kind of project, if I did, I'd approach it top-down, not bottom up. E.g., write a "total program" test that takes certain input and gives the actual existing output, then start to work on macro testing of major subsystems, etc.

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      xdg wrote:

      no strict 'refs';
      ok( defined *{ $CLASS . "::" . $method }{CODE}, "$CLASS has $method" );

      That fails badly for a number of reasons. If you're just testing a function, it's OK. If you're really testing whether a class provides a given method, it fails. For example, if you have an inherited method, your test fails even though the class "has" the method. Further, the method might not be inherited but also not yet installed in the symbol table, yet can() might be overridden to supply the method (I've had to do this for "load on demand" services). If you have OO code, use Perl's OO facilities to work with it. Just peeking into the namespace isn't enough.
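To make the inherited-method failure concrete, here is a small sketch (the Base/Derived classes and frobnicate() method are hypothetical, added for illustration): the symbol-table peek misses a method that Perl's own method resolution finds.

```perl
use strict;
use warnings;

package Base;
sub frobnicate { return 42 }    # hypothetical inherited method

package Derived;
our @ISA = ('Base');

package main;
no strict 'refs';

# The symbol-table test: false, because frobnicate lives in
# Base's symbol table, not Derived's.
my $in_table = defined *{"Derived::frobnicate"}{CODE};

# Perl's OO method lookup follows @ISA and finds it.
my $can = Derived->can('frobnicate');

print $in_table ? "in table\n" : "not in table\n";    # prints "not in table"
print $can ? "can\n" : "cannot\n";                    # prints "can"
```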

      Cheers,
      Ovid

      New address of my CGI Course.

        You're absolutely right -- I wasn't intending to suggest it as a replacement for can_ok. I should have been clear that it's an example for when you want to see if code exists and is not inherited from a superclass (which is what I had needed to check at the time). It doesn't help identify when other modules have exported a function in, or for stuff handled with AUTOLOAD, or installed later, etc. All goes back to my point of needing clarity about what the actual question is before one knows how to write the test.

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: When test-driven development just won't do
by Anonymous Monk on Aug 04, 2005 at 12:18 UTC
    There are a lot of things I would like to test, but I've no idea how to write tests for. For instance, how do you write a test that your objects (or other data structures) destruct in a timely manner? That is, that they get garbage collected at the moment you think they are, and aren't kept alive (consuming more memory than you think they do) longer than they are supposed to be, due to an unexpected reference loop? Sure, you could add a DESTROY function that sets a global flag, but that would be a heavy price to pay.

    How do you test that your shuffling or die-rolling technique is fair and produces results according to your specifications (say, according to a Gaussian distribution), and isn't biased favouring certain outcomes?

    A common case where tests pass is with the can_ok $CLASS, $method; test.
    That's a test I wouldn't use without testing $method itself as well - because it's a test which isn't interesting if it passes. It's interesting if it fails, because then you know the method isn't there; but if the method is there and doesn't do what it's supposed to do, its existence isn't useful. So, if I use this test, it would precede the tests that test $method.
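That ordering might look like the following sketch (My::Stack and its pop method are hypothetical classes invented for illustration): the existence check comes first, then the behavior check that makes it worth having.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical class under test.
package My::Stack;
sub new { my ( $class, @items ) = @_; return bless { items => [@items] }, $class }
sub pop { my ($self) = @_; return pop @{ $self->{items} } }   # inner pop is CORE::pop

package main;

# First assert the method exists at all...
can_ok( 'My::Stack', 'pop' );

# ...then assert it actually behaves as specified.
is( My::Stack->new( 1, 2, 3 )->pop, 3, 'pop returns the last item pushed' );
```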

      For instance, how do you write a test that tests your objects ... timely destruct?

      How about (untested):

      {
          package CrashDummy;
          @CrashDummy::ISA = 'MyPrecious';
          *ok = \&Test::More::ok;
          my $counter = 0;
          sub DESTROY {
              Test::More::ok( ++$counter == 2, 'DESTROY called' );
              $_[0]->SUPER::DESTROY;
          }
          {
              my $dummy = bless +{}, __PACKAGE__;
              ok( ++$counter == 1, 'inner scope' );
          }
          ok( ++$counter == 3, 'expected execution order' );
      }
      ?

      How do you test your shuffling or die rolling technique is fair and produces results according to your specifications (say, according to a Gaussian distribution), and isn't biased favouring certain outcomes?

      There are a bazillion statistical tests to compare sampled distributions. I'm surprised none of these would meet your needs.

      the lowliest monk

      For random number generation:

      For garbage collection, if you really want to do this, why not install your own UNIVERSAL::DESTROY subroutine during testing:

      use strict;
      use warnings;

      package Foo;
      sub new { return bless {} }

      package main;
      use Test::More tests => 3;

      my %destroyed;
      $destroyed{Foo} = 0;
      {
          no strict 'refs';
          *{"UNIVERSAL::DESTROY"} = sub { $destroyed{ ref(shift) }++ };
      }

      my $obj1 = Foo->new;
      my $obj2 = Foo->new;
      is( $destroyed{Foo}, 0, "Nothing destroyed yet" );
      $obj1 = undef;
      is( $destroyed{Foo}, 1, "Destroyed 1" );
      $obj2 = undef;
      is( $destroyed{Foo}, 2, "Destroyed 2" );

      __END__
      1..3
      ok 1 - Nothing destroyed yet
      ok 2 - Destroyed 1
      ok 3 - Destroyed 2

      -xdg

      Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      I used this for one of my tests to make sure that the garbage is collected:
      use strict;
      use warnings;
      use Test::More;

      # These tests try to make sure that objects are destroyed when they
      # fall out of scope; this requires avoiding circular strong references
      #plan 'no_plan';
      plan tests => 8;

      use Chemistry::File::Dumper;

      my $dead_atoms = 0;
      my $dead_bonds = 0;
      my $dead_mols  = 0;

      {
          my $mol = Chemistry::Mol->read("t/mol.pl");
          isa_ok( $mol, 'Chemistry::Mol' );
          is( scalar $mol->atoms, 8, 'atoms before' );

          # make sure cloned molecules are also gc'ed
          my $mol2 = $mol->clone;

          # atom deletion garbage collection test
          $mol->atoms(2)->delete;
          is( $dead_atoms, 1, "delete one atom - atoms" );
          is( $dead_bonds, 4, "delete one atom - bonds" );
          is( $dead_mols,  0, "delete one atom - mols" );
      }
      is( $dead_atoms, 16, "out of scope - atoms" );
      is( $dead_bonds, 14, "out of scope - bonds" );
      is( $dead_mols,  2,  "out of scope - mols" );

      sub Chemistry::Mol::DESTROY  { $dead_mols++ }
      sub Chemistry::Atom::DESTROY { $dead_atoms++ }
      sub Chemistry::Bond::DESTROY { $dead_bonds++ }

      I'm off on a tangent here, so please excuse the off-topicness of this quick example.

      How do you test your shuffling or die rolling technique is fair and produces results according to your specifications (say, according to a Gaussian distribution), and isn't biased favouring certain outcomes?

      By using Test::LectroTest! Consider this trivial implementation of an n-sided die, which returns a number between 1 and the number of sides on the die. The default number of sides is 6.

      package Die;

      sub new {
          my ( $pck, $sides ) = @_;
          return bless( { sides => $sides || 6 }, $pck );
      }

      sub roll {
          my ($self) = @_;
          return int( rand( $self->{sides} ) ) + 1;
      }

      1;

      We then express how we want our die to perform with a Test::LectroTest property that generates a thousand tests with dice of from 1 to 100 sides. We roll the die and check that the result is legal for this kind of die (i.e. within its limits). The second property generates a thousand six-sided dice and rolls each of them once. The results are stored in the LectroTest controller object with the $tcon->label() call, which automatically prints out the distribution of the rolls.

      #!/usr/bin/perl
      use Test::LectroTest;
      use Die;

      Property {
          ##[ x <- Int( range => [ 1, 100 ], sized => 0 ) ]##
          my $die  = Die->new($x);
          my $roll = $die->roll();
          ( 0 < $roll && $x >= $roll );
      }, name => "My die returns something between 1 and the number of sides of the die.";

      Property {
          ##[ x <- Unit( 6 ) ]##
          my $die  = Die->new($x);
          my $roll = $die->roll();
          $tcon->label($roll);
          ( 0 < $roll && $x >= $roll );
      }, name => "With a six sided die, I get this distribution";

      When I run this, I get:

      1..2
      ok 1 - 'My die returns something between 1 and the number of sides of the die.' (1000 attempts)
      ok 2 - 'With a six sided die, I get this distribution' (1000 attempts)
      # 18% 2
      # 17% 3
      # 16% 1
      # 16% 5
      # 16% 6
      # 15% 4

      What this means is that the number 2 showed up 18% of the time, 3 showed up 17%, and so on. The numbers don't add up to 100%, but I presume this is because of rounding in the presentation. In any case, this looks like acceptable behavior for a six-sided die to me. I'm sure it's possible to automatically analyze the distribution and build this into the test, but I don't have the time to find out how right now. And please remember that this example was hacked together in a hurry, so I'm sure it can be improved.

      I like Test::LectroTest :)

      Edit: Removed an erroneous line in the second property.

      pernod
      --
      Mischief. Mayhem. Soap.

      Retitled by davido from 'OT: Test::LectroTest and pseudo random distributions'.

        No offence to Test::LectroTest intended, but in what way is this a test? For the "1 to 6" behavior, unless you're trying to test the underlying functions rand and int, the only thing that matters is the endpoints of rand, which are 0 and a number close to 1 -- neither of which Test::LectroTest is guaranteed to hit (and is in fact unlikely to hit). (See my module Test::MockRandom for a way to actually test your endpoints.) For the distribution, visually seeing it doesn't confirm anything, putting you right back to having to use one of the Statistics:: modules.

        I think I just don't "get" Test::LectroTest. To me, it feels like a thin veneer of testing -- "well, I tried it a bunch of times and it seemed to work". If the point is to identify edge/corner cases, it would seem to me to be better to identify and test them directly. If one isn't sure where these cases are, Devel::Cover will reveal them. (Note -- coverage is not correctness, but coverage will point out branches/conditions not taken in the code, which are the edge cases.)

        On reflection, maybe the point of Test::LectroTest is to try to expose the edge cases in your dependencies outside your own conditionals -- sqrt and division by zero come to mind. But I'd call it "stress testing" in that case and suggest that it is different from the way the term "testing" is usually meant in the various perl test suites. It doesn't tell you that your code is correct, only that it hasn't been shown to be incorrect for some number of trials.

        Test::LectroTest's author's presentation has a fairly good example using Email::Address, but it includes a rather lengthy set of email address generators (p. 49), which raises the question: how do you know whether an error is in the generators or in the code?

        If someone's used Test::LectroTest extensively, I'm very curious to know in what kinds of cases it's proved useful.

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

        I like Test::LectroTest :)

        Me too.

        I think the difference between writing manual tests and using a test-case generator is somewhat analogous to the difference between using the CGI HTML-generator functions with print, and using a templating module. It's another level of abstraction that ensures the details are accurately taken care of (given a good module), allowing greater productivity.

        It also avoids the tendency to test the wrong things that is so prevalent.

        A relevant analogy is building cars. If the guys building engines stopped to physically verify that every nut they used was exactly the right size across the flats; made from steel containing the correct percentage of carbon; hardened to the appropriate Rockwell hardness number; and had threads whose profile was exactly as specified--car engines would cost the earth.

        Instead, the nuts are manufactured to a specified tolerance and sampled randomly for compliance against that specification.

        Way too many TDD programmers spend time testing the libraries they use (including Perl itself), with tests that they duplicate in every module/program they write, and that are also duplicated by many other programmers. These tests are (or should be) done by the authors of those modules. If there is any doubt that a module/library/program is well tested, it can be verified (once) by the organisation before being certified for use; thereafter, programmers within that organisation should rely upon it to comply with its specification, not re-test the same things over and over in every program that uses it.

        That's the essence of DbC (design by contract).


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
OT: How do you test without knowing the results?
by Anonymous Monk on Aug 04, 2005 at 15:40 UTC
    A related question: what do you do when the test results are themselves unknown? For sufficiently complex problems, if we could do the calculation by hand, we wouldn't need a computer.

    For example, I once worked doing runway analysis for cargo jets: it took a group of our programmers two hours to hand-check a single number on a single page. There were perhaps 100 numbers like that on each page, and hundreds of pages for a single manual, for a single type of airplane, for a single client. Granted, it could be hand-checked faster than two hours: the two-hour figure was largely because the programmers were unfamiliar with all the charts and algorithms they had to wade through, but the problem is by no means simple; there is a lot of room for error at multiple stages, and the method remains prohibitively slow.

    We were provided with a binary program from a formally trained performance engineer who we had contracted to do these calculations correctly, but we had to take those numbers, and format them and present them on paper to the pilots who flew the planes.

    Technically, from a legal standpoint, proving the correctness of the numbers we provided wasn't our job. On the other hand, both our company's reputation and the physical safety of the pilot and crew were on the line. That essentially made it our problem in practice. In theory, our documents were just a "guideline" to the pilot: but in practice, the pilot needed those numbers to comply with both legal and safety requirements. If he knew what those numbers should be, he wouldn't need our product at all.

    I eventually quit that job because I couldn't reconcile a way to prove the correctness of the numbers we were generating. These numbers were important to the flight of the plane (how much weight to put on it under given flight conditions, at what speed should the pilot lift off the runway, at what speed should he abort takeoff, at what height the pilot should stop climbing, and level off the plane to avoid hitting obstacles in case of an engine failure). While a good pilot could probably compensate for any gross errors, I decided that I didn't want to risk contributing to a flight disaster.

    Can anyone think of a better testing approach that I could have used? I used to sanity-check the strangest airport destinations I could think of (ones with mountains, or at strange elevations, or with really short runways) and look for errors so obvious that even I could find them, but lacking institutional support (e.g. a testing department, code reviews, someone else testing my code) and formal training (in both testing methods and aircraft performance engineering), I really couldn't find a way to do my job to the level I felt I needed to.

    As a result, I've gained an appreciation for testing and formal correctness analysis. I'm always curious about how people improve their testing methodology, and especially how they convince management to let them do it. And today I work on things like billing systems, where nothing falls out of the sky if a bill goes out wrong.
    --
    Ytrew Q. Uiop

      For safety-critical systems, producing such data should be done by at least two completely independent programs, developed in clean-room conditions by two completely different teams working from a theoretically proven, or engineering-determined, specification.

      The testing is done by comparing the output of the two systems and investigating any anomalies.

      This is a similar technique to that used by fly-by-wire systems on commercial aircraft. Three separate computers, often with different CPUs to detect things like the Pentium floating-point bug, run different software written by different teams to the same spec. The independent computers are supplied the same information and perform the same computations, and another independent computer verifies their results against each other. If one of the computers produces different results from the other two, the control computer disregards that system's output and goes with the other two. If one of them starts to produce consistently different results, it probably gets shut down.

      What happens if all three produce substantially different results? Panic I guess.
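The 2-of-3 vote a control computer performs can be sketched in a few lines of Perl. This is purely illustrative (my own sketch, not from any real avionics system): it returns the majority answer, or undef when all three results disagree.

```perl
use strict;
use warnings;

# Majority vote over the outputs of three independent computations.
# Returns the value at least two of them agree on, or undef if all
# three disagree (the "panic" case).
sub vote {
    my ( $x, $y, $z ) = @_;
    return $x if $x eq $y or $x eq $z;
    return $y if $y eq $z;
    return undef;
}

print vote( 'climb', 'climb', 'descend' ), "\n";    # prints "climb"
```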


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
        For safety-critical systems, producing such data should be done by at least two completely independent programs, developed in clean-room conditions by two completely different teams working from a theoretically proven, or engineering-determined, specification.

        The testing is done by comparing the output of the two systems and investigating any anomalies.

        Yup. Parallel testing is good, but it wasn't something I could do by myself. It requires the expertise of a second team of aircraft performance engineers to interpret the charts and tables correctly. Those guys don't work for cheap, and my company wasn't willing to pay for that sort of thing. Testing is much easier when you've got resources to spend on it.

        In general, the real problem with my company was that it wasn't willing to pay to do things right; when I arrived, I was told we badly needed a testing department and was expected to create one. When I left two and a half years later, I was still testing my own code, and there was only one Q/A manager, with two guys under him. A few months later I learned they'd fired the Q/A manager... I wasn't sorry I left.

        I guess my original problem wasn't really solvable: "how do you do a good job of testing, with no support from management?" I think the answer is: "You don't."
        --
        Ytrew Q. Uiop

      Les Hatton has some interesting papers you might like...
      My work in computer science has primarily been in the field of software failure. Much of it has involved the design and execution of experiments to attempt to determine the cause and hence to reduce the occurrence of failure in software systems.
Re: When test-driven development just won't do
by brian_d_foy (Abbot) on Aug 04, 2005 at 16:12 UTC

    People get caught up in the silver bullets of programming. They want the one thing that is going to solve every problem, so they keep looking for the right agile methodology, design pattern, language, or whatever.

    The Master knows that all things have their place. That test-driven development doesn't work everywhere shouldn't be any more surprising than that a screwdriver doesn't work everywhere. However, programmers are often surprised at, start flame wars over, and discuss ad nauseam which One True Method is the right one.

    When you can't write the tests first, you don't. When you don't know the right answer ahead of time, use your best guess until you do. This sort of thinking is easier when you're not casting about for the magic spell that takes care of everything. :)

    --
    brian d foy <brian@stonehenge.com>
Re: When test-driven development just won't do
by dws (Chancellor) on Aug 05, 2005 at 05:04 UTC

    The first case is when I've got a neat idea but I don't know how to implement it.

    Assuming the neat idea is one that produces tangible results, write a functional test that invokes the "neat idea" code and makes some assertions about the results. Functional tests are good when you're making a cold start on an idea. They help you get started working from the outside in.

    I sometimes start with a (functional) WWW::Mechanize test that invokes a CGI that doesn't exist yet, and makes some assertions about what the result looks like. Then I work inwards towards things that I can unit test. Often though, having a set of functional tests is sufficient.

    Sometimes if I find myself coding too far ahead, I'll stop and comment out the code I've just written. Then I'll write tests that give me a legitimate excuse to uncomment it bit by bit. It's not rare to discover that I've written stuff I don't really need. And occasionally, as you mention, a test will pass when it isn't expected to. Those end up being great opportunities for learning.

Re: When test-driven development just won't do
by Dominus (Parson) on Aug 08, 2005 at 06:10 UTC

    The thing I was thinking of specifically was the program linogram that I presented in Chapter 9 of my book. When I first started the project, I had only the dimmest notion of what I wanted the program to do. I knew that the user would describe a diagram with a text file, and include some equations relating parts of the diagram to each other, and that the program would solve the equations and then draw the diagram. So there was a long period during which I really wasn't sure what the program would do.

    Then there was another, longer period during which I thought I knew what I wanted it to do, but I had no idea how the program was going to do it. Where does it get the equations from? Aren't there implicit equations? What is the input format? And during this phase there was constant interplay between what I thought I could program and what I wanted the program to do. What information can the program infer, and what must the user give it explicitly? Will the input be ambiguous? Should the program try to disambiguate it? What kinds of figures will the user want to draw? Boxes and lines and arrows, sure; what about other things? But to calculate the position of the arrowhead on an arrow, you need to solve trigonometric equations, which are really hard. So do you forbid arrows? Or do you find some way to restrict the trigonometry to be possible? Or do you go whole hog and put a complete computer algebra system into the program? Or do you use a clever trick? And if so, what is the clever trick?

    So through all this, and I think it took three or four months, I had no thought of writing tests, and I'm not sure what it would even mean to write a test.

    Then there was a phase during which I started implementing stuff, and for every function I wrote that ended up in the final product, I probably threw away four or five others that turned out to be impossible, or that exposed some confusion in my mind about how such a program should really work, or that ended up sending me back to stages 1 and 2 to redesign enough of the whole thing that the function no longer made sense. I suppose I could have written tests for some of this stuff, but it would have been a complete waste of time. I had bigger fish to fry.

    Writing tests is great when you know what the program is supposed to do. But it seems to me that if you know what the program is supposed to do, you already have the really hard part of the problem solved. Development and maintenance programmers forget this because their bosses and clients hand them descriptions of what the programs are supposed to do.

    After I had the basic system running, and the book was published, I undertook a major revision, which I hope to release this month. The first thing I did to start the revision was to write test apparatus. The test suite has grown rapidly and has been a big help. But I think trying to write tests before the first version was complete, when it was still a pulsating, inchoate mass of murky thought-stuff, would have been a waste of time.

    The thing that really struck me after the testing session was that the people who came up to talk to me did not seem to understand what I was talking about. They really didn't seem to get just how vague my idea was and for how long. They kept thinking of programs where some little tiny thing was unspecified, but I was talking about projects where you really have no idea at all what you want, or what is possible, or how those two things relate to each other.

Re: When test-driven development just won't do (Prolog)
by Anonymous Monk on Sep 09, 2005 at 17:10 UTC
    I wonder what would happen if people realized that tests are really nothing more than a (possibly incomplete) specification written in a declarative language. Would people approach the whole aspect of testing differently? Obviously, that makes the whole "I don't know how to test" question moot. Sometimes I think it would be interesting to translate all of the unit tests that some people write into Prolog, and see how close you could get to a working application.

Node Type: perlmeditation [id://480680]
Approved by itub
Front-paged by Arunbear