http://www.perlmonks.org?node_id=426024


in reply to Re: Test/ Code Ratio
in thread Test/ Code Ratio

Nicely summarised++.

And that, I think, for all the words I have written (but not yet published) attempting to explain my distaste for the Test::* modules, is the crux of that distaste.

Test::Harness, and many of the others, tend to emphasise quantity over quality.

They also put the emphasis on percentage passed, rather than what failed.

Those two factors tend to combine to encourage the writing of lots of little tests and to ignore the effect of duplicate tests--"Hey, you can never have enough testing!".

The result is that the one failing test is swamped in the high volume of (often duplicate) tests passed.

So the headline is a feel-good "99.98% passed" rather than the realistic and crucial "1 test failed".

Testing is a bit like condoms...99.98% safe isn't any comfort when the 0.02% happens.


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Re^3: Test/ Code Ratio
by xdg (Monsignor) on Jan 28, 2005 at 20:37 UTC

    Two comments on this:

    1. 99.98% isn't feel-good if your goal is 100%. When I'm evaluating test results, I don't care about percentage -- I want to see zero test failures. Feeling good at 99.98% is an attitude problem, not a Test::Harness problem.
    2. I've personally found that test-driven development works for me and that Test::Harness, et al., make that quick and easy. The power of writing tests first is in having to be absolutely clear what output I expect before I write my code. If that leads to tons of little tests, so be it. The point isn't that I've written lots of tests, it's that I've clearly specified the requirements of my module/application in a verifiable way.

      If the tests don't flag some broken behavior despite having tons of little tests, that's a failure on my part to write a good specification, not a failure of Test::Harness. E.g., if I don't specify what the application should do when input is faulty, then any behavior is acceptable because I haven't constrained it. Defensive coding ("open or die") is just a coder's response to make the best of a poorly specified situation.

    Like most tools, Test::* modules are only constructive in the hands of a skilled user. To the OP's point, are lots of lines of test code relative to lines of application code a sign of redundancy or inelegance, or of a well-thought-out and comprehensive specification of behavior? The answer depends entirely on the specific application and code (and it might be a combination of those, as well).
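
    For illustration, here's a minimal test-first sketch in the Test::More style described above. The module and its behaviour are entirely hypothetical -- the point is that the tests are written first, as a specification (including for faulty input), and the module is then written to satisfy them:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Test::More tests => 3;

        # Hypothetical module under test -- doesn't exist yet when
        # these tests are written.
        use My::Parser;

        # Specify the expected behaviour up front, faulty input included.
        is( My::Parser::parse("a=1"), 1,     'parses a simple assignment' );
        is( My::Parser::parse(""),    undef, 'empty input yields undef'   );
        ok( !eval { My::Parser::parse(undef); 1 },
            'undef input raises an exception' );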

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      I've personally found that test-driven development works for me...

      I am a strong advocate of test-driven development, and have been for a long time, although I didn't know it would end up being called that for most of that time. I am *not* criticising test-driven development. I am expressing my personal doubts about the tools I have seen available to support it.

      My point, badly made, is that I have no interest in seeing screenfuls of "xxxx.t .....ok". I do not care how many tests passed, or what their names were, or what the percentages are.

      The only thing I am interested in is "0 failures" or "Failure: test nn at file.pl:(nnn)". The output from Test::Harness doesn't tell me what I need to know: what failed (code, not test), how it failed, and where (which source code file, not which test file).

      Instead I have to go off on a hunting spree: first locate the test that failed, then re-run it with extra print statements added to find out how it failed, then track that back to the source code where the failure originates.

      The Test::* modules are set up to make the writing, running and reporting of tests easy.

      But writing tests is not the objective. The objective is the finding and fixing of failures to meet specification.

      To this end, I want a process that puts the tests close to the code under test. That way, when failures occur, the messages can take me directly to the code that needs fixing, not to a test script in another directory that leaves me with nothing except grep in order to track back the failing test to the failing code.

      All that said, I do not have an alternative to offer. I have played a little with some ideas based upon Devel::StealthDebug.

      I am also very impressed by my reading of tmoertel's LectroTest.
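
      For reference, a LectroTest property looks roughly like the following (this follows the module's documented synopsis from memory; the function under test is made up). LectroTest generates many random inputs and reports a counterexample if the property ever fails:

          #!/usr/bin/perl -w
          use Test::LectroTest;

          # Hypothetical function under test.
          sub my_abs { my $n = shift; return $n < 0 ? -$n : $n }

          # The ##[ ]## binding asks LectroTest to generate random
          # integers; the property must hold for every generated $x.
          Property {
              ##[ x <- Int ]##
              my_abs($x) >= 0;
          }, name => "my_abs never returns a negative number";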

      I have ideas for combining these two notions--automated test generation inline, with the ability to turn those tests off for production use such that they have zero impact upon the tested code when disabled.

      That is where I think the future lies--inline, automated (unit) testing that can be enabled and disabled via a command-line switch.
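
      As a rough sketch of what I mean (all names hypothetical): a compile-time constant set from the environment lets Perl's constant folding strip the inline tests out entirely when they are disabled, and a bare die points straight at the failing source file and line:

          package My::Math;
          use strict;
          use warnings;

          # Enable the inline tests from outside, e.g.:
          #   SELFTEST=1 perl -MMy::Math -e1
          use constant SELFTEST => $ENV{SELFTEST} ? 1 : 0;

          sub double { my $n = shift; return 2 * $n }

          # Because SELFTEST is a compile-time constant, Perl optimises
          # this whole block away when it is false: zero production cost.
          if (SELFTEST) {
              # die without a trailing newline appends "at My/Math.pm
              # line NN" -- the failure points at the code, not a test file.
              double(2)  == 4  or die "double(2) != 4";
              double(-3) == -6 or die "double(-3) != -6";
          }

          1;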

      I think that P6 is moving in this direction with its PRE{}, POST{}, FIRST{}, ENTER{} & LEAVE{} blocks. I have yet to see enough (or visualise enough) P6 code, and the specifications are rather loose and changeable, for me to decide whether they are flexible enough to achieve everything I would like, but they appear as if they might.

      So, as I think I have remembered to say each time I have mentioned it--my reservations are purely my own. As with everything, what exists now is infinitely better than what might exist one day.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.

        It sounds like what you're doing is more testing implementation. That's not what TDD, as it has been widely publicized, is. TDD is about testing interfaces rather than implementation (though the line is admittedly blurry at times).

        Personally, I can't follow your argument about having to look at the testcase to track down the failure; maybe if you only used a simple ok() function? The more expressive tests in Test::More, like is() and friends, will not only tell you there was a failure but also what was expected and what actually happened; together with descriptive test names, I almost never have to look into my testcases to track down a failure.
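
        To illustrate with a deliberately buggy, made-up function: is() names the test and shows both the expected and the actual value on failure, so the diagnostic alone usually pins down the problem:

            use strict;
            use warnings;
            use Test::More tests => 1;

            sub add { my ($x, $y) = @_; return $x - $y }  # deliberate bug

            is( add(2, 2), 4, 'add() sums its arguments' );

            # On failure, is() prints roughly:
            #   not ok 1 - add() sums its arguments
            #   #   Failed test 'add() sums its arguments'
            #   #   at add.t line 8.
            #   #          got: '0'
            #   #     expected: '4'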

        My own experience with doing interface tests inline is that this makes aggressive refactors hard (which runs contrary to the spirit of test-first development). I find, in particular, that the tests I write first often suggest a completely different direction from the one the eventual structure of the code grows into. That doesn't hinder me; they're separate from the code, after all.

        The Test:: modules were written with that approach in mind, and since you want something other than that, they obviously won't be a particularly good fit. That doesn't make them subpar in itself; it just means there's an impedance mismatch between your goals and theirs.

        Makeshifts last the longest.

Re^3: Test/ Code Ratio
by Aristotle (Chancellor) on Jan 28, 2005 at 22:10 UTC

    The one who emphasises quantity is the programmer reading the output. A craftsman shouldn't be blaming his tools. That said, I've always found the percentage readings useless and prefer to run my testcases directly rather than under the harness.

    chromatic and Ovid have recently been better men than you or I and actually got down to doing something about this, in the form of better test suite output. I particularly enjoy chromatic's suppression of the output of passing tests. That would seem to be what you're after, as well.

    Makeshifts last the longest.

Re^3: Test/ Code Ratio
by petdance (Parson) on Jan 31, 2005 at 03:13 UTC
    So the headline is a feel-good "99.98% passed" rather than the realistic and crucial "1 test failed".

    I hadn't thought of it that way, but you're exactly right. The percentage is effectively meaningless.

    Rewriting how Test::Harness summarizes results is one of the things on my to-do list for the reasonably near future. When I do, I will probably leave out the percentages.

    xoxo,
    Andy

      Thank you for taking my comments in the light in which they were intended.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.