PerlMonks  

Re^3: Does anybody write tests first?

by BrowserUk (Pope)
on Feb 23, 2008 at 07:19 UTC (#669724)


in reply to Re^2: Does anybody write tests first?
in thread Does anybody write tests first?

When I test just that *.t file (e.g. with "prove"), I should see any existing tests pass and that one test I just wrote fail.

That's fine when you're writing a given piece of code. You know what you just added or changed and it's probably sitting in your editor at just the right line. With the minuscule granularity of some of the test scripts I've seen, a failing test probably translates to an error in a single line or even sub clause thereof.

But most of my interactions with Test::* (as someone who doesn't use them) are as a maintenance programmer on newly built or installed modules, running make test.

I've not just forgotten how the code is structured, and how the (arbitrarily named) test files relate to the structure of the code. I never knew either.

All I see is:

    ... [gobs of useless crap omitted] ...
    t\accessors.......ok 28/37# Looks like you failed 1 test of 37.
    t\accessors.......dubious
            Test returned status 1 (wstat 256, 0x100)
    DIED. FAILED test 14
            Failed 1/37 tests, 97.30% okay
    ... [gobs more crap omitted] ...

  • Something failed.
  • 28 from 37 == 9. But it says: "# Looks like you failed 1 test of 37."
  • Then it usefully says: "dubious". No shit Sherlock.
  • Then "Test returned status 1 (wstat 256, 0x100)". Does that mean anything? To anyone?
  • Then (finally) something useful: "DIED. FAILED test 14".

Now all I gotta do is:

  1. work out which test--and they're frequently far from easy to count--is test 14.
  2. Where in accessors.t it is located.
  3. Then work out what API(s) that code is calling.
  4. And what parameters it is passing.

    Which if they are constants is fine, but if the test writer has been clever efficient and structured a set of similar tests to loop over some data structure, I've a problem.

    I can't easily drop into the debugger, or even add a few trace statements to display the parameters as the tests loop, cos that output would get thrown away.

And at this point, all I've done is find the test that failed. Not the code.

The API that is called may be directly responsible, but I still don't know for sure what file it is in.

And it can easily be that the called API is calling some other API(s) internally.

  • And they can be located elsewhere in the same file.
  • Or in another file used by that file and called directly.
  • Or in another file called through 1 or more levels of inheritance.

And maybe the test writer has added informational messages that identify specific tests. And maybe they haven't. If they have, they may be unique, hard coded constants. Or they could be runtime generated in the test file and so unsearchable.

And even if they are searchable, they are a piss poor substitute for a simple bloody line number. And they require additional discipline and effort on the part of someone I've never met and who does not work for my organisation.

Line numbers and traceback are free, self maintaining, always available, and unique.

If tests are located near the code they test, when the code fails to compile or fails an assertion, the information takes me directly to the point of failure. All the numbering of tests, and labelling of tests, is just a very poor substitute and costly extra effort to write and maintain--if either or both is actually done at all.
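
A minimal sketch of the "free line number" point, assuming nothing beyond core Perl: the built-in die appends the file name and line number whenever its message does not end in a newline, so a check written next to the code it guards reports its own location. The average sub and its check are hypothetical, invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical function under test, with its sanity check written inline.
sub average {
    my @values = @_;

    # No trailing newline on the message, so perl appends
    # " at FILE line N." automatically.
    die "average() called with an empty list" unless @values;

    my $sum = 0;
    $sum += $_ for @values;
    return $sum / @values;
}

print average(2, 4, 6), "\n";

# Trap the failure so the script can show the message it would die with.
my $ok = eval { average(); 1 };
print "caught: $@" unless $ok;
```

The trapped message names the file and the line of the die statement inside average, which is the location an editor or debugger would jump to.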

That said, the cost of writing a line of test for trivial code is pretty minimal, so I often think it's worth it since what starts out trivial (e.g. addition) sometimes blossoms over time into function calls, edge cases, etc. and having the test for the prior behavior is like a safety net in case the trivial becomes more complex.

This is a prime example of the former of the two greatest evils in software development today: What-if pessimism and Wouldn't-it-be-nice-if optimism. Writing extra code (especially non-production code) now, "in case it might become useful later", costs both up front and in ongoing maintenance.

And Sod's law (as well as my own personal experience) suggests that the code you think to write now "just in case" is never actually used, though inevitably the piece that you didn't think to write is needed.

Some code needs to cover all the bases, consider every possible contingency. If your code is burned into EPROM and zooming around a few million miles away at the other end of a low-speed data-link, then belt & braces--or even three belts, two sets of braces and a reinforced safety harness--may be legitimate. But very few of us, and a very small amount of the world's code base, live in such environments.

For the most part, the simplest way to improve the ROI of software, is to write less of it! And target what you must write, in those areas where it does most good.

Speculative, defensive, non-production crutches to future possibilities will rarely if ever be exercised, and almost never produce a ROI. And code that doesn't contribute to the ROI is not just wasted capital investment, but an ongoing drain on maintenance budgets and maintenance team mind-space.

Far better to expend your time making it easy to locate and correct real bugs that crop up during systems integration and beta testing, than trying to predict future developments and failure modes. And the single best contribution a test writer can make to that goal is to get the maintenance programmer as close to the source of the failure, when it occurs, as quickly as possible.

Yes. It is possible to speculate about what erroneous parameters might be fed to an API at some point in the future. And it is possible to write a test to pass those values to the code and ensure that the embedded range checks will identify them. But it is also possible to speculate about the earth being destroyed by a meteorite. How are you going to test that? And what would you do if you successfully detected it?

And yes, those are rhetorical questions about an extreme scenario, and there is a line somewhere between what is a reasonable speculative possibility and the extremely unlikely. But that line is far lower down the scale than most people think.

Finally, it is a proven and incontestable fact that the single, simplest, cheapest, most effective way to avoid bugs is to write less code. And tests are code. Testing is a science and tests should be designed, not hacked together as after-thoughts (or pre-thoughts).

Better to have 10 well designed and targeted tests, than 100 overlapping, redundant, what-ifs.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^4: Does anybody write tests first?
by andreas1234567 (Vicar) on Feb 23, 2008 at 20:46 UTC
    a simple bloody line number.
    Test and line number are not mutually exclusive.
    $ cat 669457.t
    use strict;
    use warnings;
    use Test::More;
    BEGIN { plan tests => 1; }
    cmp_ok(1, q{==}, 2, q{Expect 1==2});
    __END__
    $ prove 669457.t
    669457....NOK 1/1
    #   Failed test 'Expect 1==2'
    #   at 669457.t line 5.
    #          got: 1
    #     expected: 2
    # Looks like you failed 1 test of 1.
    669457....dubious
            Test returned status 1 (wstat 256, 0x100)
    DIED. FAILED test 1
            Failed 1/1 tests, 0.00% okay
    Failed Test Stat Wstat Total Fail  List of Failed
    -------------------------------------------------------------------------------
    669457.t       1   256     1    1  1
    Failed 1/1 test scripts. 1/1 subtests failed.
    Files=1, Tests=1,  0 wallclock secs ( 0.03 cusr +  0.00 csys =  0.03 CPU)
    Failed 1/1 test programs. 1/1 subtests failed.
    $
    --
    Andreas

      All of that to get a line number that perl gives you for free?

      And, you have had to re-run the failing test file using prove (which seems to be a different version to that I have?) instead of the make test.

      And how am I meant to get my editor to pick the line number out of all the rest of that redundant crap?

      16 lines of output for 1 that is useful. A simple die unless 1 == 2; achieves the same thing without all the redundancy. And if it's located in the code being tested, the line number will lead directly to the source file that needs fixing instead of a test file that I then have to manually relate to the appropriate source file.

      And what happens when the test calls a function, that calls a function in another module, that fails? Do you get traceback?

      It's just layer upon layer of complexity, taking you further and further away from the real code, for the sake of producing a set of meaningless statistics that nobody cares about.


Re^4: Does anybody write tests first?
by xdg (Monsignor) on Feb 25, 2008 at 19:12 UTC
    Finally, it is a proven and incontestable fact that the single, simplest, cheapest, most effective way to avoid bugs is to write less code. And tests are code.

    And program code is code. Therefore, if you write no code at all, you'll have no bugs. Of course, you'll also have no features.

    Testing is a science and tests should be designed, not hacked together as after-thoughts (or pre-thoughts).

    So is your objection to writing tests first as opposed to after the fact? Or to hacky, poorly-designed tests, regardless of whether they were written first or last?

    My hypothesis would be that tests are more likely to be designed well when they are viewed by the developer as an integral part of the development of program code rather than something to be added afterwards -- at least with respect to individual developers.

    If your development model has QA developers writing tests independently, then maybe the advantage is less.

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      And program code is code. Therefore, if you write no code at all, you'll have no bugs. Of course, you'll also have no features.

      Is that a facetious reply? Or did you genuinely think I was not aware of that obvious consequence? :)

      On a more serious note. A step of project design that was common years ago but that seems to be missing from too many shops these days is risk/benefit analysis. It is entirely possible, and surprisingly common, that once a project has been shown to be possible, and the predicted development effort costed, the biggest ROI possible is to not do the project at all.

      The point should be clear. You write just the code required to implement the features you need. And do just as much as is required to test those features.

      Writing extra code or tests now, to hedge against future possibilities is wrong. There are three possible outcomes of that extra effort--no matter how little extra it is.

      1. You predicted the future exactly:

        No extra effort is required.

      2. The predicted future possibility never comes to pass:

        The extra effort is wasted.

      3. A different--slightly or wholly--new requirement or feature is needed:

        Not only is that early extra effort wasted.

        The extra effort expended up front, has to be backed out in order to accommodate the new code.

      So, simplistic math puts your chances of predicting the future correctly and so benefiting from the extra effort expended as 33%. If you believe that your powers of prescience can do substantially better, give up programming and start playing the stock market or visiting casinos. But, keep quiet about it because your local military sci-ops team are likely to come looking for you in the middle of the night if they get wind of it :)

      So is your objection to writing tests first as opposed to after the fact? Or to hacky, poorly-designed tests, regardless of whether they were written first or last? My hypothesis would be that tests are more likely to be designed well when they are viewed by the developer as an integral part of the development of program code rather than something to be added afterwards -- at least with respect to individual developers

      My objection (to typical Perl/CPAN test suites) is the prevalent methodology. It is really hard to make a cogent argument on this subject in the abstract.

      • A part of my objection is the effort (and duplication of effort), involved in using the Test::* (TAPI) toolset.
      • A part of my objection is to the sprawling, ad-hoc, undesigned nature of the test suites it produces.

      It can be typified by the test suite for DBM::Deep. Let me say here that I think dragonchild has done an amazing job with this module, and his test suite is extensive and thorough. What I am going to be critiquing here is the effort that has gone into its construction, and its opacity for those coming along to use it after the fact.

      Design

      Certainly incomplete, but in essence, DBM::Deep allows you to create Perlish hashes and arrays on disk.

      • Just as with memory based hashes and arrays (hereafter called HARRAYs), they can be arbitrarily nested.
      • You can create HARRAYs.
      • You can extend HARRAYs.
      • You can iterate HARRAYs.
      • You can destroy HARRAYs.
      • You can add elements to HARRAYs.
      • You can modify elements in HARRAYs.
      • You can delete elements from HARRAYs.
      • In addition to the tied interface, there is an OO interface.
      • Arbitrary combinations of the above features can be wrapped in transaction brackets.

      Okay, so now let's think about a testing strategy to cover that lot. My initial thoughts are:

      1. If I create a HARRAY in ram, as well as the HARRAY on disk, and perform exactly the same manipulations to both, then at any given moment during those manipulations, my pass/fail criteria can be: Does the disk HARRAY match the ram HARRAY?
      2. And by adopting this strategy, I no longer need to hard wire each test so that I know what "output" to expect. That means I can choose my keys and values randomly.
      3. By using randomly generated values, I can pick my ranges and iteration counts:
        • So as to produce some statistically meaningful coverage numbers.
        • To test small and large sized structures.
        • To evaluate worst case performance with pathological datasets--like large numbers of keys that hash to a single bucket.
      4. And for my transaction tests, I can create equivalent ram-HARRAY and disk-HARRAY. Then modify the disk-HARRAY alone inside a transaction that I never close and the ram and disk HARRAYs should remain equivalent at all times.
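
A rough sketch of points 1 and 2 above, under stated assumptions: the same randomly chosen stores and deletes are applied to a reference hash and to the structure under test, and the two are compared after every step. The names mirror_test and same_hash are hypothetical, only flat hashes are handled (no nesting), and a plain in-memory hash stands in for the on-disk structure so the sketch runs with core Perl alone.

```perl
use strict;
use warnings;

# Apply the same random operations to two hash-like references and
# compare them after every step. Returns the number of steps that ran;
# dies (with a line number) at the first divergence.
sub mirror_test {
    my ($ram, $disk, $steps, $seed) = @_;
    srand($seed);    # fixed seed so a failing run can be replayed

    for my $step (1 .. $steps) {
        my $key = 'k' . int(rand(50));
        if (rand() < 0.7) {
            my $value = int(rand(1_000_000));
            $ram->{$key}  = $value;
            $disk->{$key} = $value;
        }
        else {
            delete $ram->{$key};
            delete $disk->{$key};
        }
        same_hash($ram, $disk)
            or die "step $step (seed $seed): structures diverged on key '$key'";
    }
    return $steps;
}

# Flat comparison: same number of keys and same value for every key.
sub same_hash {
    my ($left, $right) = @_;
    return 0 unless keys(%$left) == keys(%$right);
    for my $key (keys %$left) {
        return 0 unless exists $right->{$key};
        return 0 unless $left->{$key} eq $right->{$key};
    }
    return 1;
}

# Stand-in: a plain hash plays the role of the on-disk structure here.
# With DBM::Deep installed, an object from DBM::Deep->new('test.db')
# would be passed instead (an assumption; not exercised by this sketch).
my %ram;
my %disk_stand_in;
print mirror_test(\%ram, \%disk_stand_in, 1000, 42), " steps compared\n";
```

Because the pass/fail criterion is "do the two structures match", no expected values are hard-wired, and the step count and key range can be changed freely.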

      More would be required, but this is just a reply to a SOPW reply (to a SOPW reply...).

      For repeatability, I seed the PRNG with srand.
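
To illustrate the repeatability point: a small sketch in which reseeding with srand reproduces the same pseudo-random test data. The helper random_pairs and the seed value are hypothetical; the sequence is only expected to repeat on the same perl build.

```perl
use strict;
use warnings;

# Build a list of pseudo-random key/value pairs from a given seed.
# The same seed yields the same list on the same perl build.
sub random_pairs {
    my ($seed, $count) = @_;
    srand($seed);
    return map { [ 'key' . int(rand(1000)), int(rand(1_000_000)) ] } 1 .. $count;
}

my @first  = random_pairs(12345, 5);
my @second = random_pairs(12345, 5);

# Flatten each run to a string so the two can be compared directly.
my $run1 = join ',', map { join('=', @$_) } @first;
my $run2 = join ',', map { join('=', @$_) } @second;

print $run1 eq $run2 ? "runs match\n" : "runs differ\n";
```

Printing the seed at the start of each run is enough to replay a failure later with identical data.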

      For regression testing, I redirect the terminal output to a file and compare against an earlier capture using diff.

      This strategy allows me to add temporary debug trace without completely screwing up the rest of the testing.

      I can drop into the debugger, set a breakpoint, skip over the early tests and walk through the failing test.

      At any time I can enable/disable asserts to stop at the point of failure or just log and run on.
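
One way such a switch might look, as a sketch only: a hand-rolled assert that either dies at the point of failure or records the failure and carries on. The names assert, $ASSERT_FATAL and @ASSERT_LOG are hypothetical, not taken from any module.

```perl
use strict;
use warnings;

# Hypothetical switch: true stops at the first failed check,
# false records the failure and carries on.
our $ASSERT_FATAL = 0;
our @ASSERT_LOG;

sub assert {
    my ($condition, $message) = @_;
    return 1 if $condition;

    # caller(0) gives the file and line of the assert() call itself.
    my (undef, $file, $line) = caller(0);
    my $report = "assertion failed: $message at $file line $line\n";

    die $report if $ASSERT_FATAL;
    push @ASSERT_LOG, $report;
    return 0;
}

assert(1 + 1 == 2, 'arithmetic works');      # passes silently
assert(1 + 1 == 3, 'deliberate failure');    # recorded, run continues
print scalar(@ASSERT_LOG), " failure(s) logged\n";
print @ASSERT_LOG;
```

Setting $ASSERT_FATAL to a true value turns the same checks into hard stops without touching the code that calls them.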

      At any time I can enable/disable full trace back or just top-level caller traceback.
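
A sketch of switching between the two kinds of traceback using the core Carp module: confess reports the full call stack, while croak attributes the error to the caller outside the failing package. The package Demo::Store and the flag $FULL_TRACE are hypothetical. The failing code sits in its own package because croak only shortens the report when the caller is in a different package.

```perl
use strict;
use warnings;

package Demo::Store;    # hypothetical module, for illustration only
use Carp qw(croak confess);

our $FULL_TRACE = 0;    # hypothetical switch: 1 = full stack, 0 = caller only

sub store {
    my ($key, $value) = @_;
    return _validate($key) && "$key=$value";
}

sub _validate {
    my ($key) = @_;
    return 1 if defined $key && length $key;
    # confess: message plus every frame of the call stack.
    # croak:   message attributed to the caller outside this package.
    $FULL_TRACE ? confess("empty key") : croak("empty key");
}

package main;

for my $full (0, 1) {
    $Demo::Store::FULL_TRACE = $full;
    my $ok = eval { Demo::Store::store('', 42); 1 };
    print "FULL_TRACE=$full:\n$@\n" unless $ok;
}
```

With the flag off, the message points at the line in main that made the call; with it on, the frames through store and _validate are listed as well.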

      There have been several replies that say "you can do that too with Test::*/prove/TAPI". That's fine (though many of the can-do-that-too's seem to be very recent additions on the basis of my encounters), but I still question what those tools give me that is extra and useful?

      And does that make up for all the things--print, debugger, traceback, remoteness--that they take away? IMO, the only extra they give is a set of statistics that I have no interest in and can see no benefit from.


