Re: Software Design Resources
by chunlou (Curate) on Aug 22, 2003 at 07:03 UTC
Seems like you're operating with an extremely low tolerance for faults, more like building an airplane than a buddy website.
Instead of trying to write a book, I guess I'll start off with a few somewhat randomly selected thoughts.
In order to know the "quality" of your code, it's good to know how many bugs there are, which of course is unknown. Finding how many bugs are in your code is like finding how many fish of a certain species live in a particular area of an ocean. Rarely can we find that by exhaustive count; we need a probabilistic approach instead, say, a catch-and-release approach (language borrowed from fish-counting).
Example (a rewording of something I posted elsewhere a while back). Your code accepts various combinations of input. Some bugs are found by entering those inputs (whereas others are found by load-testing, etc.). Normally, the set of all possible combinations is too vast to test in a practical, timely and productive way. Instead we randomly select, say, two subsets of all possible combinations, one for Tester One and one for Tester Two.
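A minimal sketch of that sampling step, in Python. The input fields and their values are invented for illustration (they don't come from any real application), and `draw_test_sets` is a hypothetical helper name.

```python
import itertools
import random

# Hypothetical input fields and values (assumptions for illustration only).
FIELDS = {
    "browser": ["firefox", "ie", "opera"],
    "locale": ["en", "de", "ja", "fr"],
    "account": ["guest", "member", "admin"],
    "payload": ["empty", "short", "long", "binary"],
}

def all_combinations(fields):
    """Return every combination of field values as a list of tuples."""
    return list(itertools.product(*fields.values()))

def draw_test_sets(combos, size, seed=None):
    """Draw two random subsets of `size` combinations, one per tester.

    The two subsets are drawn independently of each other, so they may overlap.
    """
    rng = random.Random(seed)
    tester_one = rng.sample(combos, size)
    tester_two = rng.sample(combos, size)
    return tester_one, tester_two

combos = all_combinations(FIELDS)          # 3 * 4 * 3 * 4 = 144 combinations
one, two = draw_test_sets(combos, size=20, seed=1)
print(len(combos), len(one), len(two))     # 144 20 20
```

With real inputs the full product is usually far too large to build as a list; in that case you would draw each field value at random instead of sampling from the enumerated list.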
Let T be the total unknown number of possible bugs associated with all combinations.
Let A be the number of bugs found by Tester One.
Let B be the number of bugs found by Tester Two.
Let C be the number of bugs found by both Tester One and Two.
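Hence (let P(X) be the probability of X, and assume the two testers find bugs independently of each other)
P(A) = A/T
P(B) = B/T
P(C) = C/T
P(A and B) = P(C) (by definition)
P(A and B) = P(A) * P(B) (by independence)
so C/T = (A/T) * (B/T), which gives the estimate T = (A * B) / C.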
That means the fewer bugs Tester One and Tester Two have in common, the more likely it is that a large number of unknown bugs are still waiting to be found. Or, the more bugs found by both Tester One and Two, the more likely it is that they have found most of the bugs. The idea can be visualized with a Venn diagram:
+----------------------------------+
|                                  |
|   +------------+                 |
|   |            |         T       |
|   |     A      |                 |
|   |            |                 |
|   |     +------|-------+         |
|   |     |  C   |       |         |
|   +-----|------+       |         |
|         |         B    |         |
|         |              |         |
|         +--------------+         |
|                                  |
+----------------------------------+
Since A and B are only random subsets of all possible combinations, they are not going to detect all possible unknown bugs (associated with data-input).
The key lies in C, the common area. If you look at the Venn diagram and imagine squeezing the superset T smaller and smaller, it becomes less and less likely for C to be small: A and B must tend to overlap.
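The squeezing argument can be checked with a small simulation. This is only a sketch under a strong assumption: each tester finds a uniformly random subset of the T bugs, independently of the other tester. The function name `mean_overlap` and the numbers used are made up for illustration.

```python
import random

def mean_overlap(total, found_a, found_b, trials=2000, seed=0):
    """Average number of bugs found by both testers over many random trials.

    Assumes each tester finds a uniformly random subset of the `total` bugs,
    independently of the other tester.
    """
    rng = random.Random(seed)
    bugs = range(total)
    overlap_sum = 0
    for _ in range(trials):
        a = set(rng.sample(bugs, found_a))
        b = set(rng.sample(bugs, found_b))
        overlap_sum += len(a & b)
    return overlap_sum / trials

# Keep A and B fixed at 20 bugs each and shrink T.
for total in (1000, 200, 50, 25):
    observed = mean_overlap(total, found_a=20, found_b=20)
    expected = 20 * 20 / total  # A * B / T under independence
    print(f"T={total:5d}  mean C={observed:6.2f}  A*B/T={expected:6.2f}")
```

As T shrinks toward the size of A and B, the simulated overlap should grow and stay close to A * B / T.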
The whole point is to estimate the total number of bugs (again, those associated with data input, not everything else, such as workload) without having to go through exhaustive testing.
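As a worked example, here is a minimal sketch of the estimate T = A * B / C, which is the same formula as the Lincoln-Petersen estimator used for catch-and-release fish counts. It assumes the two testers find bugs independently and that every bug is equally easy to find. The function names and the counts are made up for illustration.

```python
def estimate_total_bugs(found_a, found_b, found_both):
    """Estimate the total number of bugs T as A * B / C.

    Returns None when the testers share no bugs (C == 0), because the
    simple estimate is undefined in that case.
    """
    if found_both > found_a or found_both > found_b:
        raise ValueError("found_both cannot exceed either tester's count")
    if found_both == 0:
        return None
    return found_a * found_b / found_both

def estimate_from_sets(bugs_one, bugs_two):
    """Same estimate, starting from the bug IDs each tester reported."""
    bugs_one, bugs_two = set(bugs_one), set(bugs_two)
    return estimate_total_bugs(len(bugs_one), len(bugs_two), len(bugs_one & bugs_two))

# Made-up counts: Tester One found 20 bugs, Tester Two found 15.
print(estimate_total_bugs(20, 15, 5))   # 60.0  (5 bugs in common)
print(estimate_total_bugs(20, 15, 1))   # 300.0 (only 1 bug in common)
```

In the first case the testers have seen 20 + 15 - 5 = 30 distinct bugs, so the estimate suggests roughly 30 more are still out there. With only one bug in common, the same A and B point to a far larger unknown population.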
Of course, that simple estimate probably won't be statistically very valid, since bugs are not independent. But it still gives a good conceptual insight: if a bunch of independent testers tend not to find the same bugs, there are probably still plenty of bugs out there.
It is the same meta process with programming. A programmer is considered "good" not just because he writes code that works but because he writes code that actually solves the users' problem, not the problem the programmer finds interesting and feels like solving. He doesn't stop just because "it works."
If the testing process only helps you answer the what (such as how many bugs) but not the why, the process is flawed.
Flawed in what sense? The what (such as benchmarking) only tells you where you are. The why helps you predict and plan. If no one learns something new (both what and why) from a round of testing, the testing is pointless, no matter what fancy technique is used or what statistics are derived. (Repeated testing without being able to pinpoint and solve anything is a symptom. For one to blame testing that fails him is like blaming his car for driving him to the wrong destination.)