in reply to Re: Re: Re: Software Design Resources in thread Software Design Resources
I think I'm the one who's not being clear, not you.
I suppose we're both talking about the "best judgement" introducing bias.
I fully understand the mechanism whereby it is possible to estimate how many bugs will be found on the basis of how many have been found, projecting that forward, provided the test cases are produced randomly.
Perhaps the fuzziness of human language gets in the way here. Any estimate is an estimate of how many bugs could be found, never of how many will be found. To see that, I'll use the catch-and-release example.
Suppose the total number (the actual T) of unknown bugs is actually 100. Tester One finds 20 (A) bugs; Tester Two also finds 20 (B); 2 (C) bugs are found by both. The estimate is A*B/C = 20*20/2 = 200 total (possible) bugs (notice the large margin of error). Does that mean you will find 200 bugs given infinite time? Of course not, since we already know there are 100 actual bugs. The estimate is 200 nevertheless: 200 is the possible total of bugs you could find, based on the counts actually available at the moment.
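As a sketch, the estimate above is the classic catch-and-release (Lincoln-Petersen) calculation; the code below is only an illustration, and the function name is mine:

```python
# Lincoln-Petersen catch-and-release estimate of a total population.
# A = bugs found by Tester One, B = bugs found by Tester Two,
# C = bugs found by both. Estimated total T ~ A*B/C.
def estimate_total_bugs(found_by_one, found_by_two, found_by_both):
    if found_by_both == 0:
        raise ValueError("no overlap: the estimate is undefined")
    return found_by_one * found_by_two / found_by_both

print(estimate_total_bugs(20, 20, 2))   # -> 200.0, as in the example
```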
The technique and the skill set will affect the accuracy of an estimate, but the principle is still the same.
* * *
One side note, not to critique their method but to provide complementary information: one should be careful when using a polynomial to fit data. Polynomials can approximate any continuous function, given enough degrees (the Weierstrass approximation theorem). Similarly, they can fit any data, including white noise.
Suppose you're testing the response time of your server under various levels of workload. You try a linear fit (a straight line) and a polynomial of degree two (a + bx + cx^2). The polynomial fits the data better, and you get the following:
      resp. |              X X
      time  |          X .    *
            |        X .         *
            |      X .              *
            |    X .
            |  X .
            +-------------------------------
                        workload

      . : data points
      X : fitted to actual data
      * : prediction, extrapolation
But the extrapolation defies common sense: it predicts that response time improves as the workload increases. This kind of error is very hard to detect in higher dimensions, especially when you don't actually know what to expect.
The moral: a more complicated model does not always improve your prediction; in some cases it can even worsen it.
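The moral can be demonstrated with a few lines of code (all numbers invented): a quadratic fit always matches the observed data at least as well as a straight line, yet here it extrapolates absurdly.

```python
import numpy as np

# Hypothetical measurements: response time rises with workload, but the
# growth happens to taper off within the observed range.
workload = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
response = np.array([7.0, 9.0, 11.0, 12.5, 13.5, 14.0])

lin  = np.polyfit(workload, response, 1)   # straight line
quad = np.polyfit(workload, response, 2)   # a + b*x + c*x^2

# In-sample, the quadratic fits at least as well as the line...
lin_sse  = np.sum((np.polyval(lin,  workload) - response) ** 2)
quad_sse = np.sum((np.polyval(quad, workload) - response) ** 2)
print(quad_sse <= lin_sse)                  # True

# ...but the taper gave it a negative x^2 coefficient, so extrapolating
# to workload 20 predicts a *negative* response time:
print(round(np.polyval(quad, 20.0), 1))     # about -20.6, absurd
print(round(np.polyval(lin,  20.0), 1))     # about 34.7, at least sane
```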
Re: Re: Re: Re: Re: Software Design Resources by BrowserUk (Pope) on Aug 22, 2003 at 10:04 UTC 
Fair enough :) I don't have the math to argue with you on this.
However, I would also not take it upon myself to argue with a certain IBM statistician whose work was the basis of at least some (I think fairly major) elements of the statistics used in the process I am describing.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

It's not about argument. Just mutual learning.
Funnily enough, when that "statistician" (I think he's better known as a great mathematician) first came up with some very ingenious estimation technique, the other engineers were very skeptical, since they didn't know what he was doing. But the stuff worked. (He mentioned it in a TV documentary, without saying what the technique actually was.)
As your sig says, "Examine what is said, not who speaks." I don't put faith in someone just because he has a PhD. In the business world, many PhDs have given dreadful advice. (One consultant (a PhD who could give you a four-hour lecture on anything) advised a web development house to lay off most of its programmers and sales reps, partly because many of them were "not working hard enough." The firm eventually failed, not because people weren't working hard enough, but partly because the business model (which the consultant was partly responsible for) wasn't working.)

I wholeheartedly agree with not taking someone's word for something just because of a piece of paper, or even a real-world reputation. However, in this case, the PhD in question has enough of a reputation, a body of work and a proven track record.
Add to that, my own abilities in this area were never sufficient even to begin to fully comprehend the ideas, never mind challenge them. To that end, it becomes imprudent, if not impossible, to "examine what is said".
It is impossible to be fully conversant in every field, and there will always be subject areas where you simply have to rely upon the words and skills of others. Once this point is reached, it becomes a case of trying to pick the people whose words, ideas and skills you put your trust in as wisely as possible. Examining their words in the light of their peers' reactions to them, and the faith they place in them, is as good a way as any I know, and better than most :)
It is an imperfect mechanism. Even the historically judged "best and brightest in their fields" tend to be superseded over time, though it tends to happen in the detail rather than in any fundamental way.
That said, I never met Mandelbrot, though I did once watch a live presentation he gave (to do with fractals), and I am pretty sure that he didn't have any direct involvement in the project in question. It is quite possible that the people who performed the statistics in question misunderstood his theories, or misapplied them. I can attest to the accuracy of the predictions that the process produced, albeit over a relatively short time frame. Being fundamentally an empiricist, it is this last point that is the strongest influence on my faith in the methodology used.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Re: Re: Re: Re: Software Design Resources by BrowserUk (Pope) on Aug 22, 2003 at 10:38 UTC 
Sorry for the second post, but I thought about this some more and wanted your reaction to those thoughts. If I had just modified the last post, you might not have seen the update.
The estimate is 200 total (possible) bugs (notice the large margin of error). (and the rest of that paragraph)
I am under the strong, and I believe well-founded, impression that for your probability calculation to make sense, the sample(s) used to estimate the total population must be random samples. This would not be the case if the test cases the programmers produce are written on the basis of experience (or best judgement).
If programmers A & B both write 20 identical test cases (unlikely, but not statistically impossible), then counting them as unique invalidates the statistics.
If the test cases they produce cover only 1% of the possible test cases and detect 2 bugs, there is no way to project the total number of bugs from that unless they represent a statistically valid sample of the total set of possible test cases. The only way for them to be a statistically valid sample is if they are a random selection from the total set of possibles. If they were written on the basis of best judgement, they are not random.
That's why the RTG was necessary for the approach I described.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

...in order for your probability calculation to make sense, the sample(s) used to estimate the total population are required to be random samples.
That's correct.
... the test cases the programmers produce are done on the basis of experience (or best judgement)... the test cases they produce only cover 1% of the possible test cases and detect 2 bugs, there is no way to project the total number of bugs from that.
If I gave you the wrong impression that the "best judgement" sample constituted a random sample on which the estimate of the number of possible bugs was based, my bad. I think we both know how random sampling works.
I don't know what technique they actually used (if I did, I would be a psychic and it would be voodoo), so I can't explain how their stuff works. But I can show how such an estimate of the possible number of bugs is possible.
One simple possibility is to use regression (i.e. a model-based estimation), as illustrated below.
            |                .
            |           . /  .  .
      no.   |        . /   .  .
      of    |      . ./  .
      bugs  |     . /  .
            |      /  .
            +-------------------------
                  no. of test cases
If there's a correlation (not necessarily linear) between the number of test cases (independent variable) and the number of bugs (dependent variable), we can use regression to estimate the total number of possible bugs, assuming the total number of test cases is known and bounded.
There's no limit on what or how many independent variables you may use, nor on the model.
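A minimal sketch of this model-based estimate (all counts are invented, and a real bug-discovery curve would typically flatten out, so the linear model is only for illustration):

```python
import numpy as np

# Cumulative bugs found after running each batch of test cases (invented).
cases_run  = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
bugs_found = np.array([12.0, 22.0, 35.0, 44.0, 57.0])

# Fit bugs ~ a + b*cases; any model with a decent fit would do.
b, a = np.polyfit(cases_run, bugs_found, 1)

# If the total number of possible test cases is known and bounded,
# evaluate the fitted model there to estimate total possible bugs.
total_cases = 2000.0
estimate = a + b * total_cases
print(round(estimate, 1))   # -> 224.4 with these invented numbers
```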
Speaking of voodoo: in time series analysis (you indirectly mentioned Mandelbrot, which made me think of fractals, which made me think of time series), you can run a bispectrum test to check whether a series is linearly predictable, without knowing what kind of process generated the series. Pretty cool "voodoo." It's like saying: I don't know where Homer came from, but I'm sure he's blind.
And financial time series often almost follow a random walk process, which sometimes results in a "long memory" process. That is, the underlying process is scale-independent. In other words, if x(t) = a*x(t-1) + e, where e is random noise, you get (more or less) the same "a" regardless of the unit of measurement, be it daily, weekly, etc. Hence the process is (statistically) self-similar. Hence, it's a "fractal"!
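A sketch of that scale-independence claim, using a pure random walk (a = 1), the clearest case: fitting the AR(1) coefficient on the series and on a subsample of it gives nearly the same "a".

```python
import numpy as np

# A random walk x(t) = 1*x(t-1) + e, our stand-in for a "daily" series.
rng = np.random.default_rng(42)
x = np.cumsum(rng.normal(size=20_000))

def fit_ar1(series):
    """Least-squares slope of x(t) on x(t-1), no intercept."""
    prev, curr = series[:-1], series[1:]
    return np.dot(prev, curr) / np.dot(prev, prev)

a_daily  = fit_ar1(x)
a_weekly = fit_ar1(x[::7])   # the same walk sampled every 7th step
print(round(a_daily, 3), round(a_weekly, 3))   # both close to 1
```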
Since a random variable (such as the number of bugs), or better yet a random/stochastic process, can be a special case of a fractal, that's where Mandelbrot (the "statistician") could come in.
* * *
Since I mentioned correlation, I might as well point out what I didn't mention in the previous discussion of bug estimation: "margin of error" (heard on TV often), or variation, or variance (I didn't want to confuse people with too many new concepts).
If two random variables (say, the numbers of bugs found by two testers; the number of bugs itself can be treated as a random variable, even if the test cases are not randomly selected) are correlated, a positive correlation leads to higher variance of the total, a negative one to lower. The intuition goes like this: negative correlation leads to cancellation, hence less variance (10 + (-10) = 0), while positive correlation means things tend to come all at once, hence higher variance (10 + 10 = 20).
Since bugs tend to be positively correlated (not due to sampling), a simple random-sampling estimate based on an independence assumption underestimates the variance, the "margin of error", or the severity of the bug situation.
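A quick numeric check of that variance intuition on simulated data, using the identity Var(X + Y) = Var(X) + Var(Y) + 2*Cov(X, Y):

```python
import numpy as np

rng = np.random.default_rng(1)

def total_variance(rho, n=200_000):
    """Sample variance of X + Y for standard normals with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return np.var(x + y)   # theory: 2 + 2*rho

print(round(total_variance(0.8), 1))    # ~3.6, well above the naive 2
print(round(total_variance(0.0), 1))    # ~2.0, the independence value
print(round(total_variance(-0.8), 1))   # ~0.4, cancellation
```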
* * *
That leads us to talking about bugs (more precisely, the number of bugs) as a random variable/process. You can consider the "randomness" to be the result of 1) random sampling or 2) the underlying process that generates those bugs.
Bugs as a random variable due to random sampling we have already talked about. Bugs as a random process is a new topic, which I suppose is what your people were doing back then.
I mentioned time series (a random process), fractals and Mandelbrot. Since bugs could be a random process, which could be a time series, which could be a "fractal," it wouldn't be hard for Mandelbrot to figure out that the total number of possible bugs could be related to the upper bound of a time series. (I'm not saying that's what they did. I don't know what they did.)
Many processes generate a time series that is bounded above (and/or below) in a probabilistic or deterministic sense (a random walk is one that's not). If we can estimate the process that generates the values of a variable (such as bugs), we can tell the highest value that variable can plausibly reach.
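A sketch of that idea under an assumed stationary AR(1) model for the counts (the model choice and every number here are mine, purely for illustration): fit the process, then derive a probabilistic ceiling from its stationary distribution.

```python
import numpy as np

# Simulate a stationary AR(1): x(t) = a*x(t-1) + e, |a| < 1, so the
# series is bounded above in a probabilistic sense.
rng = np.random.default_rng(7)
a_true, sigma = 0.6, 1.0
x = np.zeros(5_000)
for t in range(1, len(x)):
    x[t] = a_true * x[t - 1] + rng.normal(0.0, sigma)

# Estimate a and the innovation variance by least squares.
a_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
resid = x[1:] - a_hat * x[:-1]
var_stationary = np.var(resid) / (1.0 - a_hat**2)

# A ~99.9% probabilistic ceiling for the (zero-mean) process:
ceiling = 3.0 * np.sqrt(var_stationary)
print(round(ceiling, 2))   # near 3*sigma/sqrt(1 - 0.6**2) = 3.75
```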
One may feel: bugs generated by an underlying random process? It makes no sense. Well, the process is merely a model for prediction. It makes no difference whether it objectively exists or not, as long as the model gives us the right answer. (Think about how many people found quantum mechanics absurd; it too is just a model that works.)
Treating bugs as a random process means we assume there are correlations among bugs (temporal, spatial or whatever). Otherwise it's just white noise and a meaningless model. On the other hand, correlation complicates the estimation in random sampling. So we can always explore the underlying structure of a variable and choose the right model and methodology accordingly, to our advantage.