Re^4: [OT] The statistics of hashing. by roboticus (Canon)
on Apr 03, 2012 at 13:32 UTC
In the iterative solution, we're accumulating f(x)=(1-e^(-x/N))^h over the range of x=0 .. NumSamples. That's a rough form of computing the definite integral of the expression. Integrating over three variables (x, N, h) would be a pain, so I treated N and h as constants.
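First, we multiply out our f(x) expression to remove the exponent. Using h=2, we get:
f(x) = (1-e^(-x/N))^2 = 1 - 2e^(-x/N) + e^(-2x/N)
Integrating term by term then gives:
integ(f(x)) = x + 2N*e^(-x/N) - (N/2)*e^(-2x/N) + C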
Computing a definite integral over a range is simply integ(f(x)) evaluated at the upper limit less the value evaluated at the lower limit. This causes the C terms to cancel.
Pascal's triangle shows up because f(x) has the form (a+b)^n, with a=1, b=-e^(-x/N) and n=h. Multiplying it out gives the binomial expansion, and its coefficients are row h of the triangle.
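For reference, here is how the general case would look when written out for any positive integer h, still treating N and h as constants. This is worked from the f(x) given above rather than copied from anywhere, so check it against your own numbers:

```latex
\begin{align*}
(1 - e^{-x/N})^h &= \sum_{k=0}^{h} \binom{h}{k} (-1)^k e^{-kx/N} \\
\int (1 - e^{-x/N})^h \, dx &= x + \sum_{k=1}^{h} \binom{h}{k} (-1)^{k+1} \frac{N}{k} e^{-kx/N} + C
\end{align*}
```

The k=0 term is the constant 1, which integrates to x. Every other term is a plain exponential, so each one just picks up a factor of -N/k.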
One point I should mention: You don't have to use 0 as the lower bound. If you wanted the number of collisions you'd experience from sample A to sample B, just evaluate integ(f(B))-integ(f(A)). By using A=0 we compute the number of collisions for the entire run.
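A quick way to convince yourself of this is to compare the closed form against the iterative sum. The sketch below is in Python rather than the thread's Perl, and the function names (`antiderivative`, `collisions_between`, `collisions_iterative`) are made up for illustration. It assumes h is a positive integer and that N and h stay fixed for the whole run:

```python
from math import comb, exp


def f(x, n, h):
    """Per-sample term from the post: (1 - e^(-x/n))^h."""
    return (1.0 - exp(-x / n)) ** h


def antiderivative(x, n, h):
    """integ(f(x)) without the constant C, built from the binomial expansion."""
    total = float(x)  # the k = 0 term (the constant 1) integrates to x
    for k in range(1, h + 1):
        # Each e^(-k*x/n) term integrates to -(n/k) * e^(-k*x/n);
        # the sign alternates with k, the coefficient is row h of Pascal's triangle.
        total += comb(h, k) * (-1) ** (k + 1) * (n / k) * exp(-k * x / n)
    return total


def collisions_between(a, b, n, h):
    """Definite integral from sample a to sample b; the C terms cancel."""
    return antiderivative(b, n, h) - antiderivative(a, n, h)


def collisions_iterative(a, b, n, h):
    """The iterative version: accumulate f(x) for samples a+1 .. b."""
    return sum(f(x, n, h) for x in range(a + 1, b + 1))


if __name__ == "__main__":
    n, h = 1000, 3  # arbitrary illustration values, not from the thread
    print(collisions_between(0, 2000, n, h))
    print(collisions_iterative(0, 2000, n, h))
```

The two numbers won't match exactly, since the loop is only a step-wise approximation of the integral, but because f(x) is increasing they can differ by at most f(B)-f(A), which is less than 1. For large h the alternating terms in the closed form may lose precision to cancellation, so treat it as a sketch rather than something tuned for every input.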
When your only tool is a hammer, all problems look like your thumb.