Come for the quick hacks, stay for the epiphanies. | |
PerlMonks |
Re^3: The statistics of hashing.by tobyink (Canon) |
on Apr 01, 2012 at 09:13 UTC ( [id://962866]=note: print w/replies, xml ) | Need Help?? |
Indeed, but the errors on Wikipedia are not evenly distributed. On a subject such as the birthday attack I'd expect Wikipedia's article to be on par with any other authority (short-lived events of vandalism notwithstanding).
The exact calculation involves some big numbers. But assuming that c($x, $y) is the "pick $y from $x" function used in combinatorics, then the probability of a collision for $n strings and an evenly distributed 32-bit hash function should be:
Big numbers. Horrible to calculate. Can be approximated though...
Calculating $p is still horrible, but calculating $t is easier. If $t is above 20 then $p is 1.00000 when rounded to 6 significant figures. Thus you can effectively be sure to have a collision with a 32-bit hash function once $t is above 20. You can figure out an $n which triggers $t to be 20 using:
It's about 414,000. So with 414,000 strings, you are effectively certain to get collision on a 32-bit hash function. Where I think my reasoning and tye's differ (and tye is almost certainly correct here - blame it on me answering late at night) is that I was then looking at the probabilities that you will have had collisions in all four (or ten) hash functions at the end of the entire run. With even half a million strings, that is a given. What you're actually doing is looking at events where a single string triggers a simultaneous collision in all the hash functions. I defer to tye's calculations for that.
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
In Section
Seekers of Perl Wisdom
|
|