Keep It Simple, Stupid | |
PerlMonks |
comment on |
( [id://3333]=superdoc: print w/replies, xml ) | Need Help?? |
Seems almost certain to me, but my reasoning may well be faulty. Stats were never my forte. You're effectively running four different 32-bit hash functions on each string. According to Wikipedia for a 32-bit hash you only need about 110,000 hashes to get a 75% chance of a collision. Thus with 110,000 strings, getting a collision in all four stands at 75%**4, about 30% chance. That's with 110,000 strings. You have "several billion". With several billion strings you are going to have a very close to 100% chance of a collision on a 32-bit hash function. With four 32-bit hash functions, that's still close to 100% chance of a collision. Extending this to having 10 effective hash functions, this cannot increase the chance of collisions. Worst case scenario it has no effect. Here I think it does decrease the chance (difficult , but not enough to be useful.
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
In reply to Re: The statistics of hashing.
by tobyink
|
|