PerlMonks 
Re: [OT] Statistics question.
by moritz (Cardinal) on Jan 30, 2013 at 09:18 UTC (#1016005)
To keep the model simple, I'll make one simplification: I assume that we have one list (duplicates allowed) and one set (no duplicates allowed). Then for each member of the list, the probability of having a match in the set is P(1) = 1e6/2**32. Since we've assumed a list, the matches are independent of each other, and the expectation value is simply 1e6 * P(1) = 1e6 * 1e6/2**32 = 232.83. If the number of matches follows a Poisson distribution (and I suspect it does in this example), then the standard deviation is simply the square root of the expectation value, so about 15.26. It is hard for me to estimate how big an error this simplification introduces; I'll update the node if I get an idea of how to estimate it.
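For anyone who wants to check the arithmetic, here is a minimal sketch in Python (not part of the original reply). The variable names are made up for illustration. It assumes the simplified model above: each of the 1e6 list entries independently matches the set with probability 1e6/2**32. Under that assumption the exact count is binomial, so the sketch prints the binomial standard deviation next to the Poisson one for comparison.

```python
import math

# Assumptions taken from the simplified model in the post:
N_LIST = 10**6   # entries in the list (duplicates allowed)
N_SET = 10**6    # distinct values in the set
SPACE = 2**32    # number of possible 32-bit values

# Probability that one list entry has a match in the set.
p_match = N_SET / SPACE

# Expected number of matches over the whole list.
expected = N_LIST * p_match

# Poisson: the standard deviation is the square root of the mean.
sd_poisson = math.sqrt(expected)

# Binomial (exact under the independence assumption): sqrt(n * p * (1 - p)).
sd_binomial = math.sqrt(N_LIST * p_match * (1 - p_match))

print(f"P(1)        = {p_match:.6e}")
print(f"expectation = {expected:.2f}")
print(f"sd, Poisson = {sd_poisson:.2f}")
print(f"sd, binomial = {sd_binomial:.2f}")
```

Because P(1) is so small, the two standard deviations agree to two decimal places (about 15.26), so treating the count as Poisson costs essentially nothing within this model. It says nothing about the error from the list-versus-set simplification itself.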

