note
moritz
<p>I'll do a small simplification in order to use a much simpler model: I assume that we have one list (duplicates allowed) and one set (no duplicates allowed).
<p>Then for each member of the list, the probability of having a match in the set is P(1) = 1e6/2**32.
<p>Since we've assumed a list, all the probabilities of having matches are independent, and the expectation value is simply 1e6 * P(1) = 1e6 * 1e6/2**32 = 232.83.</p>
<p>If the number of matches is a Poisson distribution (and I suspect it is, in this example), then the standard deviation is simply the square root of the expectation value, so 15.5.</p>
<p>It is hard for me to estimate how big an error I've made by this simplification; I'll update the node if I get an idea of how to estimate it.</p>
<!-- Node text goes above. Div tags should contain sig only -->
<div class="pmsig"><div class="pmsig-616540">
[http://perl6.org/|Perl 6 - the future is here, just unevenly distributed]
</div></div>
1015959
1015959