
Re: [OT] Statistics question.

by moritz (Cardinal) on Jan 30, 2013 at 09:18 UTC

in reply to [OT] Statistics question.

I'll make a small simplification so that I can use a much simpler model: I assume that we have one list (duplicates allowed) and one set (no duplicates allowed).

Then for each member of the list, the probability of having a match in the set is P(1) = 1e6/2**32 (the set's 1e6 members out of 2**32 possible values).

Since we've assumed a list (where duplicates are allowed), each member matches independently of the others, and the expectation value is simply 1e6 * P(1) = 1e6 * 1e6/2**32 = 232.83.
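The arithmetic above is easy to check directly. This is only a minimal sketch; the variable names (`n_list`, `n_set`, `space`) are mine and purely illustrative, and it assumes both the list and the set have 1e6 members drawn from 32-bit values, as the formula implies.

```python
# Sizes taken from the formula above; the names are illustrative only.
n_list = 10**6    # members in the list (duplicates allowed)
n_set = 10**6     # members in the set (no duplicates)
space = 2**32     # number of possible 32-bit values

# Probability that one list member hits some member of the set.
p_match = n_set / space

# Expected number of matches over the whole list.
expected = n_list * p_match

print(f"P(1)             = {p_match:.6e}")
print(f"expected matches = {expected:.2f}")   # 232.83
```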

If the number of matches follows a Poisson distribution (and I suspect it does, in this example), then the standard deviation is simply the square root of the expectation value, so about 15.26.
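As a rough cross-check of the Poisson assumption: under the simplified model the count is really binomial (1e6 independent trials, each with probability P(1)), and for such a small P(1) the two standard deviations should be nearly identical. A small sketch, with names of my own choosing:

```python
import math

n = 10**6                 # list members (independent trials)
p = 10**6 / 2**32         # P(1), probability of a match per member
mean = n * p              # expectation value, about 232.83

# Poisson: the variance equals the mean.
poisson_sd = math.sqrt(mean)

# Binomial: the variance is n * p * (1 - p).
binomial_sd = math.sqrt(n * p * (1 - p))

print(f"Poisson  stddev: {poisson_sd:.4f}")
print(f"Binomial stddev: {binomial_sd:.4f}")
```

The square root of 232.83 works out to about 15.26, and the binomial value differs from it only in the third decimal place, so treating the count as Poisson costs essentially nothing here.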

It is hard for me to estimate how big an error I've made by this simplification; I'll update the node if I get an idea of how to estimate it.

Re^2: [OT] Statistics question.
by BrowserUk (Pope) on Jan 30, 2013 at 11:55 UTC
the expectation value is simply 1e6 * P(1) = 1e6 * 1e6/2**32 = 232.83 ... the standard deviation is simply the square root of the expectation value, so about 15.26.

Based upon a run of 100 samples, that seems to match quite nicely:

```
C:\test>bitvec2 -N=100
100
Mean: 230.95 stddev:14.75
```

I'm doing a run of 1000 samples, and if that shows no surprises, I'll be taking your figures as read and basing my testing upon them.

Thank you.

Update: The 1000 sample run marries well (good enough):

```
C:\test>bitvec2 -N=1000
1000
Mean: 233.39 stddev:15.79
```
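For anyone wanting to repeat this kind of check without the original program, here is a hypothetical Monte Carlo sketch of the list-versus-set model. It is not bitvec2 (whose source isn't shown in this thread), and the function names are invented for illustration. To keep the run short it uses scaled-down sizes, 15625 members in a space of 2**20 values, chosen so that the expectation is still 1e12/2**32 = 232.83; the full-size parameters can be passed in instead if you have the patience.

```python
import random
import statistics

def count_matches(n_list, n_set, space, rng):
    """One trial: count list members that also occur in the set."""
    # The set: n_set distinct values (no duplicates).
    members = set(rng.sample(range(space), n_set))
    # The list: n_list independent draws (duplicates allowed).
    return sum(rng.randrange(space) in members for _ in range(n_list))

def simulate(n_list, n_set, space, trials, seed=1):
    """Run several trials; return (mean, sample stddev) of the match counts."""
    rng = random.Random(seed)
    counts = [count_matches(n_list, n_set, space, rng) for _ in range(trials)]
    return statistics.mean(counts), statistics.stdev(counts)

# Scaled-down assumption: 15625 * 15625 / 2**20 equals 1e6 * 1e6 / 2**32.
mean, stddev = simulate(n_list=15625, n_set=15625, space=2**20, trials=100)
print(f"Mean: {mean:.2f} stddev: {stddev:.2f}")
```

Because the per-member probability is larger in the scaled-down version (about 1.5% rather than 0.02%), the standard deviation should come out very slightly below the Poisson figure.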

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
