
Re: [OT] Statistics question.

by moritz (Cardinal)
on Jan 30, 2013 at 09:18 UTC ( #1016005=note )


in reply to [OT] Statistics question.

I'll make a small simplification in order to use a much simpler model: I assume that we have one list (duplicates allowed) and one set (no duplicates allowed).

Then for each member of the list, the probability of having a match in the set is P(1) = 1e6/2**32 (the set's 1e6 members out of 2**32 possible values).

Since we've assumed a list, all the probabilities of having matches are independent, and the expectation value is simply 1e6 * P(1) = 1e6 * 1e6/2**32 = 232.83.

If the number of matches follows a Poisson distribution (and I suspect it does, in this example), then the standard deviation is simply the square root of the expectation value, so about 15.26.
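To make the arithmetic easy to check, here is a minimal sketch in Python (my choice of language for illustration; the function and constant names are made up for this example). It computes the expectation value and the Poisson standard deviation, and for comparison the binomial standard deviation, which is what independent trials with a fixed probability would give exactly.

```python
import math

SET_SIZE = 10**6      # members in the set (from the post)
LIST_SIZE = 10**6     # members in the list (from the post)
SPACE = 2**32         # number of possible values (from the post)

def match_stats(list_size, set_size, space):
    """Return (expected matches, Poisson std dev, binomial std dev)."""
    p = set_size / space                      # P(1): chance one list member hits the set
    mean = list_size * p                      # expectation value
    poisson_sd = math.sqrt(mean)              # Poisson: variance equals the mean
    binomial_sd = math.sqrt(mean * (1 - p))   # independent trials with fixed p
    return mean, poisson_sd, binomial_sd

if __name__ == "__main__":
    mean, poisson_sd, binomial_sd = match_stats(LIST_SIZE, SET_SIZE, SPACE)
    print(f"mean={mean:.2f} poisson_sd={poisson_sd:.2f} binomial_sd={binomial_sd:.2f}")
```

Because P(1) is so small, the binomial and Poisson standard deviations agree to two decimal places here, which suggests the Poisson assumption costs very little within this simplified model.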

It is hard for me to estimate how big an error I've made by this simplification; I'll update the node if I get an idea of how to estimate it.


Re^2: [OT] Statistics question.
by BrowserUk (Pope) on Jan 30, 2013 at 11:55 UTC
    the expectation value is simply 1e6 * P(1) = 1e6 * 1e6/2**32 = 232.83 ... the standard deviation is simply the square root of the expectation value, so 15.26.

    Based upon a run of 100 samples, that seems to match quite nicely:

    C:\test>bitvec2 -N=100 100 Mean: 230.95 stddev:14.75

    I'm doing a run of 1000 samples, and if that shows no surprises, I'll be taking your figures as read and basing my testing upon them.

    Thank you.

    Update: The 1000 sample run marries well (good enough):

    C:\test>bitvec2 -N=1000 1000 Mean: 233.39 stddev:15.79
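For anyone who wants to try a similar check without bitvec2, here is a rough sketch in Python. It is not the program used above, and every name in it is hypothetical. It simulates the simplified model (a random set without duplicates, a list with duplicates allowed) at scaled-down sizes so that it runs quickly. With a list of 4096, a set of 4096 and 2**20 possible values, the same formula predicts a mean of 4096 * 4096 / 2**20 = 16 and a standard deviation of about 4.

```python
import random
import statistics

def count_matches(rng, list_size, set_size, space):
    """One trial: count list members (duplicates allowed) found in a random set."""
    members = set(rng.sample(range(space), set_size))   # set: no duplicates
    return sum(rng.randrange(space) in members for _ in range(list_size))

def run_trials(trials, list_size, set_size, space, seed=1):
    """Return (mean, sample std dev) of the match count over several trials."""
    rng = random.Random(seed)
    counts = [count_matches(rng, list_size, set_size, space) for _ in range(trials)]
    return statistics.mean(counts), statistics.stdev(counts)

if __name__ == "__main__":
    # Scaled-down sizes (an assumption for speed, not the sizes in the thread).
    mean, sd = run_trials(trials=100, list_size=4096, set_size=4096, space=2**20)
    print(f"Mean: {mean:.2f} stddev: {sd:.2f}")
```

Passing list_size=10**6, set_size=10**6 and space=2**32 should reproduce the full-size experiment, though each trial will be considerably slower in pure Python.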

