Re: [OT]: Statistical significance?

by jjap (Monk)
on Dec 21, 2010 at 04:14 UTC (#878151)


in reply to [OT]: Statistical significance?

If you're looking to find out whether this distribution is likely uniform, this looks like a job for the Chi-square test.
Dealing with absolute values (updated from "percentages" to meet what I posted):

Sum of ((observed - expected)^2 / expected) over all classes, compared against the Chi-square distribution with df (the number of degrees of freedom).
Since you have 19 classes, df = 18, and the Chi-square statistic is: X-squared = 38297.18, df = 18, p-value < 2.2e-16
And that very small p-value indicates it does depart significantly from a uniform distribution...
Update: I originally got it backwards, as Anonymous Monk pointed out. Hence the following also became a moot point:
However, if the same hump were to be observed over and over again, then my suggested approach would probably not be the right one to follow; a real statistician might have better insight. (That last part still holds ;-)
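
The statistic described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figures in this post: the function name `chi_square_uniform` and the three-class counts are made up for the example, and the expected count is assumed to be the same for every class (the uniform null hypothesis).

```python
def chi_square_uniform(observed):
    """Pearson chi-square statistic against a uniform expectation."""
    total = sum(observed)
    expected = total / len(observed)   # same expected count for every class
    stat = sum((o - expected) ** 2 / expected for o in observed)
    df = len(observed) - 1             # degrees of freedom: classes minus one
    return stat, df

# Hypothetical counts for three classes (not the data from the thread).
stat, df = chi_square_uniform([10, 20, 30])
print(stat, df)   # 10.0 2
```

The statistic itself is not divided by df; df only selects which Chi-square distribution the statistic is compared against.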


Re^2: [OT]: Statistical significance?
by Anonyrnous Monk (Hermit) on Dec 21, 2010 at 04:57 UTC
    that very small p-value indicates it does not depart significantly from a uniform distribution

    I think you got the interpretation wrong. Normally, one would reject the null hypothesis (= uniform distribution here) if the p-value is less than 0.05 or 0.01, corresponding to a 5% or 1% significance level (i.e. the probability of incorrectly rejecting the null hypothesis when it is in fact true). In other words, the deviations from a uniform distribution are statistically highly significant.
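
The decision rule can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names `chi2_sf_even_df` and `reject_uniform` are hypothetical, and the p-value uses a closed form for the Chi-square upper tail that only holds when df is a positive even integer (which happens to cover df = 18). For other df values a statistics library would be needed.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses the closed form valid only when df is a positive even integer:
    exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

def reject_uniform(stat, df, alpha=0.05):
    """True when the p-value falls below the significance level alpha."""
    return chi2_sf_even_df(stat, df) < alpha

# Statistic and df as reported in the parent post.
print(reject_uniform(38297.18, 18))   # True
```

A `True` result means the null hypothesis of a uniform distribution is rejected at the chosen significance level.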

Re^2: [OT]: Statistical significance?
by BrowserUk (Pope) on Dec 22, 2010 at 00:53 UTC

    Many thanks to you (and AnonyRNous monk, whoever he may be) for prompting me in the direction of Chi Squared, the null hypothesis and all that jazz.

    After reading lots, much of which probably went over my head, I settled for an empirical study comparing the results of 1e6 runs with a) a uniform pick (simply $data[ rand @data ]); and b) the slightly non-uniform pick that the datapoints I posted represented. The upshot was that I could discern no difference in the final results of the algorithms.
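
A rough Python sketch of the two kinds of pick compared above, for illustration only. The algorithms whose final results were compared are not shown in this thread, so the sketch merely tallies the picks themselves; `tally_picks`, the weights in `skewed`, and the draw count are all hypothetical stand-ins rather than the actual study.

```python
import random
from collections import Counter

def tally_picks(data, n, weights=None, seed=0):
    """Count how often each element is picked in n draws.

    weights=None gives a uniform pick (the analogue of $data[ rand @data ]);
    otherwise elements are drawn in proportion to the given weights.
    """
    rng = random.Random(seed)
    if weights is None:
        picks = (rng.choice(data) for _ in range(n))
    else:
        picks = rng.choices(data, weights=weights, k=n)
    return Counter(picks)

data = list(range(19))                                   # 19 classes, as in the thread
skewed = [1.0 + 0.05 * (9 - abs(i - 9)) for i in data]   # hypothetical mild hump

uniform_counts = tally_picks(data, 100_000)
skewed_counts = tally_picks(data, 100_000, weights=skewed)
```

Feeding both streams of picks into the same downstream algorithm and comparing its outputs is one way to check whether the non-uniformity matters in practice, independently of whether it is statistically significant.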


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.