snippet
andye
Hi folks,
<p>
Following on from my question the other day ([Stats: Testing whether data is normally (Gaussian) distributed]), here's a simple normality test.
<p>
This is the [http://en.wikipedia.org/wiki/Jarque-Bera_test|Jarque-Bera test], "a goodness-of-fit measure of departure from normality, based on the sample kurtosis and skewness".
<p>
Kurtosis and skewness are calculated [http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm|like this]. <p>
Comments very welcome, if I've gone wrong here please do point it out!
<p>
Input <code>$source</code> should be a 1-D [cpan://PDL], type float (it'll probably work with other types, but be less useful!).
<p>
Output <code>$JB</code> is a number indicating nearness to the Normal distribution. High number = not Normal.
<p>
Warning: this can produce NaN for data where standard deviation =0, so if you plan to sort it afterwards, you'll need to weed out the NaN ones (as I [http://dev.perl.org/perl5/list-summaries/2003/p5p-200307-2.html|discovered for myself {'sort subroutine edge cases'}]).
<p>
I hope this is useful to someone.
<p>
Best wishes, [andye]
<CODE>
use PDL;
my ($mean,$std_dev,$median,$min,$max,$adev,$rms) = stats($source);
my $skewness = sclr(sum(
($source - $mean)**3
) / (
(nelem($source)-1) * $std_dev**3
));
my $exs_kurtosis = sclr(sum(
($source - $mean)**4
) / (
(nelem($source)-1) * $std_dev**4
) -3);
my $JB =
( nelem($source) / 6 )
*
(
$skewness**2
+ ($exs_kurtosis**2 / 4)
)
;
</CODE>