<?xml version="1.0" encoding="windows-1252"?>
<node id="612779" title="Statistics: Jarque-Bera normality test" created="2007-04-30 10:16:54" updated="2007-04-30 06:16:54">
<type id="1980">
snippet</type>
<author id="41758">
andye</author>
<data>
<field name="doctext">
</field>
<field name="snippetdesc">
Hi folks,
&lt;p&gt;
Following on from my question the other day ([Stats: Testing whether data is normally (Gaussian) distributed]), here's a simple normality test.
&lt;p&gt;
This is the [http://en.wikipedia.org/wiki/Jarque-Bera_test|Jarque-Bera test], "a goodness-of-fit measure of departure from normality, based on the sample kurtosis and skewness". 
&lt;p&gt;
Kurtosis and skewness are calculated [http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm|like this]. &lt;p&gt;
Comments very welcome, if I've gone wrong here please do point it out!
&lt;p&gt;
Input &lt;code&gt;$source&lt;/code&gt; should be a 1-D [cpan://PDL], type float (it'll probably work with other types, but be less useful!). 
&lt;p&gt;
Output &lt;code&gt;$JB&lt;/code&gt; is a number indicating nearness to the Normal distribution. High number = not Normal. 
&lt;p&gt;
Warning: this can produce NaN for data where standard deviation =0, so if you plan to sort it afterwards, you'll need to weed out the NaN ones (as I [http://dev.perl.org/perl5/list-summaries/2003/p5p-200307-2.html|discovered for myself {'sort subroutine edge cases'}]).
&lt;p&gt;
I hope this is useful to someone. 
&lt;p&gt;
Best wishes, [andye]</field>
<field name="snippetcode">
&lt;CODE&gt;
		use PDL;

		my ($mean,$std_dev,$median,$min,$max,$adev,$rms) = stats($source);
		
		my $skewness = sclr(sum(   
		
						 ($source - $mean)**3
		
							) / (
	
						 (nelem($source)-1) * $std_dev**3 
						
					   ));
		
		my $exs_kurtosis = sclr(sum(   
		
						 ($source - $mean)**4
		
							) / (
	
						 (nelem($source)-1) * $std_dev**4 
						
					   ) -3);
		
		my $JB	= 
					( nelem($source) / 6 )
					*
					(
						$skewness**2
						+ ($exs_kurtosis**2 / 4)
					)
				  ;

&lt;/CODE&gt;</field>
</data>
</node>
