Category: Utility Scripts
Author/Contact Info: Petras Kudaras, aka moxliukas (moxliukas AT delfi DOT lt)
Description: This short program reads a data file containing two tab-separated columns, the first being the X column and the second the Y column, and outputs some statistical analysis of the data. For both sets of data it calculates the mean, the quartiles (Q2 being the median), the variance and the standard deviation; the variance and standard deviation are population statistics, i.e. they divide by n rather than n - 1. It also outputs the sums of X, X^2, Y, Y^2 and X*Y. It then calculates the covariance (also divided by n), the linear correlation coefficient and the coefficient of determination, and finally derives the linear regression equation.
Most of this is simple and straightforward maths, and I do hope it will prove useful to someone (well, I have used this script for my statistics lectures).
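For reference, these are the formulas the script computes, read off its code. The symbols are editorial notation, not the script's; the matching Perl variable names are given in the comments. All sums run over the n data pairs. Note that the slope formula is equivalent to the covariance divided by the variance of X.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
  % $MX; \bar{y} ($MY) likewise
\\
\sigma_x^2 &= \frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n^2},
  \qquad \sigma_x = \sqrt{\sigma_x^2}
  % $VX, $SdevX: population variance (divides by n, not n - 1)
\\
\operatorname{cov}(x, y) &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  % $cov
\\
r &= \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y},
  \qquad r^2 = r \cdot r
  % $correl, $determ
\\
b &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}
   = \frac{\operatorname{cov}(x, y)}{\sigma_x^2},
  \qquad a = \bar{y} - b\,\bar{x}
  % $slope, $intercept
\\
\hat{y} &= a + b\,x
  % printed as "Y = $intercept + X * ($slope)"
\end{aligned}
```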
```perl
use strict;
use warnings;

# Copyleft moxliukas 2002
# This program outputs some statistical information
# about the data that is given in two tab separated
# columns, first one being X column, and second one
# Y column.

unless (@ARGV == 1) {
    print "Usage: $0 data_file\n";
    exit 1;
}

# Read the two tab separated columns into @x and @y
my (@x, @y);
open(my $f, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
while (<$f>) {
    next unless /^(.*)\t(.*)$/;    # skip lines without a tab
    push @x, $1;
    push @y, $2;
}
close $f;
die "No data in $ARGV[0]\n" unless @x;

# Squares and cross products
my $Xn = $#x + 1;
my $Yn = $#y + 1;
my @X2 = map { $_ * $_ } @x;
my @Y2 = map { $_ * $_ } @y;
my (@XY, $Sx, $Sy, $SX2, $SY2, $SXY, $SX_Xv__Y_Yv);
for (my $i = 0; $i <= $#x; $i++) {
    push @XY, $x[$i] * $y[$i];
}

# Sums
foreach my $o (@x)  { $Sx  += $o; }
foreach my $o (@y)  { $Sy  += $o; }
foreach my $o (@X2) { $SX2 += $o; }
foreach my $o (@Y2) { $SY2 += $o; }
foreach my $o (@XY) { $SXY += $o; }

# Means, population variances and standard deviations
my $MX = $Sx / $Xn;
my $MY = $Sy / $Yn;
my $VX = ($SX2 * $Xn - $Sx * $Sx) / ($Xn * $Xn);
my $VY = ($SY2 * $Yn - $Sy * $Sy) / ($Yn * $Yn);
my $SdevX = sqrt($VX);
my $SdevY = sqrt($VY);

# Quartiles: the element at rank int(p * (n + 1)), as a 0-based index
my @Xsort = sort { $a <=> $b } @x;
my @Ysort = sort { $a <=> $b } @y;
my $q1 = int(.25 * ($Xn + 1)) - 1;
my $q2 = int(.5  * ($Xn + 1)) - 1;
my $q3 = int(.75 * ($Xn + 1)) - 1;
# With n < 3 an index can come out as -1, which Perl would read as
# the last element; clamp it to the first element instead.
for ($q1, $q2, $q3) { $_ = 0 if $_ < 0 }

# Covariance, correlation, determination and regression line
for (my $i = 0; $i <= $#x; $i++) {
    $SX_Xv__Y_Yv += ($x[$i] - $MX) * ($y[$i] - $MY);
}
my $cov = $SX_Xv__Y_Yv / $Xn;
my $correl = $cov / ($SdevX * $SdevY);
my $determ = $correl * $correl;
my $slope = ($Xn * $SXY - $Sx * $Sy) / ($Xn * $SX2 - $Sx * $Sx);
my $intercept = $MY - $slope * $MX;

#--- output ---#
print <<EOI;
Sums:
--------
X:   $Sx
Y:   $Sy
X^2: $SX2
Y^2: $SY2
XY:  $SXY
n:   $Xn
--------
Mean X: $MX
Mean Y: $MY
Variance X: $VX
Variance Y: $VY
St. dev. X: $SdevX
St. dev. Y: $SdevY
-------- Quartiles --------
Q1 of X: $Xsort[$q1]
Q2 of X: $Xsort[$q2]
Q3 of X: $Xsort[$q3]
Q1 of Y: $Ysort[$q1]
Q2 of Y: $Ysort[$q2]
Q3 of Y: $Ysort[$q3]
--------
Covariance: $cov
Linear correlation: $correl
Determination (r^2): $determ
-------- Linear regression --------
Slope: $slope
Intercept: $intercept
Y = $intercept + X * ($slope)
EOI
```
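The script finds each quartile by truncating the rank p(n + 1) to a whole number and taking that sorted value, which is simple but never falls between two data points. If interpolated quartiles are wanted instead, here is a minimal sketch of the common (n + 1)p rule with linear interpolation between neighbouring sorted values (the same definition as R's `quantile(x, type = 6)`). This is a swapped-in technique, not the script's own method, and `quantile_np1` is a hypothetical helper name; like `@Xsort` and `@Ysort` in the script, its input is assumed to be already sorted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): the p-th quantile
# of an already sorted list by the (n + 1)p rule, interpolating linearly
# between the two neighbouring sorted values.
sub quantile_np1 {
    my ($p, @sorted) = @_;
    my $n = @sorted;
    die "quantile_np1: empty data\n" unless $n;
    my $h = $p * ($n + 1);              # 1-based fractional rank
    return $sorted[0]  if $h <= 1;      # clamp below the first value
    return $sorted[-1] if $h >= $n;     # clamp above the last value
    my $k    = int($h);                 # lower neighbour, 1-based
    my $frac = $h - $k;                 # distance towards the upper neighbour
    return $sorted[$k - 1] + $frac * ($sorted[$k] - $sorted[$k - 1]);
}

my @data = sort { $a <=> $b } (7, 1, 3, 5, 9, 11);
printf "Q1: %g  Q2: %g  Q3: %g\n",
    map { quantile_np1($_, @data) } (0.25, 0.5, 0.75);
```

On this sample this prints `Q1: 2.5  Q2: 6  Q3: 9.5`, whereas the script's truncating rule reports 1, 5 and 9 (its Q2 of 5 is not the usual median of 6 for an even number of values). To use the helper in the script, compute e.g. `quantile_np1(.25, @Xsort)` before the `print <<EOI` block and interpolate the resulting variable.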