Keep It Simple, Stupid PerlMonks

### Statistical data analysis

by moxliukas (Curate)
 on Jun 05, 2002 at 12:56 UTC Need Help??
 Category: Utility Scripts Author/Contact Info Petras Kudaras, aka moxliukas (moxliukas AT delfi DOT lt) Description: This short program outputs some statistical analysis data given the input data in two tab separated columns, first one being X column, and the second one Y column. It calculates means, quartiles, median, variance and standard deviation for both sets of data. It also outputs various sumations (X, X^2, Y, Y^2 and X*Y). It then calculates covariance, linear correlaton coeficient and determinance and finally comes up with linear regression equation. Most of this is simple and straightforward maths and I do hope it will prove useful to someone (well, I have used this script for my statistics lectures). ```use strict; use warnings; # Copyleft moxliukas 2002 # This program outputs some statistical information # about the data that is given in two tab separated # columns, first one being X column, and second one # Y column. unless (@ARGV==1) { print "Usage: \$0 data_file\n"; exit (-1); } my (@x, @y, \$f); open(\$f, "<\$ARGV[0]"); while(<\$f>) { /^(.*)\t(.*)\$/; push @x, \$1; push @y, \$2; } close \$f; my \$Xn = \$#x+1; my \$Yn = \$#y+1; my @X2 = map { \$_ * \$_ } @x; my @Y2 = map { \$_ * \$_ } @y; my (@XY, \$Sx, \$Sy, \$SX2, \$SY2, \$SXY, \$SX_Xv__Y_Yv); for (my \$i = 0; \$i <= \$#x; \$i++) { push @XY, \$x[\$i] * \$y[\$i]; } foreach my \$o (@x) { \$Sx += \$o; } foreach my \$o (@y) { \$Sy += \$o; } foreach my \$o (@X2) { \$SX2 += \$o; } foreach my \$o (@Y2) { \$SY2 += \$o; } foreach my \$o (@XY) { \$SXY += \$o; } my \$MX = \$Sx / \$Xn; my \$MY = \$Sy / \$Yn; my \$VX = (\$SX2 * \$Xn - \$Sx * \$Sx) / (\$Xn * \$Xn); my \$VY = (\$SY2 * \$Yn - \$Sy * \$Sy) / (\$Yn * \$Yn); my \$SdevX = sqrt(\$VX); my \$SdevY = sqrt(\$VY); my @Xsort = sort { \$a <=> \$b } @x; my @Ysort = sort { \$a <=> \$b } @y; my \$q1 = int(.25 * (\$Xn + 1)) - 1; my \$q2 = int(.5 * (\$Xn + 1)) - 1; my \$q3 = int(.75 * (\$Xn + 1)) - 1; for(my \$i = 0; \$i <= \$#x; \$i++) { \$SX_Xv__Y_Yv += (\$x[\$i] - \$MX) * (\$y[\$i] - \$MY); } my \$cov = \$SX_Xv__Y_Yv / \$Xn; my \$correl = \$cov / (\$SdevX * \$SdevY); my \$determ = \$correl * \$correl; my \$slope = (\$Xn * \$SXY - \$Sx * \$Sy) / (\$Xn * \$SX2 - \$Sx * \$Sx); my \$intercept = \$MY - \$slope * \$MX; #--- output ---# print <
Replies are listed 'Best First'.
Re: Statistical data analysis
by Anonymous Monk on Jun 05, 2002 at 17:31 UTC

Your method of calculating variance can result in loss of significant digits. For example, the following data produces a variance of 0 for X on my machine (causing div-by-zero errors later). The variance should be the same for both X and Y (as the data differs only by a constant).

```999999997	9999997
999999998	9999998
999999999	9999999
```

Better results may be achieved by "centering" your variance calculation about the mean.

```    my \$VX = 0;
\$VX += (\$_ - \$MX)**2 for @x;
\$VX /= \$Xn;

You are also making far more passes over the data than necessary.

Re: Statistical data analysis
by thraxil (Prior) on Jun 05, 2002 at 15:35 UTC

if this were a code review i'd point out the following:

• use strict; and -w
• your formatting is inconsistent. eg, the line:
`\$VX = (\$SX2*\$Xn - \$Sx * \$Sx)/(\$Xn*\$Xn);`
• i'd suggest using a HEREDOC to print the output instead of 27 or so print statements.

otherwise, the code is pretty straightforward, the variables are generally well-named, and the math looks right to me. so a ++ despite my nitpicks.

anders pearson

Thanks for the nitpicking. I have changed everything the way you suggested and this hopefully resulted in a more readable code. Thank you again.

Create A New User
Node Status?
node history
Node Type: sourcecode [id://171816]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (4)
As of 2018-05-26 05:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
World peace can best be achieved by:

Results (192 votes). Check out past polls.

Notices?