http://www.perlmonks.org?node_id=1043577

stamp1982 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

So I have three sets of data, and I am trying to write a program that calculates and prints, for each data set: the number of measurements, the average, the variance and the standard deviation (all computed by subroutines). The program should call the subroutines for each data set and print out the results.

So far this is what I have for the average. Please let me know where there are mistakes and how to proceed.
```perl
use strict;
use warnings;

my @Dset1 = (5, 6, 7, 8, 10);
my @Dset2 = (10, 11, 12);
my @Dset3 = (16, 48, 49);

Data('Info', @Dset1);
Data('Info', @Dset2);
Data('Info', @Dset3);

my $avg = average(\@data);
print "The average is $avg \n";

sub average {
    @_ == 1 or die ('Sub usage: $average = average(\@array);');
    my ($array_ref) = @_;
    my $sum;
    my $count = scalar @$array_ref;
    foreach (@$array_ref) { $sum += $_; }
    return $sum / $count;
}
```
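For reference, here is a minimal sketch of how the complete exercise could be structured. As posted, the code above does not compile under `use strict`, because `@data` is never declared, and the `Data()` subroutine it calls is never defined. The sketch instead passes each data set to the subroutines as an array reference. It assumes the *sample* variance (dividing by n - 1); divide by n instead if the assignment wants the population variance. The subroutine names `count`, `variance` and `std_dev` are illustrative choices, not taken from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three data sets from the question, keyed by name.
my %datasets = (
    Dset1 => [ 5, 6, 7, 8, 10 ],
    Dset2 => [ 10, 11, 12 ],
    Dset3 => [ 16, 48, 49 ],
);

# Number of measurements in the referenced array.
sub count {
    my ($array_ref) = @_;
    return scalar @$array_ref;
}

# Arithmetic mean: sum of the values divided by how many there are.
sub average {
    my ($array_ref) = @_;
    @$array_ref or die "average() needs at least one value\n";
    my $sum = 0;
    $sum += $_ for @$array_ref;
    return $sum / @$array_ref;    # array in numeric context = element count
}

# Sample variance (assumed): squared deviations from the mean, divided by n - 1.
# This is a second pass over the data after average() has made the first.
sub variance {
    my ($array_ref) = @_;
    @$array_ref > 1 or die "variance() needs at least two values\n";
    my $mean   = average($array_ref);
    my $sum_sq = 0;
    $sum_sq += ( $_ - $mean )**2 for @$array_ref;
    return $sum_sq / ( @$array_ref - 1 );
}

# Standard deviation is the square root of the variance.
sub std_dev {
    my ($array_ref) = @_;
    return sqrt( variance($array_ref) );
}

# Call every subroutine for each data set and print the results.
for my $name ( sort keys %datasets ) {
    my $data = $datasets{$name};
    printf "%s: n = %d, average = %.4f, variance = %.4f, std dev = %.4f\n",
        $name, count($data), average($data), variance($data), std_dev($data);
}
```

This is the classic two-pass approach: one loop for the mean, then a second loop over the same stored values for the squared deviations.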

Re: Statistic in Perl(Using Subroutines)
by bliako (Monsignor) on Jun 30, 2018 at 10:11 UTC

    I will risk the ridicule of answering a 5-year-old thread for the sake of adding one more method of calculating basic descriptive statistics. This method is quite different from what anyone with a basic engineering or CS background is used to, i.e. looping over the numbers to calculate their sum, dividing by their count to get the average, and then, with the average in hand, looping over the numbers again to calculate the standard deviation. Apart from looping twice, one also has to keep all the numbers in memory until the stats are calculated.

    Apparently there is another way to do it which, in my opinion, should have been the primary method taught from elementary school to university. It is an online method: it updates the stats as each new data point comes in. There is no need to store the numbers in memory, and the stats are updated when a new data point arrives without looping twice over the WHOLE array of data. The maths is very simple and was put together by B. P. Welford in the 1960s, who also devised it so as to avoid cumulative numerical accuracy errors.
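A minimal Perl sketch of Welford's online update, assuming the sample (n - 1) variance: each new value adjusts a running count, a running mean and `m2`, the running sum of squared deviations from the mean, so no data values need to be kept. The subroutine names `new_stats`, `add_value`, `running_variance` and `running_std_dev` are hypothetical and are not the interface of any CPAN module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Running state for Welford's method: count, running mean, and m2, the sum
# of squared deviations from the current mean. No data values are stored.
sub new_stats { return { n => 0, mean => 0, m2 => 0 }; }

# Fold one new value into the running state and return the state.
sub add_value {
    my ( $s, $x ) = @_;
    $s->{n}++;
    my $delta = $x - $s->{mean};                   # distance from the old mean
    $s->{mean} += $delta / $s->{n};                # update the mean
    $s->{m2}   += $delta * ( $x - $s->{mean} );    # old-mean deviation times new-mean deviation
    return $s;
}

# Sample variance (n - 1 denominator); undef until two values have been seen.
sub running_variance {
    my ($s) = @_;
    return $s->{n} > 1 ? $s->{m2} / ( $s->{n} - 1 ) : undef;
}

# Standard deviation is the square root of the variance.
sub running_std_dev {
    my $var = running_variance(shift);
    return defined $var ? sqrt($var) : undef;
}

my $stats = new_stats();
for my $x ( 5, 6, 7, 8, 10 ) {    # the values could just as well come from a stream
    add_value( $stats, $x );
    my $sd = running_std_dev($stats);
    printf "n = %d, mean = %.4f, std dev = %s\n",
        $stats->{n}, $stats->{mean},
        defined $sd ? sprintf( '%.4f', $sd ) : 'n/a';
}
```

For the values 5, 6, 7, 8, 10 the final state gives mean 7.2 and sample variance 3.7 (up to floating-point rounding), the same as a two-pass calculation, yet each update touches only the new value and three stored numbers.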

    Two modules I have found so far on CPAN for this kind of statistics calculation:

    So you get exactly the same functionality without having to keep the incoming data history in possibly gigabyte-long arrays, and without looping over it twice from the beginning every time a new data point arrives.

Re: Statistic in Perl(Using Subroutines)
by toolic (Bishop) on Jul 11, 2013 at 01:04 UTC
Re: Statistic in Perl(Using Subroutines)
by jwkrahn (Abbot) on Jul 10, 2013 at 23:09 UTC
```perl
sub average {
    @_ == 1 or die ('Sub usage: $average = average(\@array);');
    my ($array_ref) = @_;
    my $sum;
    my $count = scalar @$array_ref;
    foreach (@$array_ref) { $sum += $_; }
    return $sum / $count;
}
```

could be written more compactly as:

```perl
sub average {
    # Die unless the first argument is an array reference.
    ref( my $array_ref = $_[ 0 ] ) eq 'ARRAY'
        or die 'Sub usage: $average = average(\@array);';
    my $sum;
    $sum += $_ for @$array_ref;
    # An array in numeric context gives its number of elements.
    return $sum / @$array_ref;
}
```