PerlMonks  

Calculating statistical median

by Lhamo_rin (Friar)
on Jul 13, 2005 at 14:39 UTC (#474564, perlquestion)

Lhamo_rin has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, does anybody have a Perl script that outputs the statistical median of an array of numbers? For example:
@array = qw/1 2 3 4/;
I would need the output to give me something like 2.5.

Replies are listed 'Best First'.
Re: Calculating statistical median
by kwaping (Priest) on Jul 13, 2005 at 14:43 UTC
      This doesn't seem to support lists with an even number of elements.
      Update:
      It does, and it works well.



      This is not a Signature...
Re: Calculating statistical median
by tlm (Prior) on Jul 14, 2005 at 01:39 UTC

    For something this simple, I prefer to code it. With List::Util::sum and POSIX::ceil:

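    use List::Util qw(sum);
    use POSIX qw(ceil);

    # Average the two middle elements; for an odd count they are the same element.
    sub median {
        sum( ( sort { $a <=> $b } @_ )[ int( $#_/2 ), ceil( $#_/2 ) ] )/2;
    }

    print median( 1..4 ), "\n";
    print median( 1..5 ), "\n";

    __END__
    2.5
    3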
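The index trick in the one-liner above can be spelled out step by step. The following is only a sketch of the same idea, not tlm's code: the name `median_verbose` and the intermediate variables are made up for illustration, and the empty-list guard is an added assumption.

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical, more verbose variant of the one-liner's idea.
sub median_verbose {
    my @sorted = sort { $a <=> $b } @_;   # numeric sort, smallest first
    return unless @sorted;                # assumption: an empty list has no median

    # Halve the last index: 1.5 for four elements, 2 for five.
    my $mid  = $#sorted / 2;
    my $low  = floor($mid);               # lower middle index
    my $high = ceil($mid);                # upper middle index (same as $low for odd counts)

    # Averaging the two picks either the single middle value or the mean of the middle pair.
    return ( $sorted[$low] + $sorted[$high] ) / 2;
}

print median_verbose( 1 .. 4 ), "\n";
print median_verbose( 1 .. 5 ), "\n";
```

With an odd count both indices coincide, so the average is just the middle element; with an even count they straddle the midpoint.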

    the lowliest monk

      What's the advantage of coding it yourself in this specific situation? Other than for fun, which I completely understand.

      I'm asking because I normally code things myself instead of using available modules for one of two reasons: to learn how to do it (aka for fun), or to avoid the overhead of loading a module. In your example, you're loading two modules instead of the single one I suggested. Plus, you obviously already know how to do this operation, so it can't be a learning exercise. So I'm baffled.

      I am not implying anything, I'm just truly curious as to why. :)

        The principal reason in this case is that I try to minimize the number of non-core modules that my code uses, since each non-core module introduces one more potential obstacle, however small it may be, for someone else using my code. I.e., I'd never use a non-core module just to import a simple one-line function. More generally, I weigh the tradeoff between coding something myself and introducing a non-core dependency. The bigger and more complex the candidate module (i.e. the more potentially difficult its installation by some other user), the more I am willing to code myself.

        But I don't want to overstate this point! When I speak in this context about "coding something myself" I am usually referring to a small bit of functionality that I may be able to implement in only a few lines of code. Of course, this also assumes that I feel confident in coding something myself; there are many problem areas that I don't understand well enough and am glad to use modules written by others, even if my needs could have been met with only a few lines by an expert.

        One other reason why I may code something myself is if I find the documentation for a module so poor that I suspect that the apparent time saving of using the module instead of coding it myself will be lost trying to make up for the bad documentation (e.g. by wading through the source).

        A third reason is that sometimes the available CPAN alternative is not good for some reason (usually speed).

        But again, just to be clear, even with all these considerations, I end up using plenty of modules, both core and non-core.

        the lowliest monk

      This is not a statistical median: the median of an array of integers must be an integer.
Re: Calculating statistical median
by trammell (Priest) on Jul 13, 2005 at 16:46 UTC
Re: Calculating statistical median
by Anonymous Monk on Jul 13, 2005 at 16:11 UTC
    #!/usr/bin/perl -w
    use strict;

    my @array = qw/1 2 3 4 9/;
    print median(@array);

    sub median
    {
        my @vals = sort {$a <=> $b} @_;
        my $len = @vals;
        if($len%2) #odd?
        {
            return $vals[int($len/2)];
        }
        else #even
        {
            return ($vals[int($len/2)-1] + $vals[int($len/2)])/2;
        }
    }
      This is the only sample here that returns the correct median. Thanks.
Re: Calculating statistical median
by monkey_boy (Priest) on Jul 13, 2005 at 15:39 UTC
    Do you mean "median" or "mean"?
    They are different things.


    This is not a Signature...
      Median.
Re: Calculating statistical median
by dirac (Beadle) on Jul 13, 2005 at 15:18 UTC
    sub mean {
        my ($arrayref) = @_;
        my $result;
        foreach (@$arrayref) { $result += $_ }
        return $result / @$arrayref;
    }

    my @points = qw(1 2 3 4);
    print mean \@points;

      The original post pretty clearly states that he wants the median. Why you posted a snippet that calculates the mean is beyond me. update: Maybe the original post was updated at some point without notice?

      Hi.

      Is there a special reason for the array reference? Versus this?
      sub mean {
          my $result;
          foreach (@_) { $result += $_ }
          return $result / @_;
      }

      my @points = qw(1 2 3 4);
      print mean @points;
        Copying an array or hash into @_ takes time. Passing arrays and hashes by reference is straightforward.
        See Benchmark
        If you want median:
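        # Returns the middle element; assumes the array is already sorted
        # and has an odd number of elements.
        sub median { $_[0]->[ @{$_[0]} / 2 ] }

        my @points = 0..100;
        print median(\@points), "\n";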
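The reference-based one-liner above only picks the middle slot, so it gives the median when the input is already sorted and has an odd count. A minimal sketch that keeps the array-reference interface but also sorts and handles even counts might look like this; `median_ref` is a hypothetical name, and returning nothing for an empty array is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: takes an array reference and leaves the caller's array untouched.
sub median_ref {
    my ($aref) = @_;
    my @sorted = sort { $a <=> $b } @$aref;   # sorted copy, numeric order
    my $n      = @sorted;
    return unless $n;                         # assumption: no median for an empty array

    my $half = int( $n / 2 );
    return $n % 2
        ? $sorted[$half]                                  # odd count: single middle element
        : ( $sorted[ $half - 1 ] + $sorted[$half] ) / 2;  # even count: mean of the middle pair
}

my @points = qw(4 1 3 2);
print median_ref( \@points ), "\n";
```

Passing the reference avoids flattening the array into `@_`, although `sort` still builds a sorted copy internally, so this is about interface more than about saving work.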