PerlMonks  

calculating the mode

by Anonymous Monk
on Jun 10, 2002 at 09:32 UTC ( [id://173059]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to calculate the mode (the most common value) of an array of numbers. I also want to calculate the 2nd most common number and so on, until there are no numbers left. The code that I have written has no errors but also doesn't do anything! Can anyone help?
#! /usr/local/bin/perl -w
use strict;

my $num_of_params;
$num_of_params = @ARGV;
if ($num_of_params < 2) {
    die ("\n You haven't entered enough parameters \n");
}

open (BLASTX, $ARGV[0]) or die "unable to open file";
open (OUTFILE, ">$ARGV[1]");

my $line;
my @array;
my $number;
my $count=0;
my @frequency;
my %count;
my @ordered;

while (<FILE>) {
    $line = $_;
    chomp ($line);
    @array = ();
    @array = split (/\s+/, $line);
    @frequency = $array[0];

    sub odd_median {
        @frequency = shift;
        @array = sort @frequency;
        return $array[(@array - (0,0,1,0) [@array & 3]) /2];
    }

    sub mode {
        @frequency = shift;
        my (%count, @result);
        foreach (@frequency) {
            $count{$_}++;
        }
        foreach (sort { $count{$b} <=> $count{$a} } keys %count) {
            last if @result && $count{$_} != $count{$result[0]};
            push (@result, $_);
        }
        return odd_median \@result;
    }
}
close OUTFILE;
I am not sure whether I need to print values as well as return them (I'm not really sure exactly what return does). Also, how do I send output to OUTFILE with return? If anyone has any better solutions to this problem, they will be much appreciated.

Replies are listed 'Best First'.
Re: calculating the mode
by frankus (Priest) on Jun 10, 2002 at 09:37 UTC
    Why are there so many lines here? AFAIKS all you need is a hash, a sort and an array.
    $counts{$_}++ for @array_of_numbers;
    my @sorted_array = sort { $counts{$a} <=> $counts{$b} } keys %counts;

    Basically, go through the array and tally the occurrences of numbers within a hash.
    Finally, sort the keys of the hash using a sort on the number of occurrences.
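As an illustration of the tally-and-sort idea, here is a minimal sketch. The sample data in @array_of_numbers is made up, and unlike the snippet above it sorts in descending order of count (with ties broken by numeric value, which is an added assumption), so the mode comes first, followed by the 2nd most common number and so on.

```perl
use strict;
use warnings;

# Hypothetical sample data; in the original question the numbers come from a file.
my @array_of_numbers = (3, 7, 3, 9, 7, 3, 1);

# Tally how often each number occurs.
my %counts;
$counts{$_}++ for @array_of_numbers;

# Most common first; ties are broken by numeric value so the order is repeatable.
my @by_frequency = sort { $counts{$b} <=> $counts{$a} or $a <=> $b } keys %counts;

print "$_ occurs $counts{$_} time(s)\n" for @by_frequency;
```

Walking @by_frequency from front to back gives the mode, then each less common number in turn, until none are left.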

    Or have I missed something?

    Amended: thanks to Molt ;)

    --

    Brother Frankus.

    ¤

      Thanks frankus. However, I realise that I am ignorant and incompetent, but I don't know how to go through the array and tally the occurrences of numbers within a hash! :-) I also don't know how to sort the keys of the hash using a sort on the number of occurrences. THANKS FOR BEING PATIENT!!! slow perl_learner!!
        I hope I did not make you feel ignorant or incompetent, was my answer too terse?
        I am working at the same time as posting here.

        Your solution seems to be the solution of one used to another language.
        I've made no judgement about your abilities that I can see.
        I am sorry if you feel I did. Go easy on the uppercase ;o)

        --

        Brother Frankus.

        ¤

Re: calculating the mode
by Bilbo (Pilgrim) on Jun 10, 2002 at 11:20 UTC
    This is how I would implement the approach suggested by Frankus and others in this thread (though I certainly wouldn't claim to be an experienced Perl programmer). Try this:
    #! /usr/local/bin/perl -w
    use strict;

    # Set up a hash, where $freq{word} = no of occurrences of 'word'
    # (where word is actually a number in this case)
    my %freq;

    # Read from filename given on command line, or stdin if no file
    # name is given
    while (<>) {
        my @array = split (/\s+/, $_);
        # Update the hash
        foreach (@array) {$freq{$_}++}
    }

    # Sort the keys of the hash (the words or numbers in the file) into
    # an array in ascending order of $freq{key} (the number of occurrences)
    my @sorted_array = sort { $freq{$a} <=> $freq{$b} } keys %freq;

    # The mode is the last value in the array
    my $mode = pop(@sorted_array);
    print "The mode is $mode\n";
      good. you can shorten your while loop a bit, though...

      while (<>) { $freq{$_}++ for split; }
      your @array variable was merely a temp, used to glue two statements together. split defaults to a whitespace split on $_, which is what <> is filling. i've flipped around the for loop, as well.
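To make the point about split's defaults concrete, here is a small sketch. The @lines array is a made-up stand-in for what <> would read. A bare split behaves like split(' ', $_), which also skips leading whitespace, whereas split /\s+/ keeps an empty first field when a line starts with whitespace.

```perl
use strict;
use warnings;

my %freq;

# Hypothetical input lines standing in for what <> would read.
my @lines = ("  4 8 15\n", "8 16\t8\n");

for (@lines) {
    # Bare split means split(' ', $_): it splits on runs of whitespace
    # and ignores leading whitespace, so no empty first field appears.
    $freq{$_}++ for split;
}

# By contrast, an explicit /\s+/ pattern keeps an empty leading field
# when the string starts with whitespace.
my @fields = split /\s+/, "  4 8 15";
print scalar(@fields), " fields, first is '$fields[0]'\n";
```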

      also, i wouldn't be so destructive to the sorted array. instead of popping the value, how about selecting it, by

      my $mode = $sorted_array[-1]; ## aren't negative indexes neat?

      ~Particle *accelerates*

      Hi Bilbo, nice work. Particle made a couple of points that I agree with. Using modifiers when appropriate IMO produces more intuitive and straightforward code. However, in this case I would say that this is a minor improvement to a non-optimal solution. (People, no lectures on premature optimization please. I've heard them all before and I'm not interested in debating if this is 'premature' or not.)

      Keeping track of the frequencies, and then sorting them and using only one element, is wasteful. A more efficient or scalable approach would be to simply add an if to the inner loop that keeps track of the mode key and mode count for the part of the list read so far. Once the whole list has been read, this value is the correct one.

      use strict;

      # Set up a hash, where $freq{word} = no of occurrences of 'word'
      # (where word is actually a number in this case)
      my %freq;

      # Read from filename given on command line, or stdin if no file
      # name is given
      my ($mode_count,$mode_key)=(0,undef);
      while (<>) {
          chomp; # lose newlines from the lines
          # Split the line by whitespace and iterate over the results
          foreach (split (/\s+/, $_)) {
              if (++$freq{$_}>$mode_count) { # increment our frequency counter
                  # And keep track of the most common element
                  $mode_count=$freq{$_};
                  $mode_key=$_;
              }
          }
      }
      print "The mode is $mode_key with $mode_count hits\n";
      Incidentally, for the record, I haven't read the thread this is in. I only read your node because you linked to it in another node... If I'm repeating something then apologies.

      UPDATE: Sigh. I really should have read the thread first. Now I see why you were building a list and then sorting it. Apologies.

      Hmm, on rereading I suppose it's possible that if there were very few types of item my approach would actually be slower than yours (I'd have to benchmark to be sure). But I think that in the average case the sort is overkill.

      Yves / DeMerphq
      ---
      Writing a good benchmark isnt as easy as it might look.

      thanks Bilbo - this is perfect!! you have ended a week of frustration. :-)
Re: calculating the mode
by zejames (Hermit) on Jun 10, 2002 at 09:45 UTC
    Hi !

    In your code, you read from FILE :

    while (<FILE>) { ... }

    but FILE is never opened... I think you should check that before anything else...

    Moreover, you never use any of the file descriptors you opened. I am doubtful ...

    HTH

    zejames
Re: calculating the mode
by perigeeV (Hermit) on Jun 10, 2002 at 10:57 UTC

    There are many things going south here. As zejames says, you're not opening and using the correct files (you might want to check the success of the second open, too). You have placed the subs within the scope of the while loop. You never actually call the mode function. You're passing odd_median() an array ref, but inside odd_median() you treat it like a regular array. You never actually print anything to OUTFILE, and if you did you would overwrite anything it previously held.

    As frankus mentions, the canonical way to count the occurrences of elements is to use a hash with each array element as the key and the count as the value:

    for(@array) { $hash{$_}++; }
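The points above can be pulled together in a rough sketch, which is one possible arrangement rather than a definitive rewrite of the original script. It assumes a sub defined at file level that receives an array reference, dereferences it, and returns every value tied for the highest count. The caller then prints the returned values to a filehandle opened in append mode. The file name mode_out.txt and the sample numbers are hypothetical.

```perl
use strict;
use warnings;

# Return every value that shares the highest count. The caller passes an
# array reference, so it is dereferenced with @{ ... } before looping.
sub mode {
    my ($numbers) = @_;
    my %count;
    $count{$_}++ for @{$numbers};
    my @sorted = sort { $count{$b} <=> $count{$a} or $a <=> $b } keys %count;
    return grep { $count{$_} == $count{ $sorted[0] } } @sorted;
}

# A sub does nothing until it is called; return hands the values back
# to the caller, which decides where to print them.
my @modes = mode([2, 5, 5, 2, 9]);

# Hypothetical output file name; '>>' appends instead of overwriting.
my $outfile = 'mode_out.txt';
open(my $out, '>>', $outfile) or die "unable to open $outfile: $!";
print {$out} "modes: @modes\n";
close($out) or die "unable to close $outfile: $!";
```

In other words, return only passes a value back from the sub; writing to the output file is a separate print to the filehandle.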
