http://www.perlmonks.org?node_id=357749

rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm looking for an easier/quicker way of finding the max "value" in a hash, rather than doing a value sort and pulling out the first variable in the list.

#!/usr/bin/perl # My old code use strict; use warnings; my %hashName = ( "term1" => 83, "term2" => 20, "term3" => 193 ); my $i = 0; my $max = 0; foreach (sort sortHash (keys (%hashName))) { if ($i == 0) { $max = $hashName{$_}; last } } # end-foreach sub sortHash { $hashName{$b} <=> $hashName{$a}; }
Anyone know a better way?

Cheers,
Reagen

Replies are listed 'Best First'.
Re: max value in a hash
by liz (Monsignor) on May 31, 2004 at 09:22 UTC
    use List::Util qw(max); my $max = max values %hashName;

    Please note that values() creates aliases to the values in the hash, so this method would not involve copying any values of the hash (and would therefore scale pretty well).

    Liz

      thats nice. I like...
Re: max value in a hash
by Not_a_Number (Prior) on May 31, 2004 at 09:14 UTC
    my $max = 0; $_ > $max and $max = $_ for values %hashName; print $max;

    dave

      This approach will not give the correct result if all of the values are negative:
      my %hashName = (a => -1, b => -2, c => -3); my $max = 0; $_ > $max and $max = $_ for values %hashName; print $max; __END__ 0 # expected -1
      Of course, if you are sure that all values are positive numeric values, there is no problem. But it is something to keep in mind in general, I think.

      Liz

        And to add in general use of situations like this (for min and max) use 'each' first. Modified version from above.
        my %hashName = (a => -1, b => -2, c => -3); my $max = $hash{each %hashName}; $_ > $max and $max = $_ for values %hashName; print $max;
        And you'll the scalar set as one in the hash in the beginning.
      thanks dave
Re: max value in a hash
by DaveH (Monk) on May 31, 2004 at 09:52 UTC

    Hi rsiedl,

    You say you want to find an "easier/quicker" way to find the highest value in a hash. Well, do you mean quicker to develop (less lines), or quicker to run (faster execution)? I'm assuming that "easier" means easier to read or understand?

    Anyway, the quickest (both meanings) and easiest way is to just use the exact method you have already discared. Why? The sort built-in will always be the quickest sorting method for general numeric or string data, since it is a Perl built-in written directly in C, and is probably one of the most optimised Perl built-in functions (in fact, I wouldn't be surprised if my Benchmark timings below are inaccurate simply because of some optimisation or other I am unaware of). Also, the values function is the quickest way of getting at the values of a hash if you don't care about the keys. Combining sort and values is the quickest, and in my opinion, easiest way of achieving your objective.

    my %hashName = ( "term1" => 83, "term2" => 20, "term3" => 193 ); my ($max) = sort { $b <=> $a } values %hashName;

    This Benchmark script hopefully proves my point sufficiently well:

    #!/usr/bin/perl use Benchmark qw(timethese cmpthese); my %hashName = ( "term1" => 83, "term2" => 20, "term3" => 193 ); print "Timings:\n\n"; my $r = timethese(1000000, { 'orig' => sub { my $i = 0; my $max = 0; foreach (sort sortHash (keys (%hashName))) { if ($i == 0) { $max = $hashName{$_}; last } } # end-foreach sub sortHash { $hashName{$b} <=> $hashName{$a}; } }, 'new' => sub { my ($max) = sort { $b <=> $a } values %hashName; }, }); print "\n\nComparison:\n\n"; cmpthese($r); __END__ Timings: Benchmark: timing 1000000 iterations of new, orig... new: 2 wallclock secs ( 2.44 usr + 0.00 sys = 2.44 CPU) @ 41 +0340.58/s (n=1000000) orig: 8 wallclock secs ( 8.42 usr + 0.00 sys = 8.42 CPU) @ 11 +8736.64/s (n=1000000) Comparison: Rate orig new orig 118737/s -- -71% new 410341/s 246% --

    I hope that this helps. :-)

    Update 31/05/2004: The Benchmark code I showed above seemed to be flawed. For some reason, using sub refs rather than eval strings seems to give better timings... the code was changed to reflect this.

    Cheers,

    -- Dave :-)


    $q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print

      The sort built-in will always be the quickest sorting method ...

      ... if a sort is what you need. But the best sorting algorithms tend to be O(n log n), and finding the highest value in an unsorted list should be no worse than O(n). The constant factors will probably leave sort() winning for shorter lists, but it is guaranteed that it will lose for sufficiently long lists.

      Hugo

        You raise a good point that sorting is not the optimal (i.e. does the least work for the most gain) method for solving this problem. If I were given a sufficiently large hash (and I wanted to extract every last ounce of perfomance) I might code the highest value algorithm using something like this:

        my %hashName = ( "term1" => 83, "term2" => 20, "term3" => 193 ); my $max; while ((undef, my $val) = each %hashName) { $max ||= $val; $max = $val if $val >= $max; }

        The each ensures that you don't read too much data into memory at once (one key/value pair), and ensures that we only enumerate the data once. This would also work well with tied data (e.g. DB_File) as it would only scan one record at a time.

        However, during my testing, unless you are searching BIG sets of data for the maximum, the sort method is actually much more efficient. As always, selecting the right algorithm relies on both knowing the correct algorithms (their big-O notation and so forth) AND knowing your data. For this problem, if you know your hashes are always small, use sort. You are unlikely to be able to code a pure Perl algorithm which actually executes faster. If your hashes are huge (say over 20000 keys), use the while loop and each. The Benchmarks below illustrate my point:

        #!/usr/bin/perl use Benchmark qw(timethese cmpthese); print "SMALL HASH SIZES:-\n\n"; run_timings( 150_000, # Iterations 10, 100, 1000 # Hash Sizes ); print "LARGE HASH SIZES:-\n\n"; run_timings( 50, # Iterations 10_000, 50_000 # Hash Sizes ); print "HUGE HASH SIZES:-\n\n"; run_timings( 5, # Iterations 100_000, 1_000_000 # Hash Sizes ); exit; sub run_timings { my $iterations = shift; my @hash_sizes = @_; for my $n (@hash_sizes) { my %hashName = map { $_ => $_ } 1 .. $n; print "------------------------------------------\n"; print "Hash size: ", scalar(keys(%hashName)), "\n"; print "Max Value: ", (sort { $b <=> $a } values %hashName)[0], + "\n"; print "Timings:\n\n"; my $r = timethese($iterations, { 'new' => sub { my ($max) = sort { $b <=> $a } values %hashName; }, 'nosort' => sub { my $max, $val; while ((undef, $val) = each %hashName) { $max ||= $val; $max = $val if $val >= $max; } }, }); print "\nComparasion:\n\n"; cmpthese($r); print "\n\n"; } } __END__ Timing results follow:-

        I hope that this helps. In general though, if you are looking to improve performance by optimising simple algorithms like this, then you are looking in the wrong place! ;-) For the bulk of common problems, you will find much bigger performance gains by optimising any file and/or network I/O than you will ever gain by chopping 20% off a loop which gets executed once or twice in your script.

        Cheers,

        -- Dave :-)


        $q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print
      Thanks Dave,

      That is interesting. And i've also learnt how to test the speed of my scripts :)
      Ideally i am after speed and secondly after easy of reading/understanding.
      Thanks for your input. I've learnt a bit already today.
      Cheers,
      Reagen
        Your priorities are exactly backwards. Worry about ease of reading/understanding first and then performance second (if ever).

        The reason is that most of your time and energy in programming is going to be spent in maintainance. Most of the time spent by the program will be in a small section of code. Making the code easy to read/understand will assist with getting it written and written correctly up front. It assists with maintainance. This is a good thing.

        If the code winds up fast enough, then you never have to think about performance. If it winds up too slow, then you can profile it, figure out what small section is slow (frequently not what you thought - consider how the naive sort method performed in the above benchmarks), and then optimize only that to fix that problem. If it still is too slow, then it is time for a more aggressive strategy - like converting a chunk of your program to C.

        In no case is the proper solution to performance to micro-optimize every step of the way.

        For further reference read what Code Complete says about code tuning. Sure, the examples are in C and statistics are quoted from FORTRAN days. But that is knowledge that ages well and applies for Perl as well as any other language.

Re: max value in a hash
by neniro (Priest) on May 31, 2004 at 09:25 UTC
    *laughs* - damn I'm too slow.
    I suggest List::Util for this task.
    #!/usr/bin/perl use strict; use warnings; use List::Util qw(first max); my %hashName = ( "term1" => 83, "term2" => 20, "term3" => 193 ); my $max = max values %hashName; print $max, "\n";
Re: max value in a hash
by mscharrer (Hermit) on Sep 15, 2008 at 08:28 UTC
    Hi, while you already got a solution for your problem I would like to comment on your code. You seem to use a foreach loop only to abort it in the first iteration which you don't really have to in Perl. It looks very C'ish to me.

    First, instead of:

    my $i = 0; foreach (sort sortHash (keys (%hashName))) { if ($i == 0) { $max = $hashName{$_}; last } } # end-foreach
    you could write:
    foreach (sort sortHash (keys (%hashName))) { $max = $hashName{$_}; last; } # end-foreach
    because there is no need for the $i when you just check for 0.

    Second, even easier in Perl, you can index returned lists when you put them in ( ):

    my $max = ( sort sortHash (keys (%hashName)) )[0]; # also possible: my $min = ( sort sortHash (keys (%hashName)) )[-1];
    or simply by doing an array assignment:
    my ($max) = sort sortHash (keys (%hashName));
    also the ( ) around and after keys aren't necessary and avoiding them would make the code more readable.