Re: Re: Re: max value in a hash

by DaveH (Monk)
on May 31, 2004 at 16:46 UTC


in reply to Re: Re: max value in a hash
in thread max value in a hash

You raise a good point that sorting is not the optimal method (i.e. the one that does the least work for the most gain) for solving this problem. If I were given a sufficiently large hash (and I wanted to extract every last ounce of performance) I might code the highest-value algorithm using something like this:

my %hashName = ( "term1" => 83, "term2" => 20, "term3" => 193 );

my $max;
while ((undef, my $val) = each %hashName) {
    $max ||= $val;                  # seed $max on the first pass
    $max = $val if $val >= $max;    # keep the larger of the two
}

Using each ensures that you never read more than one key/value pair into memory at a time, and that the data is enumerated only once. This would also work well with tied data (e.g. DB_File), as it would only scan one record at a time.
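For instance, here is a minimal sketch of the same each loop over a hash tied to a Berkeley DB file (the filename counts.db is just a placeholder, and the values are assumed to be numeric):

use strict;
use warnings;
use DB_File;
use Fcntl qw(O_RDONLY);

# Tie the hash to an existing Berkeley DB file, read-only.
my %counts;
tie %counts, 'DB_File', 'counts.db', O_RDONLY, 0666, $DB_HASH
    or die "Cannot tie counts.db: $!";

# Walk the records one at a time; only one key/value pair is in memory at once.
my $max;
while ((undef, my $val) = each %counts) {
    $max = $val if !defined $max or $val >= $max;
}
print defined $max ? "Max value: $max\n" : "No records found\n";

untie %counts;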

However, in my testing, unless you are searching BIG sets of data for the maximum, the sort method is actually much more efficient. As always, selecting the right algorithm depends both on knowing the candidate algorithms (their big-O behaviour and so forth) AND on knowing your data. For this problem, if you know your hashes are always small, use sort: you are unlikely to be able to write a pure-Perl algorithm which actually executes faster. If your hashes are huge (say over 20,000 keys), use the while loop and each. The benchmarks below illustrate the point:

#!/usr/bin/perl

use Benchmark qw(timethese cmpthese);

print "SMALL HASH SIZES:-\n\n";
run_timings(
    150_000,            # Iterations
    10, 100, 1000       # Hash Sizes
);

print "LARGE HASH SIZES:-\n\n";
run_timings(
    50,                 # Iterations
    10_000, 50_000      # Hash Sizes
);

print "HUGE HASH SIZES:-\n\n";
run_timings(
    5,                  # Iterations
    100_000, 1_000_000  # Hash Sizes
);

exit;

sub run_timings {
    my $iterations = shift;
    my @hash_sizes = @_;

    for my $n (@hash_sizes) {
        my %hashName = map { $_ => $_ } 1 .. $n;

        print "------------------------------------------\n";
        print "Hash size: ", scalar(keys(%hashName)), "\n";
        print "Max Value: ", (sort { $b <=> $a } values %hashName)[0], "\n";
        print "Timings:\n\n";

        my $r = timethese($iterations, {
            'new' => sub {
                my ($max) = sort { $b <=> $a } values %hashName;
            },
            'nosort' => sub {
                my ($max, $val);
                while ((undef, $val) = each %hashName) {
                    $max ||= $val;
                    $max = $val if $val >= $max;
                }
            },
        });

        print "\nComparison:\n\n";
        cmpthese($r);
        print "\n\n";
    }
}

__END__
Timing results follow:-
SMALL HASH SIZES:-

------------------------------------------
Hash size: 10
Max Value: 10
Timings:

Benchmark: timing 150000 iterations of new, nosort...
       new:  1 wallclock secs ( 0.56 usr +  0.00 sys =  0.56 CPU) @ 266429.84/s (n=150000)
    nosort:  2 wallclock secs ( 1.77 usr +  0.00 sys =  1.77 CPU) @ 84937.71/s (n=150000)

Comparison:

            Rate nosort    new
nosort   84938/s     --   -68%
new     266430/s   214%     --

------------------------------------------
Hash size: 100
Max Value: 100
Timings:

Benchmark: timing 150000 iterations of new, nosort...
       new:  6 wallclock secs ( 5.67 usr +  0.00 sys =  5.67 CPU) @ 26450.36/s (n=150000)
    nosort: 15 wallclock secs (15.06 usr +  0.00 sys = 15.06 CPU) @ 9958.18/s (n=150000)

Comparison:

           Rate nosort    new
nosort   9958/s     --   -62%
new     26450/s   166%     --

------------------------------------------
Hash size: 1000
Max Value: 1000
Timings:

Benchmark: timing 150000 iterations of new, nosort...
       new:  82 wallclock secs ( 80.11 usr +  0.00 sys =  80.11 CPU) @ 1872.45/s (n=150000)
    nosort: 157 wallclock secs (154.36 usr +  0.03 sys = 154.39 CPU) @ 971.57/s (n=150000)

Comparison:

          Rate nosort    new
nosort   972/s     --   -48%
new     1872/s    93%     --

LARGE HASH SIZES:-

------------------------------------------
Hash size: 10000
Max Value: 10000
Timings:

Benchmark: timing 50 iterations of new, nosort...
       new:  1 wallclock secs ( 0.53 usr +  0.00 sys =  0.53 CPU) @ 94.16/s (n=50)
    nosort:  1 wallclock secs ( 0.95 usr +  0.00 sys =  0.95 CPU) @ 52.47/s (n=50)

Comparison:

          Rate nosort    new
nosort  52.5/s     --   -44%
new     94.2/s    79%     --

------------------------------------------
Hash size: 50000
Max Value: 50000
Timings:

Benchmark: timing 50 iterations of new, nosort...
       new:  7 wallclock secs ( 6.41 usr +  0.00 sys =  6.41 CPU) @ 7.81/s (n=50)
    nosort:  5 wallclock secs ( 5.00 usr +  0.02 sys =  5.02 CPU) @ 9.97/s (n=50)

Comparison:

          Rate    new nosort
new     7.81/s     --   -22%
nosort  9.97/s    28%     --

HUGE HASH SIZES:-

------------------------------------------
Hash size: 100000
Max Value: 100000
Timings:

Benchmark: timing 5 iterations of new, nosort...
       new:  1 wallclock secs ( 1.61 usr +  0.00 sys =  1.61 CPU) @ 3.11/s (n=5)
    nosort:  2 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 4.70/s (n=5)

Comparison:

          Rate    new nosort
new     3.11/s     --   -34%
nosort  4.70/s    51%     --

------------------------------------------
Hash size: 1000000
Max Value: 1000000
Timings:

Benchmark: timing 5 iterations of new, nosort...
       new: 26 wallclock secs (25.59 usr +  0.06 sys = 25.66 CPU) @ 0.19/s (n=5)
    nosort: 11 wallclock secs (11.22 usr +  0.02 sys = 11.23 CPU) @ 0.45/s (n=5)

Comparison:

        s/iter    new nosort
new       5.13     --   -56%
nosort    2.25   128%     --
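As an aside, the List::Util module that comes up later in this thread offers a third route: in the standard distribution its max function is implemented in XS and can be fed the hash values directly, so there is neither a sort nor a Perl-level loop. A minimal sketch (not part of the benchmark above):

use strict;
use warnings;
use List::Util qw(max);

my %hashName = ( "term1" => 83, "term2" => 20, "term3" => 193 );

# max() walks the value list once in C.
my $max = max(values %hashName);
print "$max\n";    # prints 193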

I hope that this helps. In general though, if you are looking to improve performance by optimising simple algorithms like this, then you are looking in the wrong place! ;-) For the bulk of common problems, you will find much bigger performance gains by optimising any file and/or network I/O than you will ever gain by chopping 20% off a loop which gets executed once or twice in your script.

Cheers,

-- Dave :-)


$q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print

Replies are listed 'Best First'.
Re^4: max value in a hash
by sesemin (Beadle) on Sep 12, 2008 at 16:56 UTC
    Now for another problem: if you have a hash with multiple values for each key, such as the following, how would you calculate the max for each key? Example:
    Keys  value
    A     1
    A     2
    A     3
    B     4
    B     5
    B     6
    use strict;
    use warnings;
    use List::Util qw(first max);

    while(<INPUT2>){
        (my $probeset_id, my $origin, my $probeseq, my $pip, my $gc, my $affyscore) = split("\t", $_);
        push (@{$hash_F2_1{$origin}}, $pip);  # This makes a hash of multiple values for each probeset ID
        foreach my $origin (sort keys %hash_F2_1){
            foreach my $position (@{$hash_F2_1{$origin}}){
                my $max = max values %hash_F2_1;
                print "$origin\t$max\n";
            }
        }
    THIS CODE DOES NOT WORK!! Any idea?
      Assuming you have a Hash-of-Arrays:
      use strict;
      use warnings;
      use List::Util qw(max);

      my %hoa = ( a => [(1..3)], b => [(4..6)] );

      for (sort keys %hoa) {
          print "$_ ", max(@{$hoa{$_}}), "\n";
      }

      __END__
      a 3
      b 6
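      Applied to the tab-separated input from the post above, a sketch along the same lines might look like this (the filename probes.txt is just a placeholder; the filehandle name and column order are taken from the original split):

      use strict;
      use warnings;
      use List::Util qw(max);

      my %hash_F2_1;

      # First pass: build the hash-of-arrays, pushing every $pip value
      # onto the list for its $origin.
      open(INPUT2, '<', 'probes.txt') or die "Cannot open probes.txt: $!";
      while (<INPUT2>) {
          chomp;
          my ($probeset_id, $origin, $probeseq, $pip, $gc, $affyscore) = split "\t", $_;
          push @{ $hash_F2_1{$origin} }, $pip;
      }
      close INPUT2;

      # Only then take the maximum of each key's list.
      for my $origin (sort keys %hash_F2_1) {
          print "$origin\t", max(@{ $hash_F2_1{$origin} }), "\n";
      }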
        Thanks rsied1, it worked!!!
