Are the numbers actually in the range 1 .. 100? What is the actual range? This could have some bearing on what constitutes a good solution.
Your hash technique could be written like this (please note that we're supplying a numeric comparison to sort. Otherwise you don't get numerical order.)
use strict;
use warnings;
my @numbers = (1..100);
my %count;
$count{$_}++ foreach @numbers;
print "$_ occurs $count{$_} time(s)\n" foreach sort { $a <=> $b } keys
+ %count;
The reason I asked about the actual range is because sometimes there are patterns to the data set that might allow you to employ a less general approach, with greater efficiency. For example, if you're dealing with positive integers in some reasonably narrow range, you can avoid the O(n log n) sort, and the expense of hash lookups if you just use an array to hold your counts. In the following demonstration I did away with the output (output would swamp a benchmark). Instead, I build an array of anonymous arrays holding an integer value and a count. It's a little contrived, but the goal was to provide a similar platform upon which the two methods could be similarly compared.
use strict;
use warnings;
use Benchmark qw(cmpthese);
@main::numbers = map {int(rand(10000))+1 } 1 .. 300000;
sub hash {
my %count;
$count{$_}++ foreach @main::numbers;
my @results = map { [ $_ => $count{$_} ] } sort { $a <=> $b } keys
+ %count;
}
sub array {
my @count;
$count[$_]++ foreach @main::numbers;
my @results = map { [ $_ => $count[$_] ] } grep { $count[$_] } 0 .
+. $#count;
}
cmpthese ( -5, {
hash => \&hash,
array => \&array,
} );
On my system...
Rate hash array
hash 19.4/s -- -55%
array 43.2/s 122% --
Going with the less general solution wouldn't make sense if your data doesn't consist only of positive integers in a fairly narrow range. But this is an example of where a more specialized algorithm may be a better approach if the data supports it. In fact, if you are dealing with positive integers, and they do span an even greater range, as long as it fits in memory the array may still be a better approach. It doesn't scale as well for sparse data sets, but for data sets where the counts from 0 to 'n' can fit in memory, the larger sets will see even more improvement over the hash/sort technique (since a sort grows log-linearly, and the array approach grows linearly).
|