G'day Eily,
++
I tried a few things independently; however, it appears that much of that is very similar to what you've done,
so I'll post it here for comparison.
"my @keys = keys %$hash_ref;" was one of my first thoughts and this appeared to be a definite winner.
I also tried inlining the results
(now discarded, but it was something like: "sub hrkeys () { keys %$hash_ref }"):
that proved to be slower than using "@keys".
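A rough sketch of how that kind of comparison could be set up with the core Benchmark module follows. The sub names (count_cached, count_direct, count_via_sub) and the tiny ten-key hash are made up for illustration; this is not the discarded code, and whatever numbers it prints will vary by machine and perl version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Hypothetical stand-in data: ten keys, as in the dummy data further down.
my $hash_ref;
@$hash_ref{'a' .. 'j'} = 1 .. 10;

# Variant 1: take the key list once, up front.
my @keys = keys %$hash_ref;

# Variant 3: wrap the keys() call in a sub.
sub hrkeys () { keys %$hash_ref }

# Each sub walks the key list five times and counts the keys seen.
sub count_cached {
    my $n = 0;
    for my $pass (1 .. 5) { $n++ for @keys }
    return $n;
}

sub count_direct {
    my $n = 0;
    for my $pass (1 .. 5) { $n++ for keys %$hash_ref }
    return $n;
}

sub count_via_sub {
    my $n = 0;
    for my $pass (1 .. 5) { $n++ for hrkeys() }
    return $n;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    cached  => \&count_cached,
    direct  => \&count_direct,
    via_sub => \&count_via_sub,
});
```

All three subs do the same work, so any difference in the table comes from how the key list is obtained.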
I had code very similar to your "sum map ...",
although I used sum0 rather than sum.
That appeared to be slower (even with the "map EXPR" form I used);
I suspect any gains from sum were overshadowed by losses from map;
I didn't investigate that any further.
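For comparison, here is a minimal sketch of the two shapes being discussed: explicit loops versus sum0 over a nested map. It assumes the same dummy data layout used in the script further down; the sub names total_with_loops and total_with_sum0 are invented for this example and are not taken from either of our scripts.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'sum0';

# Dummy data in the same shape as the benchmark script:
# $foo->{v..z}{a..j}{value} holds 1 .. 50.
my $hash_ref;
@$hash_ref{'a' .. 'j'} = 1 .. 10;
my $foo;
my $value = 0;
for my $outer ('v' .. 'z') {
    $foo->{$outer}{$_}{value} = ++$value for 'a' .. 'j';
}

my @keys = keys %$hash_ref;

# Explicit loops with a running total.
sub total_with_loops {
    my ($string) = @_;
    my $sum = 0;
    for my $inner (split /-/, $string) {
        $sum += $foo->{$inner}{$_}{value} for @keys;
    }
    return $sum;
}

# The same total via sum0 and nested map; the inner map uses the "map EXPR" form.
sub total_with_sum0 {
    my ($string) = @_;
    return sum0(
        map {
            my $inner = $foo->{$_};
            map( $inner->{$_}{value}, @keys );
        } split /-/, $string
    );
}

print total_with_loops('v-w-x-y-z'), "\n";
print total_with_sum0('v-w-x-y-z'),  "\n";
```

Unlike sum, sum0 returns 0 rather than undef for an empty list, which matters if a string yields no parts.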
I didn't think of caching.
That's a good idea, and might put "sum(0) map" back in the picture;
however, as you stated, that will depend on the OP's data (which hasn't been shown).
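To make the caching idea concrete, here is a minimal sketch. It assumes the thing worth caching is the per-part total (the sum over all keys for one piece of the split string), which may not match your version exactly; %cache and cached_total are hypothetical names, and the data is the same dummy layout as in the script below.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'sum0';

# Dummy data: $foo->{v..z}{a..j}{value} holds 1 .. 50.
my $hash_ref;
@$hash_ref{'a' .. 'j'} = 1 .. 10;
my $foo;
my $value = 0;
for my $outer ('v' .. 'z') {
    $foo->{$outer}{$_}{value} = ++$value for 'a' .. 'j';
}

my @keys = keys %$hash_ref;

# Per-part totals, filled in lazily the first time each part is seen.
my %cache;

sub cached_total {
    my ($string) = @_;
    my $sum = 0;
    for my $inner (split /-/, $string) {
        # Compute the inner sum once per distinct part, then reuse it.
        $sum += $cache{$inner} //= sum0( map( $foo->{$inner}{$_}{value}, @keys ) );
    }
    return $sum;
}

print cached_total('v-w-x-y-z'), "\n";
```

With only a handful of distinct parts repeated across millions of strings, the inner loop over the keys runs just once per part; with mostly unique parts, the cache would add overhead for little benefit.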
I dummied up some test data (based on the OP's description but, I'm sure, far from representative);
ran some basic timings; and included some sanity checking.
Here's the code I found to be fastest.
#!/usr/bin/env perl -l
use strict;
use warnings;
use Time::HiRes 'time';
my $hash_ref;
@$hash_ref{'a' .. 'j'} = 1 .. 10;
my $array_ref = [ ('v-w-x-y-z') x 2e6 ];
my $foo;
my $value = 0;
for my $outer ('v' .. 'z') {
    $foo->{$outer}{$_}{value} = ++$value for 'a' .. 'j';
}
my $t0 = time;
op_code();
my $t1 = time;
printf "op_code: %.6f\n", $t1 - $t0;
kens_code();
my $t2 = time;
printf "kens_code: %.6f\n", $t2 - $t1;
print '*** Compare ***';
printf "kens/op: %.6f%%\n", (($t2 - $t1) / ($t1 - $t0)) * 100;
sub op_code {
    my $bar;
    foreach my $a (@{ $array_ref }) {
        my $i = 0;
        foreach my $b (split('-', $a)) {
            foreach my $c (keys %{ $hash_ref }) {
                $i += $foo->{$b}->{$c}->{'value'};
            }
        }
        push @{ $bar->{$i} }, $a;
    }
    print '*** op_code ***';
    print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar;
}
sub kens_code {
    my $bar;
    my @keys = keys %$hash_ref;
    for my $outer (@$array_ref) {
        my $sum;
        for my $inner (split /-/, $outer) {
            for (@keys) {
                $sum += $foo->{$inner}{$_}{value};
            }
        }
        push @{$bar->{$sum}}, $outer;
    }
    print '*** kens_code ***';
    print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar;
}
The array, with two million elements, took about 30s (so it's roughly comparable to what the OP describes).
My code was typically shaving around 25-30% off that time.
I ran it quite a few times; here's a fairly representative run.
*** op_code ***
1275 1999999 v-w-x-y-z
op_code: 31.985272
*** kens_code ***
1275 1999999 v-w-x-y-z
kens_code: 23.075756
*** Compare ***
kens/op: 72.144941%
By the way, I totally agree with your comments re $a and $b: I only use those as special variables.
I'm not completely averse to single-letter variable names, such as $i for a loop index;
however, I do cringe when I find them liberally scattered through production code:
meaningful names are a much better choice.