Optimize a perl block

IruP has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Optimize a perl block by Eily (Monsignor) on Sep 06, 2017 at 10:25 UTC
One thing I see is that $hash_ref seems to be constant, so you can compute the list of keys once instead of each time: `@keys = keys %$hash_ref;` and `for my $key (@keys)` And if several $a values can share $b sub values, you can keep the subsums in cache to avoid recomputing them (untested because you didn't provide sample data): `use List::Util qw( sum ); my %cache; my @keys = keys %$hash_ref; foreach my $a_ (@{ $array_ref }) { my $i = 0; foreach my $b_ (split('-', $a_)) { $i += $cache{$b_} //= sum map { $_->{value} } @{ $foo->{$b_} } +{@keys}; # Use the cached value or compute the sum } push @{ $bar->{$i} }, $a_; }` [download] Using the orcish manoeuver, List::Util, and a slice I suppose the name are only because this is an example? Even in a example I would avoid $a and $b, (even more so when using List::Util) because of their special meaning. (Edit) And I avoid single char names as a general rule Edit $hash_ref->{$b_ } should have been $foo->{$b_}. Thanks kcott	[reply] [d/l] [select]
Re^2: Optimize a perl block by kcott (Archbishop) on Sep 07, 2017 at 01:53 UTC
G'day Eily, ++ I tried a few things independently; however, it appears that much of that is very similar to what you've done, so I'll post it here for comparison. "`my @keys = keys %$hash_ref;`" was one of my first thoughts and this appeared to be a definite winner. I also tried inlining the results (now discarded, but it was something like: "`sub hrkeys () { keys %$hash_ref }`"): that proved to be slower than using "`@keys`". I had code very similar to your "`sum map ...`"; although, I used sum0. That appeared to be slower (even with the "`map EXPR`" form I used); I suspect any gains from `sum` were overshadowed by losses from `map`; I didn't investigate that any further. I didn't think of caching. That's a good idea, and might put "`sum(0) map`" back in the picture; however, as you stated, that will depend on the OP's data (which hasn't been shown). I dummied up some test data (based on the OP's description but, I'm sure, far from representative); ran some basic timings; and included some sanity checking. Here's the code I found to be fastest. #!/usr/bin/env perl -l use strict; use warnings; use Time::HiRes 'time'; my $hash_ref; @$hash_ref{'a' .. 'j'} = 1 .. 10; my $array_ref = [ ('v-w-x-y-z') x 2e6 ]; my $foo; my $value = 0; for my $outer ('v' .. 'z') { $foo->{$outer}{$_}{value} = ++$value for 'a' .. 'j'; } my $t0 = time; op_code(); my $t1 = time; printf "op_code: %.6f\n", $t1 - $t0; kens_code(); my $t2 = time; printf "kens_code: %.6f\n", $t2 - $t1; print '* Compare '; printf "kens/op: %.6f%%\n", (($t2 - $t1) / ($t1 - $t0)) 100; sub op_code { my $bar; foreach my $a (@{ $array_ref }) { my $i = 0; foreach my $b (split('-', $a)) { foreach my $c (keys %{ $hash_ref }) { $i += $foo->{$b}->{$c}->{'value'}; } } push @{ $bar->{$i} }, $a; } print '* op_code '; print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar; } sub kens_code { my $bar; my @keys = keys %$hash_ref; for my $outer (@$array_ref) { my $sum; for my $inner (split /-/, $outer) { for (@keys) { $sum += $foo->{$inner}{$_}{value}; } } push @{$bar->{$sum}}, $outer; } print ' kens_code '; print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar; } [download] The array, with two million elements, took about 30s (so it's roughly comparable to what the OP describes). My code was typically shaving around 25-30% off of this. I ran it quite a few times — here's a fairly representative run. ` op_code * 1275 1999999 v-w-x-y-z op_code: 31.985272 * kens_code * 1275 1999999 v-w-x-y-z kens_code: 23.075756 * Compare * kens/op: 72.144941%` [download] By the way, I totally agree with your comments re `$a` and `$b`: I only use those as special variables. I'm not completely averse to single-letter variable names, such as `$i` for a loop index; although, I do cringe when I find them liberally scattered through production code — meaningful names are a much better choice. — Ken	[reply] [d/l] [select]
Re^3: Optimize a perl block by Eily (Monsignor) on Sep 07, 2017 at 08:35 UTC
Hi kcott, thanks for your answer :). I also tried inlining the results (now discarded, but it was something like: "sub hrkeys () { keys %$hash_ref }"): that proved to be slower than using "@keys". The () prototype makes a function a candidate for inlining, but only if the function doesn't have side effects and returns a constant value. keys can both have side effects (when working on a tied hash), and isn't guaranteed to return a constant value so your function couldn't be inlined. I didn't think of caching. That's a good idea, and might put "sum(0) map" back in the picture; however, as you stated, that will depend on the OP's data (which hasn't been shown). Well I used your code to try, and the runtime is divided by ten. But that's because all the keys in the outer hash are identical, so the the sum is only computed once, and the cache is used everytime after that. I doubt the OP's data is that redundant :). Here's my code BTW: `sub eilys_code { my $bar; my %cache; my @keys = keys %$hash_ref; for my $outer (@$array_ref) { my $sum; for my $inner (split /-/, $outer) { $sum = $cache{$inner} //= sum map { $_->{value} } @{ $foo->{ +$inner} }{@keys}; } push @{$bar->{$sum}}, $outer; } print '* kens_code *'; print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar; }` [download] The fact that with this sample data all the sums are identical makes this test less representative though, because `push @{$bar->{$sum}}, $outer;` autovivifies an array only once. I'm not completely averse to single-letter variable names, such as $i for a loop index I think $x, $y and $z as coordinates are fine. I still try to avoid $i though, because most of the time you don't need a loop index in perl, so if one is needed, there's probably a meaningful name to explain why.	[reply] [d/l] [select]
Re^4: Optimize a perl block by kcott (Archbishop) on Sep 07, 2017 at 23:26 UTC
Re^5: Optimize a perl block by Eily (Monsignor) on Sep 08, 2017 at 08:47 UTC
Re: Optimize a perl block by roboticus (Chancellor) on Sep 06, 2017 at 14:55 UTC
TruP: Is this loop executed once, or are you doing it repeatedly during your program? If the latter, it might be that you can factor the code to move some of the processing out of a tight inner loop. If you can show a bit more of your program (with a little sample data) people might be able to dig in a little better to find some optimization opportunities. Suppose, for example, that your loop was the scoring section of a genetic algorithm and you were recomputing the score once per each generation. If that were the case, and you were changing the genes of only half your population at a time, you could perhaps retain the scores after you compute them, and then skip the calculation for any item that you've already computed a score for: `my %scored; . . . foreach my $a (@{ $array_ref }) { if (exists $scored{$a}) { # We've already scored this one push @{ $bar->{$scored{$a}}, $a; next; } my $i = 0; foreach my $b (split('-', $a)) { foreach my $c (keys %{ $hash_ref }) { $i += $foo->{$b}->{$c}->{'value'}; } } $scored{$a} = $i; push @{ $bar->{$i} }, $a; }` [download] ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l]
Re: Optimize a perl block by llavaud (Novice) on Sep 08, 2017 at 08:15 UTC
Hello all, I originally post this question on stackexchange the 1st of September (link here), it seems IruP "stole" my question and post-it here and so it was closed on stackexchange. Well it is not very serious. To return to the subject; after initial advise on stackexchange i have already replaced `keys %{ $hash_ref }` by `my @keys = keys %$hash_ref;`, after this modification it runs about 20% more quickly. Now i just tested the Eily tricks with `$i += $cache{$b_} //= sum map { $_->{value} } @{ $foo->{$b_} }{@keys};` and it is just really impressive, the gain is again almost 60% ! :) To be honest i did not understand anything on what is going on and the condensed form of the code do not help me but i will look at your links and try to understand. Meanwhile if you have time to give me a "simple" explanation of how your code works i will take it ! ;) More informations about data: * array_ref is about 2 millions entries, keys are unique * split of $a give 5 elements * hash_ref is small, about 10 entries and keys are unique too	[reply] [d/l] [select]
Re^2: Optimize a perl block by llavaud (Novice) on Sep 08, 2017 at 09:11 UTC
ok i think i got it, finally it is not so hard to understand, i am just not familiar with the "slice" form `@{ $foo->{$b_} }{@keys}`. thanks for your help !	[reply] [d/l]
Re^3: Optimize a perl block by Eily (Monsignor) on Sep 08, 2017 at 09:31 UTC
Hi llavaud, welcome to the monastery. I'm glad that you got it, but feel free to ask if you need more details.	[reply]
Re^4: Optimize a perl block by llavaud (Novice) on Sep 08, 2017 at 10:08 UTC
Re^5: Optimize a perl block by Eily (Monsignor) on Sep 08, 2017 at 10:31 UTC
Some notes below your chosen depth have not been shown here
Re: Optimize a perl block by Anonymous Monk on Sep 06, 2017 at 14:20 UTC
How big is `%$foo`? Does `$foo->{$b}{$c}{value}` always exist, or are you counting on the `undef` getting converted to a zero? Is `$hash_ref` really `$foo->{$b}`?	[reply]
Re^2: Optimize a perl block by llavaud (Novice) on Sep 08, 2017 at 08:20 UTC
`%$foo` is relatively small, about 100 keys and `$foo->{$b}{$c}{value}` always exist yes. I don't understand what you mean by: Is `$hash_ref` really `$foo->{$b}` ?	[reply] [d/l] [select]


Don't ask to ask, just ask
	PerlMonks