Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re: Optimize a perl block

by Eily (Monsignor)
on Sep 06, 2017 at 10:25 UTC ( [id://1198763]=note: print w/replies, xml ) Need Help??


in reply to Optimize a perl block

One thing I see is that $hash_ref seems to be constant, so you can compute the list of keys once instead of each time: @keys = keys %$hash_ref; and for my $key (@keys)

And if several $a values can share $b sub values, you can keep the subsums in cache to avoid recomputing them (untested because you didn't provide sample data):

use List::Util qw( sum ); my %cache; my @keys = keys %$hash_ref; foreach my $a_ (@{ $array_ref }) { my $i = 0; foreach my $b_ (split('-', $a_)) { $i += $cache{$b_} //= sum map { $_->{value} } @{ $foo->{$b_} } +{@keys}; # Use the cached value or compute the sum } push @{ $bar->{$i} }, $a_; }
Using the orcish manoeuver, List::Util, and a slice

I suppose the name are only because this is an example? Even in a example I would avoid $a and $b, (even more so when using List::Util) because of their special meaning. (Edit) And I avoid single char names as a general rule

Edit $hash_ref->{$b_ } should have been $foo->{$b_}. Thanks kcott

Replies are listed 'Best First'.
Re^2: Optimize a perl block
by kcott (Archbishop) on Sep 07, 2017 at 01:53 UTC

    G'day Eily,

    ++ I tried a few things independently; however, it appears that much of that is very similar to what you've done, so I'll post it here for comparison.

    "my @keys = keys %$hash_ref;" was one of my first thoughts and this appeared to be a definite winner. I also tried inlining the results (now discarded, but it was something like: "sub hrkeys () { keys %$hash_ref }"): that proved to be slower than using "@keys".

    I had code very similar to your "sum map ..."; although, I used sum0. That appeared to be slower (even with the "map EXPR" form I used); I suspect any gains from sum were overshadowed by losses from map; I didn't investigate that any further.

    I didn't think of caching. That's a good idea, and might put "sum(0) map" back in the picture; however, as you stated, that will depend on the OP's data (which hasn't been shown).

    I dummied up some test data (based on the OP's description but, I'm sure, far from representative); ran some basic timings; and included some sanity checking. Here's the code I found to be fastest.

    #!/usr/bin/env perl -l use strict; use warnings; use Time::HiRes 'time'; my $hash_ref; @$hash_ref{'a' .. 'j'} = 1 .. 10; my $array_ref = [ ('v-w-x-y-z') x 2e6 ]; my $foo; my $value = 0; for my $outer ('v' .. 'z') { $foo->{$outer}{$_}{value} = ++$value for 'a' .. 'j'; } my $t0 = time; op_code(); my $t1 = time; printf "op_code: %.6f\n", $t1 - $t0; kens_code(); my $t2 = time; printf "kens_code: %.6f\n", $t2 - $t1; print '*** Compare ***'; printf "kens/op: %.6f%%\n", (($t2 - $t1) / ($t1 - $t0)) * 100; sub op_code { my $bar; foreach my $a (@{ $array_ref }) { my $i = 0; foreach my $b (split('-', $a)) { foreach my $c (keys %{ $hash_ref }) { $i += $foo->{$b}->{$c}->{'value'}; } } push @{ $bar->{$i} }, $a; } print '*** op_code ***'; print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar; } sub kens_code { my $bar; my @keys = keys %$hash_ref; for my $outer (@$array_ref) { my $sum; for my $inner (split /-/, $outer) { for (@keys) { $sum += $foo->{$inner}{$_}{value}; } } push @{$bar->{$sum}}, $outer; } print '*** kens_code ***'; print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar; }

    The array, with two million elements, took about 30s (so it's roughly comparable to what the OP describes). My code was typically shaving around 25-30% off of this. I ran it quite a few times — here's a fairly representative run.

    *** op_code *** 1275 1999999 v-w-x-y-z op_code: 31.985272 *** kens_code *** 1275 1999999 v-w-x-y-z kens_code: 23.075756 *** Compare *** kens/op: 72.144941%

    By the way, I totally agree with your comments re $a and $b: I only use those as special variables. I'm not completely averse to single-letter variable names, such as $i for a loop index; although, I do cringe when I find them liberally scattered through production code — meaningful names are a much better choice.

    — Ken

      Hi kcott, thanks for your answer :).

      I also tried inlining the results (now discarded, but it was something like: "sub hrkeys () { keys %$hash_ref }"): that proved to be slower than using "@keys".
      The () prototype makes a function a candidate for inlining, but only if the function doesn't have side effects and returns a constant value. keys can both have side effects (when working on a tied hash), and isn't guaranteed to return a constant value so your function couldn't be inlined.

      I didn't think of caching. That's a good idea, and might put "sum(0) map" back in the picture; however, as you stated, that will depend on the OP's data (which hasn't been shown).
      Well I used your code to try, and the runtime is divided by ten. But that's because all the keys in the outer hash are identical, so the the sum is only computed once, and the cache is used everytime after that. I doubt the OP's data is that redundant :). Here's my code BTW:
      sub eilys_code { my $bar; my %cache; my @keys = keys %$hash_ref; for my $outer (@$array_ref) { my $sum; for my $inner (split /-/, $outer) { $sum = $cache{$inner} //= sum map { $_->{value} } @{ $foo->{ +$inner} }{@keys}; } push @{$bar->{$sum}}, $outer; } print '*** kens_code ***'; print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar; }

      The fact that with this sample data all the sums are identical makes this test less representative though, because push @{$bar->{$sum}}, $outer; autovivifies an array only once.

      I'm not completely averse to single-letter variable names, such as $i for a loop index
      I think $x, $y and $z as coordinates are fine. I still try to avoid $i though, because most of the time you don't need a loop index in perl, so if one is needed, there's probably a meaningful name to explain why.

        "... keys ... isn't guaranteed to return a constant value so your function couldn't be inlined."

        I disagree with that in part. I did take into consideration this from the keys doco:

        "Hash entries are returned in an apparently random order."

        Along with this, a couple of sentences later:

        "So long as a given hash is unmodified you may rely on keys, values and each to repeatedly return the same order as each other. "

        A very quick and dirty test (which I ran a number of times) with

        $ perl -e 'my %x = qw{@ 1 | 2}; for (1..30) { print keys %x }; print " +\n"'

        always returned one of these two:

        |@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@ @|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|@|

        So, that's a reasonable confirmation of the doco, and the fact that my Perl version (5.26.0, by the way) is adhering to this.

        Anyway, I went back to the code I posted earlier, and added the subroutine requesting inlining back in; however, this time I wrote it like this, so that @keys was constant and couldn't possibly be altered elsewhere due to the lexical scoping of the anonymous block:

        { my @keys = keys %$hash_ref; sub hrkeys () { @keys } }

        There was no improvement on the OP's code. I ran it a few times, the "kens/op:" value was now showing 100% (±2%).

        With regard to the caching, that's the reason I didn't spend any time trying to check it. The minimal sanity checking (that the OP's and my code were producing equivalent results) showed %$bar had one key ("1275") and its arrayref value held all of the two million elements.

        — Ken

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1198763]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (2)
As of 2024-04-20 03:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found