Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

Ref to hash entries are faster?

by Wiggins (Hermit)
on Nov 29, 2006 at 19:53 UTC ( [id://586809]=perlquestion: print w/replies, xml ) Need Help??

Wiggins has asked for the wisdom of the Perl Monks concerning the following question:

Old habits dye hard, or at least dark grey.

Coming from a 'C'/C++ background, I hate to constantly access an array element in a loop using the array subscripting. I prefer a pointer to the entry after computing subscripts once.

With a hash, is it better to take a reference to the the keyed scalar, or repeatedly recompute the hash?

A snippit of a 60 line loop

ELEMENT: foreach $key (keys %Contracts) { $tot_count++; {# lock block lock @{$Contracts{$key}}; next ELEMENT if ($Contracts{$key}[STATE] eq $ST_VOID); next ELEMENT if ($Contracts{$key}[STATE] eq $ST_REPORTED +); $elapsed_time = time() - $Contracts{$key}[START]; .... lots more accesses to the $Contracts{$key} record.
Does 'locality or reference' optimized that hash access, or is the key rehashed on every use in the loop?


ELEMENT: foreach $key (keys %Contracts) { $tot_count++; {# lock block lock @{$Contracts{$key}}; $Cntrct_ref = \$Contracts{$key}; next ELEMENT if ($Cntrct_ref[STATE] eq $ST_VOID); next ELEMENT if ($Cntrct_ref[STATE] eq $ST_REPORTED); $elapsed_time = time() - $Cntrct_ref[START]; ...
Wisdom is needed, please

Replies are listed 'Best First'.
Re: Ref to hash entries are faster?
by Tanktalus (Canon) on Nov 29, 2006 at 20:29 UTC

    Methinks you're overly worried about optimising things ;-) So much so that you're making small typos that will cause you no end of headache (well, it'll end if you're using strict... but still be headaches while you sort it out).

    ELEMENT: foreach my $key (keys %Contracts) { $tot_count++; # hmmm... { # lock block lock @{$Contracts{$key}}; my $Cntrct_ref = $Contracts{$key}; # note: no backslash next ELEMENT if $Cntrct_ref->[STATE] eq $ST_VOID or $Cntrct_ref->[STATE] eq $ST_ +REPORTED; $elapsed_time = time() - $Cntrct_ref->[START]; ...
    Note the arrows. Much like C/C++'s pointers. Now, as to speed - I think that your first example will outperform the second as long as accuracy is also important. ;-) And that brings me to my point: don't fret about minor details like this. It's much easier to make correct code fast than fast code correct. So, I'd say that you should just get it working, and, if it's not fast enough, profile it and then see where it can be sped up.

    You're worried about hash-lookup times in a byte-code language. That just doesn't seem to be a productive use of worrying time to me ;-)

      It's a tough one to let go. Before Perl, most of my coding was C and ASM in a day when 66mhz was cutting edge. The urge to overly optimize was great :(

      "To be civilized is to deny one's nature."
Re: Ref to hash entries are faster?
by GrandFather (Saint) on Nov 29, 2006 at 21:12 UTC

    If you wish to ignore Tanktalus's good advice and sweat the small stuff then a benchmark gives an answer. Normal benchmark caveats apply of course - like the benchmark may not indicate anything at all about your particular situation.


    Rate lookup refit lookup 443125/s -- -32% refit 648621/s 46% --

    Note that the result is somewhat sensitive to the number of key/value pairs in the hash. 46% is about the largest difference I've seen.

    Note too that at about 200 nano-seconds per access this tweak is unlikely to make much difference to most applications. Code clarity would be a much more important consideration!

    DWIM is Perl's answer to Gödel
      Not a fair comparison. You're creating an entirely new SV every time, whereas the OP was using an existing ref.
      my $r = $Contracts{$key}; $r->[STATE]
      is not similar to
      my $r = \$Contracts{$key}; $$r


      My benchmark:

      use strict; use warnings; use Benchmark qw( cmpthese ); use constant NUM_KEYS => $ARGV[0]; use constant NUM_ACCESSES => $ARGV[1]; use constant TEST_TIME => $ARGV[2]; our %contracts = map { $_ => [ 0 ] } 1 .. NUM_KEYS; my $lookup = ' foreach (keys %contracts) { _____ } '; $lookup =~ s/_____/'$a = $contracts{$_}[0];' x NUM_ACCESSES/e; my $refit = ' foreach (keys %contracts) { my $r = $contracts{$_}; _____ } '; $refit =~ s/_____/'$a = $r->[0];' x NUM_ACCESSES/e; cmpthese(TEST_TIME, { lookup => $lookup, refit => $refit, });
      My results:
      >perl 500 4 -5 Rate lookup refit lookup 1060/s -- -4% refit 1101/s 4% -- >perl 500 4 -5 Rate lookup refit lookup 1053/s -- -5% refit 1105/s 5% -- >perl 500 6 -5 Rate lookup refit lookup 772/s -- -11% refit 869/s 13% -- >perl 500 6 -5 Rate lookup refit lookup 775/s -- -11% refit 873/s 13% -- >perl 500 10 -5 Rate lookup refit lookup 496/s -- -19% refit 614/s 24% -- >perl 500 10 -5 Rate lookup refit lookup 499/s -- -19% refit 613/s 23% --

      Verdict: It's a fruitful optimization, but it's not ground breaking.

        Your benchmark is including the loop overhead - which is appropriate for "real" code, but dilutes the difference between the two access methods. Altering the appropriate parts of your code to:

        my $lookup = ' _____ '; $lookup =~ s/_____/'$a = $contracts{1}[0];' x NUM_ACCESSES/e; my $refit = ' my $r = $contracts{1}; _____ '; $refit =~ s/_____/'$a = $r->[0];' x NUM_ACCESSES/e;

        and using your last test set I get:

        Rate lookup refit lookup 245666/s -- -32% refit 360597/s 47% --

        which is double the difference shown with the loop overhead included, and is comparable with the result from my benchmark.

        Ain't benchmarks fun, almost as good as statictics! :-D

        DWIM is Perl's answer to Gödel
      This is the real answer. 46% faster. 200 nanos is 200 more than 0.

      I am doing some network monitoring on a 100Mb network. Sharing CPU with the real application (customer too cheap for a dedicated monitoring box). At least 2 threads locking either the total hash, or an element of the hash. I have to minimize the lockout periods to maximize the chance of keeping up with the data stream.

      Also learned about the benchmark module. Thanks

      As to the spelling, This is modified code; I hadn't tried the reference to element itself, just winged it for the question.

      Thanks to all....

        Hmmm. Consider the following:

        use strict; use warnings; use Benchmark 'cmpthese'; use constant MAX => 520; my %contracts = ( 1 .. 2 * MAX ); cmpthese ( -1, { lookup => \&lookup, refit => \&refit, } ); sub lookup { for ( keys %contracts ) { return if $contracts{$_} == MAX; } } sub refit { for ( keys %contracts ) { my $ref = \$contracts{$_}; return if $$ref == MAX; } }

        Output (on my machine):

        Rate refit lookup refit 4263/s -- -21% lookup 5407/s 27% --
Re: Ref to hash entries are faster?
by shotgunefx (Parson) on Nov 29, 2006 at 20:18 UTC
    The Cost of Nested Referencing

    It is slower, the optimizer is not that smart, but I wouldn't worry about it unless the loop in questioin really, really needs to be optimized.

    "To be civilized is to deny one's nature."
Re: Ref to hash entries are faster?
by fenLisesi (Priest) on Nov 30, 2006 at 08:05 UTC
    I was going to say:
    If STATE, START etc are implemented as subs, the choice of an array at that level could be expensive. How about a hash there, too, as in $Contract_href->{STATE}? (I have not benchmarked this, just a guess).
    but then I did run a (perhaps flawed) benchmark and the array with use constant seems to win.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://586809]
Approved by imp
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (5)
As of 2024-04-16 11:59 GMT
Find Nodes?
    Voting Booth?

    No recent polls found