I was reading this article: http://www.sysarch.com/Perl/sort_paper.html (it was mentioned in Re: Sorting geometric coordinates based on priority), and found this:

If the approximate size of the data set is known, preallocating the hash improves performance.

    keys my %cache = @in;
    $cache{$_} = KEY($_) for @in;

The following sets up the cache more efficiently, using a hash slice:

    keys my %cache = @in;
    @cache{@in} = map KEY($_) => @in;
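The two quoted forms can be tried side by side with a small self-contained sketch. KEY() is left abstract in the paper, so the version here (it just lowercases its argument) and the @in list are made-up stand-ins for illustration only:

```perl
use strict;
use warnings;

# Hypothetical key-extraction function; the paper leaves KEY() abstract.
sub KEY { return lc $_[0] }

# Made-up input data, for illustration only.
my @in = qw( Pear apple Fig banana );

# Form 1: preallocate buckets, then fill the cache in a loop.
my %loop_cache;
keys %loop_cache = @in;    # @in in scalar context gives the element count
$loop_cache{$_} = KEY($_) for @in;

# Form 2: preallocate buckets, then fill the cache with one hash slice.
my %slice_cache;
keys %slice_cache = @in;
@slice_cache{@in} = map { KEY($_) } @in;

# Compare the two caches as sorted "key=value" strings.
my $loop_dump  = join ',', map { "$_=$loop_cache{$_}" }  sort keys %loop_cache;
my $slice_dump = join ',', map { "$_=$slice_cache{$_}" } sort keys %slice_cache;
my $same = $loop_dump eq $slice_dump;

print $same ? "caches match\n" : "caches differ\n";
```

Both forms build the same hash, so any difference between them can only be one of speed, not of result.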

I liked the idiom

keys %h = @a; @h{ @a } = ... ; # do something useful

and thought it would be nice to remember it and use it now and then. I had used hash slices before, of course, but mostly because they look so concise, and I somehow felt that such concise code must also be more efficient. Now there is an additional optimization through the "magical" use of keys as an lvalue, which forces scalar context on the array. The documentation for keys does mention this optimization, but I had missed it before:

Used as an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big.
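A minimal sketch of what the lvalue form does and does not do, assuming a made-up list of 1000 string keys: the assignment only reserves buckets, and the hash stays empty until something is actually stored in it.

```perl
use strict;
use warnings;

# Made-up keys, for illustration only.
my @a = map { "key$_" } 1 .. 1000;

my %h;
keys %h = @a;    # @a is evaluated in scalar context, i.e. 1000

# Preallocation reserves buckets only; it does not create any entries.
my $count_before = keys %h;
print "entries after preallocation: $count_before\n";

# Fill the hash with a single slice assignment.
@h{@a} = (1) x @a;
my $count_after = keys %h;
print "entries after slice assignment: $count_after\n";
```

The first count is 0 and the second is 1000, so the preallocation step never changes what the hash contains.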

Then I thought it strange that assigning to a large hash slice would still require this "preallocation", so I ran this test:

    use strict;
    use warnings;
    use Benchmark qw/ cmpthese /;

    for my $count ( 100, 1_000, 10_000, 100_000 ) {
        cmpthese( -5, {
            1 => sub {
                my @a = map { log } 2 .. $count;
                my %h;
                keys %h = @a;
                $h{ $_ } = log for @a;
                return \%h
            },
            2 => sub {
                my @a = map { log } 2 .. $count;
                my %h;
                $h{ $_ } = log for @a;
                return \%h
            },
            3 => sub {
                my @a = map { log } 2 .. $count;
                my %h;
                keys %h = @a;
                @h{ @a } = map { log } @a;
                return \%h
            },
            4 => sub {
                my @a = map { log } 2 .. $count;
                my %h;
                @h{ @a } = map { log } @a;
                return \%h
            },
        })
    }

log is there to imitate at least some payload (useful work) and to create longer hash keys, in case that matters. Each sub returns a reference so that Perl cannot notice that the hash is never used and skip the work. And here are the results:

        Rate    4    3    2    1
    4 1507/s   --  -3%  -4%  -5%
    3 1549/s   3%   --  -2%  -3%
    2 1576/s   5%   2%   --  -1%
    1 1593/s   6%   3%   1%   --

       Rate    1    2    4    3
    1 140/s   --  -3%  -4%  -4%
    2 145/s   3%   --  -0%  -1%
    4 145/s   4%   0%   --  -0%
    3 146/s   4%   1%   0%   --

        Rate    4    2    3    1
    4 12.1/s   --  -7%  -8%  -9%
    2 12.9/s   7%   --  -2%  -3%
    3 13.1/s   9%   2%   --  -1%
    1 13.3/s  10%   3%   1%   --

      s/iter    4    1    2    3
    4   1.39   --  -3%  -3%  -3%
    1   1.35   3%   --  -0%  -0%
    2   1.35   3%   0%   --  -0%
    3   1.35   3%   0%   0%   --

No meaningful difference at all. So, are my tests flawed, or do the claims about the efficiency of slices and preallocation not hold water?


In reply to Does "preallocating hash improve performance"? Or "using a hash slice"? by vr
