Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Does "preallocating hash improve performance"? Or "using a hash slice"?

by vr (Monk)
on Feb 15, 2017 at 19:39 UTC ( #1182092=perlquestion: print w/replies, xml ) Need Help??
vr has asked for the wisdom of the Perl Monks concerning the following question:

I was reading this article: http://www.sysarch.com/Perl/sort_paper.html (actually, mentioned here: Re: Sorting geometric coordinates based on priority), and found this:

If the approximate size of the data set is known, preallocating the hash improves performance.

keys my %cache = @in; $cache{$_} = KEY($_) for @in;

The following sets up the cache more efficiently, using a hash slice:

keys my %cache = @in; @cache{@in} = map KEY($_) => @in;

I liked the idiom

keys %h = @a; @h{ @a } = ... ; # do something useful

and thought it would be nice to remember and use it sometimes. While, of course, I was using hash slices before, but rather because they look so concise and, somehow because of this, I felt that code, as a result, must, indeed, be more efficient. And now additional optimization through "magical" use of keys as lvalue, forcing scalar context on array. Actually, keys mentions this optimization, but I missed it before:

Used as an lvalue, keys allows you to increase the number of hash buckets allocated for the given hash. This can gain you a measure of efficiency if you know the hash is going to get big.

Then I thought it strange that assigning to large hash slice still requires this "preallocation". Then I ran this test:

use strict; use warnings; use Benchmark qw/ cmpthese /;; for my $count ( 100, 1_000, 10_000, 100_000 ) { cmpthese( -5, { 1 => sub { my @a = map { log } 2 .. $count; my %h; keys %h = @a; $h{ $_ } = log for @a; return \%h }, 2 => sub { my @a = map { log } 2 .. $count; my %h; $h{ $_ } = log for @a; return \%h }, 3 => sub { my @a = map { log } 2 .. $count; my %h; keys %h = @a; @h{ @a } = map { log } @a; return \%h }, 4 => sub { my @a = map { log } 2 .. $count; my %h; @h{ @a } = map { log } @a; return \%h }, }) }

log is here to imitate at least some payload (useful work), and to create longer hash keys (if it matters). Returning a reference so that Perl doesn't sniff we don't need this hash and won't skip any work. And that's because of results:

Rate 4 3 2 1 4 1507/s -- -3% -4% -5% 3 1549/s 3% -- -2% -3% 2 1576/s 5% 2% -- -1% 1 1593/s 6% 3% 1% -- Rate 1 2 4 3 1 140/s -- -3% -4% -4% 2 145/s 3% -- -0% -1% 4 145/s 4% 0% -- -0% 3 146/s 4% 1% 0% -- Rate 4 2 3 1 4 12.1/s -- -7% -8% -9% 2 12.9/s 7% -- -2% -3% 3 13.1/s 9% 2% -- -1% 1 13.3/s 10% 3% 1% -- s/iter 4 1 2 3 4 1.39 -- -3% -3% -3% 1 1.35 3% -- -0% -0% 2 1.35 3% 0% -- -0% 3 1.35 3% 0% 0% --

No meaningful difference at all. So, are my tests flawed, or claims about efficiency of slices and preallocation don't hold any water?

Replies are listed 'Best First'.
Re: Does "preallocating hash improve performance"? Or "using a hash slice"?
by dave_the_m (Prior) on Feb 15, 2017 at 20:05 UTC
    So, are my tests flawed
    Kinda. Your benchmarks are dominated by the time taken to generate @a and 'map log @a'. If these are precomputed you can start to see a big improvement with slicing and a smaller improvement with pre-sizing.
    use strict; use warnings; use Benchmark qw/ cmpthese /;; for my $count ( 100, 1_000, 10_000, 100_000 ) { my @a = map { log } 2 .. $count; my @b = map log @a; cmpthese( -5, { 1 => sub { my %h; keys %h = @a; $h{ $_ } = log for @a; return \%h }, 2 => sub { my %h; $h{ $_ } = log for @a; return \%h }, 3 => sub { my %h; keys %h = @a; @h{ @a } = @b; return \%h }, 4 => sub { my %h; @h{ @a } = @b; return \%h }, }) } $ perl5240o ~/tmp/x.pl Rate 2 1 4 3 2 14381/s -- -1% -25% -26% 1 14516/s 1% -- -24% -26% 4 19198/s 33% 32% -- -2% 3 19510/s 36% 34% 2% -- Rate 2 1 4 3 2 1427/s -- -4% -25% -29% 1 1480/s 4% -- -22% -26% 4 1895/s 33% 28% -- -5% 3 2004/s 40% 35% 6% -- Rate 2 1 4 3 2 131/s -- -6% -22% -29% 1 140/s 6% -- -17% -25% 4 168/s 28% 20% -- -9% 3 185/s 41% 33% 10% -- Rate 2 1 4 3 2 6.50/s -- -6% -12% -18% 1 6.94/s 7% -- -6% -12% 4 7.39/s 14% 6% -- -7% 3 7.91/s 22% 14% 7% -- $

    Dave.

      I see, thank you everyone for answers. With these changes I'm getting same 2-1-4-3 consistent (and expected) sequence, though with less than 10% total difference, on older computer. Hashes of 100_000 keys were largest in linked article, but I understand that influence of preallocating was not its subject.

      ++ for correcting the benchmark (I actually wouldn't have guessed that log could weight so much on the result).

      There are still two issues with your benchmark though. First, you forgot the comma in @b = map log @a;, which is parsed as @b = map(log(@a)) where an empty @b is obtained. I have no idea why this is not a syntax error though. Second, the for-loops variant do not use the precomputed values, but compute the logarithm on each iteration.

      With this benchmark:

      use strict; use warnings; use Benchmark qw/ cmpthese /;; for my $count ( 100, 1_000, 10_000, 100_000 ) { my @array = map { log } 2 .. $count; print "-" x 10, "\nWith $count elements\n"; cmpthese( -5, { 1 => sub { my %h; keys %h = @array; $h{ $_ } = $_ for @array; return \%h }, 2 => sub { my %h; $h{ $_ } = $_ for @array; return \%h }, 3 => sub { my %h; keys %h = @array; @h{ @array } = @array; return \%h }, 4 => sub { my %h; @h{ @array } = @array; return \%h }, }); } __DATA__ ---------- With 100 elements Rate 2 3 1 4 2 6956/s -- -2% -3% -5% 3 7103/s 2% -- -1% -3% 1 7162/s 3% 1% -- -2% 4 7321/s 5% 3% 2% -- ---------- With 1000 elements Rate 2 4 1 3 2 621/s -- -3% -4% -7% 4 638/s 3% -- -1% -4% 1 644/s 4% 1% -- -4% 3 668/s 7% 5% 4% -- ---------- With 10000 elements Rate 2 4 3 1 2 62.1/s -- -0% -3% -4% 4 62.3/s 0% -- -3% -3% 3 63.9/s 3% 3% -- -1% 1 64.5/s 4% 3% 1% -- ---------- With 100000 elements Rate 2 4 1 3 2 4.75/s -- -3% -6% -7% 4 4.89/s 3% -- -3% -4% 1 5.03/s 6% 3% -- -1% 3 5.08/s 7% 4% 1% --
      We can see that in this case, slicing is just a little faster than iterating. Your benchmark did answer vr's question though: slicing does not seem to include the preallocation optimization.

        slicing does not seem to include the preallocation optimization

        I don't know how you can say given that the test shows no benefit from pre-allocating[1].

        But the reason the test shows no benefit from pre-allocating because the test is still flawed.

        Lexical variables aren't freed when they go out of scope; they are kept around for use the next time the scope is entered. That means the hash is effectively pre-allocated for all tests![2].

        $ perl -MDevel::Peek -e' sub x { my %h; Dump(%h, 0); keys(%h) = 100; Dump(%h, 0); } x() for 1..3; ' 2>&1 | grep MAX MAX = 7 MAX = 127 MAX = 127 <-- Preallocated even before C<< keys(%h) = 100; >>! MAX = 127 MAX = 127 <-- Preallocated even before C<< keys(%h) = 100; >>! MAX = 127

        Adding undef %h; should provide better results.

        $ perl -MDevel::Peek -e' sub x { my %h; Dump(%h, 0); keys(%h) = 100; Dump(%h, 0); undef %h; } x() for 1..3; ' 2>&1 | grep MAX MAX = 7 MAX = 127 MAX = 7 MAX = 127 MAX = 7 MAX = 127

        1. The number are far too small to be meaningful, and one would expect the difference to grow as the hash size increases. (1 vs 2: 3%, 4%, 4%, 6%; 3 vs 4: -3%, 5%, 3%, 4%)
        2. Well, except on the first pass of a given size of a given test.
Re: Does "preallocating hash improve performance"? Or "using a hash slice"?
by BrowserUk (Pope) on Feb 15, 2017 at 21:15 UTC

    If the hash is big enough it does make a difference:

    C:\test>p1 $t = time; my %h; keys %h = 10e6; $h{$_}=1 for 1 .. 10e6; printf "%.6f +\n", time() -$t;; 15.792654 $t = time; my %i; $i{$_}=1 for 1 .. 10e6; printf "%.6f\n", time() -$t; +; 18.772236

    That's about an 18% saving. And if you use hashes with 100s of millions of keys as I routinely do, then the saving get bigger.

    Perl's hashes start off with room for 8 keys and then double and double again as they grow. Each time it doubles, new memory is allocated and the keys must be rehashed and the values copied over. By preallocating a 10 million key hash that process is avoided 21 times.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
    In the absence of evidence, opinion is indistinguishable from prejudice.

      While this is of course true, you missed vr's point about hash slices. With @h{ @keys } = @values; perl could use the size of the keys list to preallocate the hash. I guess this would turn out to be a waste of space if most of the keys are duplicates, which would explain why apparently setting all the keys "at once" seems to still require several reallocations.

        you missed vr's point about hash slices.

        Not so much "missed it" as chose not to address it.

        With @h{ @keys } = @values; perl could use the size of the keys list to preallocate the hash

        The problem with that is that in order to gain the 18% saving from the preallocation of my 10,000,000 key hash, the hash slice would:

        1. Allocate stack space to accommodate 20,000,000 items on the stack.
        2. Turn the 10,000,000 element array, @keys, into a 10,000,000 item list on the stack.
        3. Turn the 10,000,000 element array, @values, into a 10,000,000 item list also the stack.
        4. Process those two lists in parallel adding each key/value pair individually; doubling (and redoubling), hashing (and rehashing) as required.
        5. Free the space required to accommodate the 20,000,000 items on the stack back to the memory pool.

        It's a lot more complicated than that, but it is sufficiently accurate to make my point.)

        So, now the process not only has two copies of all the keys and values (in the arrays as well as the hash), it also has allocated enough space to store them all again on the stack.

        So that means the peak memory usage required to do the slice assignment is at least 4 times more than is eventually required to construct the hash itself; and the hash was not preallocated either!

        Slice assignment can be more efficient for relatively small hashes provided it is necessary to the rest of the code to have the keys and values stored in arrays as well. But if you are loading them into arrays, simply so that you can use list assignment to construct a hash from them, it is a disaster in terms of both memory usage and processing time.

        And for anything larger then pretty modest sized hashes ( a few 10s or low 100s of key/value pairs) then the cost (memory and time) of building the two lists on the stack far outweights any gain from moving the loop into C. Especially as the potential of having the hash pre-sized doesn't happen. (It probably could, but it doesn't.)


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
        In the absence of evidence, opinion is indistinguishable from prejudice.
        .
Re: Does "preallocating hash improve performance"? Or "using a hash slice"?
by SuicideJunkie (Vicar) on Feb 15, 2017 at 20:11 UTC

    My first thought was: how many million keys did you try adding? I scrolled back up, and it was very few million keys indeed :)

    The other thing as noted above, is that if you don't see a difference between two things, you're probably just measuring overhead.

Re: Does "preallocating hash improve performance"? Or "using a hash slice"?
by Marshall (Abbot) on Feb 17, 2017 at 02:03 UTC
    I spent some effort with this idea of pre-allocation of Perl hashes. I thought it was cool and that it would do a lot. But I found out differently. In one application, I found that pre-sizing a 128K hash table made almost no significant difference at all versus letting the hash grow "naturally".

    The Perl hash function has changed over the years, but the low level C implementation appears to be "solid". The Intel integer multiply has gotten faster over the years and using shifts and addition versus multiply doesn't make as much difference as it used to. Also the low level Perl mem to mem copies appear to be "fast enough" - this more apparent with bigger data sizes to be copied.

    My conclusion: With less than 128K keys, don't worry about it unless there is some extreme requirement for this hash.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1182092]
Front-paged by GotToBTru
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (6)
As of 2017-07-22 19:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    I came, I saw, I ...
























    Results (340 votes). Check out past polls.