Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

Minimum Hash keys allocated?

by flexvault (Monsignor)
on Sep 26, 2010 at 17:51 UTC ( #862100=perlquestion: print w/replies, xml ) Need Help??
flexvault has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

When I create a %hash, I always presize it with:

my %hash = (); keys(%hash) = $nnn;

Question is, what is the minimum or default number of keys that are allocated by perl? Currently, I usually assign $nnn when I think the hash will grow to more than 30, but if the minimum was higher, I could save some key-strokes and use less instructions.

Thank you.

Replies are listed 'Best First'.
Re: Minimum Hash keys allocated?
by BrowserUk (Pope) on Sep 26, 2010 at 18:25 UTC

    The first time you assign a value to a new hash, it gets allocated 8 slots. When the number of slots used reaches 6 (which may be more than 6 keys if you get hash collisions), it expands to 16 slots. And it doubles again when 3/4s of the slots (12/16) are used. And so on.

    You can see this if you run the following code:

    perl -wE"my %h;$h{$_}=$_, say $_, ':',scalar %h for 1 .. 1025" | more 1:1/8 2:2/8 3:3/8 4:3/8 ## collision 5:4/8 6:5/8 7:6/8 8:7/16 9:8/16 10:9/16 11:10/16 12:10/16 ## collision 13:10/16 ## collision 14:11/16 15:11/16 ## collision 16:14/32 17:14/32 ## collision ...

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      WOW . . .

      That was even a better answer than I expected!

      For the interactive application I've written/using, pre-allocating the hashes, has improvements of 8-24% using hashes with 256 to 2048 keys on AIX and even better on Linux. But I didn't know about how the algorithm worked on smaller hashes. With your great and simple script, I can test larger combinations as well.

      Thank you very much...Ed

      PS: Is the algorithm the same on perl 5.8.8?
        PS: Is the algorithm the same on perl 5.8.8?

        To my knowledge, it hasn't changed since before 5.6 days. Possibly never, but that's as far as I go back with Perl.

Re: Minimum Hash keys allocated?
by afoken (Abbot) on Sep 26, 2010 at 18:20 UTC

    Premature optimization is the root of all evil. Just don't do that. Perl will handle allocation good enough most of the times. There may be rare cases where preallocation may be useful, but then, you would use Devel::NYTProf or similar first to find and eliminate other bottlenecks.

    To save keystrokes, use my %hash=(); or just my %hash;, both are technically equal. Some people (including me) prefer the first form, refusing to rely on the implementation details of Perl to initialize the variable with an empty hash / empty array / undefined scalar, other people think that the documentation is very clear on this fact and the initializing assignment is just too much and redundant work.


    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      In fact, pre-allocating can save some headaches if you know the data structure is going to be a very large one, qv:

      mem usage

      Pay close attention to what ikegami says in this part:

      Re^4: mem usage

      As this sort of hack made a very real difference -- it is not simply a matter of setting the number of slots.

Re: Minimum Hash keys allocated?
by cdarke (Prior) on Sep 27, 2010 at 10:10 UTC
    You need a huge number of keys to make pre-allocation worth it. Running this several times:
    use warnings; use strict; use Benchmark; my $start = new Benchmark; my $size = 10000000; my %hash; keys(%hash) = $size; for my $i (1..$size) { my $key = "key$i"; $hash{$key} = $key; } my $end = new Benchmark; my $diff = timediff ($end, $start); print "The code ran for \n",timestr($diff),"\n";
    With 10,000,000 keys, typical numbers for pre-allocated hash keys was:
    19 wallclock secs (17.36 usr + 0.94 sys = 18.30 CPU)
    Typical numbers with the pre-allocation commented out:
    21 wallclock secs (21.23 usr + 0.63 sys = 21.86 CPU)

    I had to use 100,000 keys to even measure a difference, which was typically 0.03 seconds.

    It ain't worth it.

      I can't show a simple script to defend my earlier claim, since the times were taken using Time::HiRes qw(gettimeofday).

      The sub-system used is 20,000+ lines of code running persistently with pre-forked servers that dynamically expand/contract on activity. The servers respond to tcp calls from a web server with a very small mod-perl cgi script preloaded.

      The design goal was to support 3 transactions per second per core with enough memory to prevent paging.

      One bottleneck was to reload all hashes required for each transaction. Testing found that pre-allocting the hash key number had a significant improvement in production.

      The last stress test, we exceeded 3.52 tranactions per second per core. However, the difference between hardware, operating systems, web-servers, mod-perl, and versions of perl is significant, so your mileage may vary!

      Thank you

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://862100]
Approved by varian
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2018-05-26 22:21 GMT
Find Nodes?
    Voting Booth?