Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re^2: how are the number of buckets in a perl hash chosen

by david2008 (Scribe)
on Jun 09, 2014 at 08:31 UTC ( #1089239=note: print w/ replies, xml ) Need Help??


in reply to Re: how are the number of buckets in a perl hash chosen
in thread how are the number of buckets in a perl hash chosen

If i did understand exactly, from your example, that the power of 2 should be a prime.
I tried it on different examples and saw that this is not correct every time.

perl -E "%c = (3=>5);keys(%c) = 33;say scalar(%c)" 1/64
Maybe i did not understand it 100%.
I looked at perldata and keys and saw indeed that the number of buckets is a power of 2 but it was not explained why.


Comment on Re^2: how are the number of buckets in a perl hash chosen
Download Code
Replies are listed 'Best First'.
Re^3: how are the number of buckets in a perl hash chosen
by Anonymous Monk on Jun 09, 2014 at 15:17 UTC

    If i did understand exactly, from your example, that the power of 2 should be a prime.

    I don't know, wolfram says "prime factorization"...

    I looked at perldata and keys and saw indeed that the number of buckets is a power of 2 but it was not explained why.

    Oh, why? because :) its coded that way :D

    newsize &= ~(newsize & (1 + ~newsize)); /* get proper power of 2 */ .. Newxz(a, PERL_HV_ARRAY_ALLOC_BYTES(newsize), char);

    Programmers just love to do things in powers of two

    perl hash presize why power of 2 How Hashes Really Work - Perl.com

    In principle, a hashing function returns an array index directly; in practice, it is common to use its (arbitrary) return value modulo the number of buckets as the actual index. (Using a prime number of buckets that is not too close to a power of two tends to produce a sufficiently uniform key distribution.)
    A Hash Function for Hash Table Lookup
    The best hash table sizes are powers of 2. There is no need to do mod a prime (mod is sooo slow!).
    Hash table
    In the case that the array size is a power of two, the remainder operation is reduced to masking, which improves speed, but can increase problems with a poor hash function.

    So there you have it, power of two, its for the speed

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1089239]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (15)
As of 2015-07-07 19:08 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (93 votes), past polls