Here's how a hash works, in a nutshell... You've got a list of buckets, each of which is a list of zero or more items. You also have a hash function that converts a key into a "hash" value used to look up a bucket. (For any given key, the hash function must always return the same result; otherwise you wouldn't be able to find your items again!)
When you want to do something with a specific key, you call the hash function for that key, use the result to find the appropriate bucket, and then search all the items in that bucket for the one you want. So, instead of having to check against every item in the hash, you only have to check against a few items that happen to be in the same bucket.
As you keep adding to the hash, each bucket has to hold more items. As the buckets get bigger, the lookups get slower, because you have to search through more items. That problem is solved by dynamically increasing the number of buckets and redistributing the items as the hash grows in size.
Your hash has 64 buckets (Perl always uses a power of 2), although only 42 are being used to store the 53 items. That just means that, with that combination of hash function, number of buckets, and keys, a few of the keys are sharing buckets. | [reply] |
| [reply] |
I can do this one :o)
From perldoc perlfunc:
As an lvalue `keys' allows you to increase the
number of hash buckets allocated for the given
hash. This can gain you a measure of efficiency
if you know the hash is going to get big. (This
is similar to pre-extending an array by assigning
a larger number to $#array.) If you say
keys %hash = 200;
then `%hash' will have at least 200 buckets
allocated for it--256 of them, in fact, since it
rounds up to the next power of two. [...]
| [reply] [d/l] [select] |
As pileswasp pointed out, you can preallocate your hash by assigning to keys, similar to preallocating an array by assigning to $#array.
Most of the time, of course, you'll just let Perl worry about the number of buckets. Here's how that works...
When you add a new key to the hash, perl checks whether there are too many keys for the number of buckets. If that's the case, perl allocates a bunch more buckets. Then, perl goes through the whole hash, calls the hash function for each key, and figures out where it belongs in the newly-resized hash.
You're probably thinking that resizing the hash and reinserting all the elements is slow; you're absolutely right. When perl has to resize a hash, it doubles the number of buckets, which keeps it from having to resize your hashes too often.
| [reply] |