Perl Monk, Perl Meditation  
PerlMonks 
Re: A short meditation about hash search performanceby AbigailII (Bishop) 
on Nov 17, 2003 at 09:34 UTC ( #307611=note: print w/replies, xml )  Need Help?? 
Yes, you are right that length of a key plays a role in
calculating the hash value. And it plays a role in comparing
two keys as well. You can take this time into account, and
say insertion/searching takes O (k), where k is the length of
input. There's no relationship between k and n though, and
usually we aren't interested in this factor. We just define
that calculating a hash value can be done in constant time,
and so can comparing two keys.
As for the doubling, this factors out (assuming it isn't possible to construct a set of keys such that even after repeated doubling, they keep hashing to the same values). The sketch of the proof is as follows: suppose we rebuild the hash after N inserts; that is, the hash was rebuild when it contains N keys. Building a hash out of N keys will take O (N) time  this is O (1) per key. Now you also have to show that the next rebuild isn't taking place after inserting another c * N keys, for some constant c > 1. This means that if you rebuild a hash with N keys, for at least N / (1 + c) keys, this is the first time they are involved, for at least N / (1 + c)^2 keys, this is the second time they are involved in a rebuild, etc. If you do the math, you will see that there are some keys that have been charged O (log N) on rebuilds, but because there are so many more who have been charged less, it works out to O (1) amortized time. So, yes, a single insert can take O (1) time, but, starting from an empty hash, N inserts take O (N) time. Rebuilding after a bunch of inserts is actually a wellknown technique for datastructures. Often a datastructure is only partially rebuild (giving the technique its name: "partially rebuilding"). Rebuilding the entire datastructure is just an extreme variant. Abigail
In Section
Meditations

