Yes, you are right that the length of a key plays a role in
calculating the hash value. And it plays a role in comparing
two keys as well. You can take this time into account, and
say insertion/searching takes O (k), where k is the length of
the key. There's no relationship between k and n though, and
usually we aren't interested in this factor. We just define
that calculating a hash value can be done in constant time,
and so can comparing two keys.
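To make the O (k) cost concrete, here is a small sketch in Python (not Perl's actual hash function; the polynomial hash and its constants are made up for illustration) showing why computing a hash value has to touch every character of the key:

```python
# Illustrative polynomial hash: cost is proportional to the key
# length k, because every character must be visited exactly once.
# The base and modulus are arbitrary choices for this sketch.
def poly_hash(key, base=131, modulus=2**61 - 1):
    h = 0
    for ch in key:                     # one step per character: O(k)
        h = (h * base + ord(ch)) % modulus
    return h

# Comparing two keys for equality likewise inspects up to k
# characters, which is the other O(k) cost mentioned above.
print(poly_hash("foo") == poly_hash("foo"))   # True: same key, same hash
```

Treating both operations as O (1), as the post does, just means k is considered a constant independent of n.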
As for the doubling, this factors out (assuming it isn't
possible to construct a set of keys such that even after
repeated doubling, they keep hashing to the same values).
The sketch of the proof is as follows: suppose we rebuild
the hash after N inserts; that is, the hash is rebuilt when
it contains N keys. Building a hash out of N keys will
take O (N) time - this is O (1) per key. Now you also have
to show that the next rebuild doesn't take place until
another c * N keys have been inserted, for some constant
c >= 1 (for doubling, c = 1). This
means that if you rebuild a hash with N keys, for at least
N / (1 + c) keys, this is the first time they are involved,
for at least N / (1 + c)^2 keys, this is the second time
they are involved in a rebuild, etc. If you do the math,
you will see that there are some keys that have been charged
O (log N) on rebuilds, but because there are so many more
that have been charged less, it works out to O (1) amortized
time. So, yes, a single insert can take O (N) time, but,
starting from an empty hash, N inserts take O (N) time.
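The accounting above can be checked with a rough simulation (a sketch, not Perl's actual rehashing code; the initial capacity of 8 is an arbitrary assumption): insert N keys into a table that fully rebuilds and doubles whenever it fills up, and count every unit of work, including re-inserting the old keys on each rebuild.

```python
# Count total work for N inserts into a table that doubles
# (full rebuild) each time it fills up. Each insert costs 1 unit;
# a rebuild costs 1 unit per key already in the table.
def total_work(n_inserts, initial_capacity=8):
    capacity, size, work = initial_capacity, 0, 0
    for _ in range(n_inserts):
        if size == capacity:     # table full: rebuild at double size
            work += size         # re-hash all existing keys: O(size)
            capacity *= 2        # the doubling step
        work += 1                # the insert itself
        size += 1
    return work

n = 1_000_000
print(total_work(n) / n)   # ≈ 2.05: a constant per insert, i.e. O(1) amortized
```

The rebuild costs form a geometric series (8 + 16 + ... + 524288 here), which sums to less than 2N, so the work per insert stays bounded by a constant no matter how large N grows.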
Rebuilding after a bunch of inserts is actually a well-known
technique for data structures. Often a data structure is only
partially rebuilt (giving the technique its name: "partial
rebuilding"). Rebuilding the entire data structure is just
an extreme variant.
Abigail