in reply to Re: Re: A short meditation about hash search performance in thread A short meditation about hash search performance
You obviously don't understand what O(1) means.
Let's see. The definition of big O is:
f(n) = O (g (n)) iff there are an M > 0 and a c > 0 such that
for all m > M, 0 <= f(m) <= c * g (m). [1] [2] [3]
I don't have any problem understanding it. In layman's terms, it
means that a function f of n is in the order of
g of n, if, and only if, there's a constant, such that
if n gets large enough, the value of f is at most
the value of g times said constant.
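As a rough illustration of that definition, here is a minimal Python sketch that spot-checks the inequality over a finite range. The helper name bounded_by, the cost function f, and the chosen constants c and M are all made up for the example; a finite check like this is of course not a proof.

```python
def bounded_by(f, g, c, M, upto=10_000):
    """Spot-check 0 <= f(m) <= c * g(m) for every M < m <= upto (not a proof)."""
    return all(0 <= f(m) <= c * g(m) for m in range(M + 1, upto + 1))


def f(n):
    # Hypothetical cost function, chosen only for illustration.
    return 3 * n + 5


# 3m + 5 <= 4m holds once m >= 5, so c = 4 and M = 5 witness f(n) = O(n).
ok_linear = bounded_by(f, lambda n: n, c=4, M=5)

# Against g(n) = 1 the check fails: f eventually exceeds any fixed constant.
ok_const = bounded_by(f, lambda n: 1, c=100, M=5)
```

The point is only that c and M have to be fixed once, up front, and must then work for every larger m.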
Search from beginning to end, going thru each element one by one, until
hit what you are searching for. In the worst case (the element is at the
end of the array), you have to hit 1 billion elements, but according to
you, that's O(1). I say it is O(n). We never put a restriction saying
that an array can at most contain 1 billion elements (so the size of
an array in general is not a constant, although it is a constant for a
given array at one given observation point.)
Hello? We never put a restriction on the size? Come again. What do you call:
And still O(1) is not reachable, unless each element resolve a unique key ;)
That's a restriction of 1. You started out by putting
restrictions on it, claiming that only if there's a restriction of a size
of 1, the search algorithm is O (1). I on the other hand pointed out that
as long as there is a restriction on the limit of the chain, it doesn't
matter what the restriction is, 1, 14 (for 5.8.2), or a billion. If there's
a restriction on the size, even with a linear search it's O (1). Here's
a proof:
Suppose the chain is limited to length K, where K is a constant, independent
of the number of keys in the hash. Searching for a key is a two-step
process: first we need to find the bucket the key hashes to, then we need
to find the key in the associated chain. Finding the right bucket takes
constant time. Traversing the chain takes at most K * e time, for some
constant e. So, searching for the element takes at most:
                          e * K + O (1),      e >= 0
{definition of O()}    <= e * K + d * 1,      e >= 0, d > 0
{arithmetic}           == (e * K + d) * 1,    e >= 0, d > 0
{c == e * K + d}       == c * 1
{c > 0}                == O (1).
q.e.d.
I won't deny the performance will be rather lousy, but it's still
O (1). Which proves that big O doesn't say everything.
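To make the argument concrete, here is a toy Python sketch of a chained hash whose chains are capped at a constant K. It is not how Perl implements its hashes; the class BoundedChainHash, its methods, and the doubling policy are assumptions for the example, and it relies on the keys not all colliding forever.

```python
class BoundedChainHash:
    """Toy chained hash whose chains never exceed K entries (K is an assumed constant)."""

    def __init__(self, K=4):
        self.K = K
        self.buckets = [[]]

    def _index(self, key, nbuckets):
        return hash(key) % nbuckets

    def _rebuild(self, nbuckets):
        # Redistribute every stored key over a larger bucket array.
        new = [[] for _ in range(nbuckets)]
        for chain in self.buckets:
            for key in chain:
                new[self._index(key, nbuckets)].append(key)
        self.buckets = new

    def insert(self, key):
        chain = self.buckets[self._index(key, len(self.buckets))]
        if key in chain:
            return
        chain.append(key)
        # Doubling only splits chains, so only this key's chain can be too long.
        # Assumes the keys do not keep hashing to the same bucket forever.
        while len(self.buckets[self._index(key, len(self.buckets))]) > self.K:
            self._rebuild(2 * len(self.buckets))

    def lookup(self, key):
        """Return (found, comparisons) using a plain linear scan of one chain."""
        comparisons = 0
        for candidate in self.buckets[self._index(key, len(self.buckets))]:
            comparisons += 1
            if candidate == key:
                return True, comparisons
        return False, comparisons


table = BoundedChainHash(K=4)
for key in range(2000):
    table.insert(key)
worst = max(table.lookup(key)[1] for key in range(2000))
```

However many keys go in, worst never exceeds K, which is the whole content of the proof: the linear scan is bounded by a constant that does not depend on n.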
[1] Cormen, Leiserson, and Rivest: Introduction to Algorithms.
MIT Press, 1990. p. 26.
[2] Knuth: The Art of Computer Programming, Volume 1:
Fundamental Algorithms. Third Edition. Addison-Wesley, 1997.
p. 107.
[3] Sedgewick and Flajolet: Analysis of Algorithms.
Addison-Wesley, 1996. p. 4.
Abigail
Re: Re: A short meditation about hash search performance by demerphq (Chancellor) on Nov 17, 2003 at 08:58 UTC 
Abigail, I have two minor questions for you. First off, you speak of finding the correct bucket as occurring in constant time. Given that the time to calculate the bucket value depends on the length of the key, I don't quite see how this is correct. Or does this factor disappear because it averages to a constant time in normal use? I have a similar concern about the doubling of the buckets during insertion. My by now hazy recollection of big O() says that this behaviour is significant and should be included in the O() of hash insertion. Is this wrong? If it's not wrong, how would it be calculated? I haven't the foggiest how you would calculate the effect of a factor that comes into play so rarely. Or is it again that it averages to 0 and so can be left out of the equation?

demerphq

Yes, you are right that the length of a key plays a role in
calculating the hash value. And it plays a role in comparing
two keys as well. You can take this time into account, and
say insertion/searching takes O (k), where k is the length of
the key. There's no relationship between k and n though, and
usually we aren't interested in this factor. We just define
that calculating a hash value can be done in constant time,
and so can comparing two keys.
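For what it's worth, the O (k) factor is easy to see in a toy string hash. The Python sketch below is a hypothetical multiplicative hash, not the function Perl actually uses; the name toy_hash and the multiplier 33 are assumptions for illustration only.

```python
def toy_hash(key, nbuckets):
    """Hypothetical string hash; returns (bucket, steps taken)."""
    h, steps = 0, 0
    for ch in key:
        # One multiply-add per character, kept within 32 bits.
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
        steps += 1
    return h % nbuckets, steps


short_bucket, short_steps = toy_hash("a" * 10, 64)
long_bucket, long_steps = toy_hash("a" * 1000, 64)
```

The step count tracks the key length k and is unaffected by how many keys the hash holds, which is why it is usually treated as a constant when the analysis is in terms of n.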
As for the doubling, this factors out (assuming it isn't possible to construct a set of keys such that even after
repeated doubling, they keep hashing to the same values).
The sketch of the proof is as follows: suppose we rebuild
the hash after N inserts; that is, the hash is rebuilt
when it contains N keys. Building a hash out of N keys will
take O (N) time; this is O (1) per key. Now you also have
to show that the next rebuild isn't taking place until after
inserting another c * N keys, for some constant c > 1. This
means that if you rebuild a hash with N keys, for at least
N / (1 + c) keys, this is the first time they are involved,
for at least N / (1 + c)^2 keys, this is the second time
they are involved in a rebuild, etc. If you do the math,
you will see that there are some keys that have been charged
O (log N) on rebuilds, but because there are so many more
that have been charged less, it works out to O (1) amortized time. So, yes, a single insert can take
O (N) time, but, starting from an empty hash, N inserts
take O (N) time.
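The amortized claim can be checked with a small simulation. This Python sketch assumes a simplified cost model (one unit to place a key, one unit per stored key touched in a rebuild) and a table that doubles whenever it is full; the function name insert_costs and the model itself are illustrative assumptions.

```python
def insert_costs(n_inserts):
    """Per-insert cost for a table rebuilt at double capacity whenever it is full."""
    capacity, size, costs = 1, 0, []
    for _ in range(n_inserts):
        cost = 1                  # placing the new key
        if size == capacity:      # table full: rebuild, touching every stored key
            cost += size
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs


N = 1_000_000
costs = insert_costs(N)
total, worst = sum(costs), max(costs)
```

Under this model the worst single insert costs on the order of N, yet the total stays below 3 * N, so the average per insert is bounded by a constant.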
Rebuilding after a bunch of inserts is actually a well-known
technique for data structures. Often a data structure is only
partially rebuilt (giving the technique its name: "partial
rebuilding"). Rebuilding the entire data structure is just
an extreme variant.
Abigail

