As hash is so heavily used in Perl, it is worth to explain why hash search is O(n).
One runs into the worst case of a hash search when all the elements calculate to the same hash key. In this case, a hash becomes no more than a one-dimentional chain of elements.
Even worse, the element one trying to look up happens to be the last element in the sequence. Now the searching engine has to go throguh the entire chain, all n elements, to find what one wants.
(In rest of the post, I will just use small o instead of big O, as now I am more focused on complexity, doesn't matter whether it is for worst case, best case or whatever. Big O is just a notion saying that it is for the worst case.)
In an average case, if n is the number of elements we have in a hash, and k is the number of all possible hash key values. Ideally all elements would spread nearly even among possible hash keys, so the chain length under each hash key is n/k. Averagely you need to go though half of the chain to get what you want, thus you need to go thru n/2k elements.
So averagely a hash search would close to o(n/2k) (be very careful, n is a variable when k is a constant, this is serious), which ~ o(n).
How come the average case is the same as the worst case, NO they are not the same, but they are ~.
Some time it is seriously dangerous to casually simplify things that is seriously complex.
o(n) is not o(n/2k), but o(n) ~ o(n/2k) (again, n is variable, and k is constant, this is very serious), the easiest way to explain ~, yet does not lose seriousness too much is that: the speed o(n) and o(n/2k) approach infinite is the same.
Although o(n) ~ o(n/10^100), what takes o(n/10^100) algorithm 1 sec, might take a o(n) alogorithm 10^100 sec to finish. They are far from the same.
Re: An informal introduction to O(N) notation
Replies are listed 'Best First'.
Most often k is not a constant, but a function of n. If for example k would be ~ c * n then the average length of the lists of items in each bucket would be ~ n / (c * n) which is ~ 1 / c. Which is a constant.
Therefore the key lookup of the usual implementations of hashes has average complexity of o(1).
Of course this implementation forces you to recreate the hash when the number of keys grows over some limit which can be time consuming.
P.S.: I just looked at the number of buckets in the Perl hashes (I iteratively added elements, watched the number of buckets and printed the number of elements and buckets after every rehashing). This is the results:
Thank you. This explains everything I was attempting to, but does it lucidly. Moreover, it goes in to a lot of things I didn't know about.
Only one thing I have to add: You can find the number of items in a hash with scalar keys %hash, and the number of used/total buckets with ($used_buckets, $total_buckets) = split '/', scalar %hash. With those numbers in hand, you should be able to get more exact timings, or demonstrate to yourself that what pg says is true.
Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).