For example, consider a hash in the worst case scenario, where you have just done a hash split. Then every element has been moved once. Half of them were around for the previous hash split, so have been moved twice.
That assumes a hash split happens only after Ω(n) new inserts. But then in the worst case scenario, where all elements hash to the same bucket, searching for a key has to walk the entire list, making the search linear instead of O(1).
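To make that concrete, here is a toy sketch of a chained hash (deliberately not Perl's real C implementation) with a pathological hash function that sends every key to the same bucket, so a lookup has to walk the whole chain:

    use strict;
    use warnings;

    my @buckets = map { [] } 1 .. 8;
    sub bad_hash { 0 }    # pathological: every key collides

    sub insert_kv {
        my ($key, $value) = @_;
        push @{ $buckets[ bad_hash($key) ] }, [ $key, $value ];
    }

    sub lookup {
        my ($key) = @_;
        my $steps = 0;
        for my $pair ( @{ $buckets[ bad_hash($key) ] } ) {
            $steps++;
            return ( $pair->[1], $steps ) if $pair->[0] eq $key;
        }
        return ( undef, $steps );
    }

    insert_kv( "k$_", $_ ) for 1 .. 1000;
    my ( $val, $steps ) = lookup("k1000");
    print "found $val after $steps comparisons\n";   # 1000 comparisons: O(n), not O(1)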
I know hashes have changed in the past so that the linked list hanging off each bucket cannot grow unbounded. But that means that, in the worst case, you either have hash splits in o(n) (lowercase o), or search times that aren't O(1).
Any "mathematical proof" will have to look at the C code and consider how often hash splits occur worst case, and how long the linked lists can be.
Re^4: mathematical proof by tilly (Archbishop) on Feb 03, 2009 at 14:39 UTC 
As soon as someone talks about hash inserts being O(1) I assume that they are talking about the average case performance when your hash algorithm is working. I'm sorry I didn't make that explicit. If you wish to qualify everything I said about hashes with "in the average case", do so, because that is correct. But if you wish to mix an analysis of the average case with comments about a worst-case scenario, that's wrong.
Incidentally, if you look at the worst case performance of a hash that uses linked lists for the buckets, then a hash access is O(n), which means that building a hash with n elements cannot be better than O(n*n). Which clearly loses to the array solution. Conditional logic to try to reduce the likelihood of worst case performance cannot improve this fact unless you replace the linked list in the bucket with something else (like a B-tree).
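A toy count makes the quadratic build cost visible. Assume every key lands in one bucket and that each insert first scans the bucket for an existing key (as a real hash must, in order to overwrite duplicates):

    use strict;
    use warnings;

    my @bucket;          # the single bucket every key lands in
    my $comparisons = 0;

    for my $n ( 1 .. 2000 ) {
        for my $pair (@bucket) {        # scan for an existing entry first
            $comparisons++;
            last if $pair->[0] eq "k$n";
        }
        push @bucket, [ "k$n", $n ];    # not found: append a new entry
    }
    print "$comparisons comparisons for 2000 inserts\n";
    # prints 1999000, i.e. n*(n-1)/2 -- building the hash is O(n*n)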
Changing the structure of the buckets will make the average case hash access O(1) and the worst case O(log(n)), but it complicates the code and makes the constant for a hash access worse. People tend to notice the average case performance, and so do not make that change. Real example: when abuse of the hashing function showed up as a security threat in Perl, multiple options were considered. In the end, rather than making the worst case performance better, it was decided to randomize the hashing algorithm so that attackers cannot predict what data sets will cause performance problems.
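You can watch the randomization from Perl itself. On a sufficiently recent perl (5.18 and later randomize the iteration order per process; the original defense from this era only reseeded when an attack pattern was detected) this usually prints a different key order on each run:

    use strict;
    use warnings;

    # Two runs of this script will usually disagree on the order,
    # because the hash seed is randomized per process.
    my %h = map { $_ => 1 } 'a' .. 'j';
    print join( ' ', keys %h ), "\n";

If you need a repeatable order for debugging, perlrun documents the PERL_HASH_SEED environment variable for pinning the seed.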

As soon as someone talks about hash inserts being O(1) I assume that they are talking about the average case performance when your hash algorithm is working.
So do I (well, I try to avoid the term 'average' in this case; I'd use 'expected'. 'Amortized' is another term a layman may call 'average'). But I stop assuming that as soon as 'worst case' is mentioned. Or 'mathematical proof'.

Why would you stop making that assumption when "mathematical proof" is mentioned? We can mathematically prove what happens in the average case as easily as we can analyze the worst case, and frequently do so. Furthermore, as I already demonstrated, programmers usually care more about the average case than the worst case.
And yes, average can mean several different things. I was slightly sloppy about that, but not so sloppy that I think it would cause any real confusion. However, I stay away from "expected" with a lay audience because I worry that laypeople are likely to misunderstand "expected" as "median". Instead I'd lean towards "amortized".
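As a rough illustration of what the average case analysis proves: spread n keys uniformly over m buckets and the expected chain length is just the load factor n/m, so an average lookup touches O(1) entries even though the longest chain is noticeably worse. A quick simulation (toy numbers, not Perl's internals):

    use strict;
    use warnings;

    my $buckets = 1024;
    my $n       = 4 * $buckets;          # load factor 4
    my @len     = (0) x $buckets;
    $len[ int rand $buckets ]++ for 1 .. $n;

    my ( $sum, $max ) = ( 0, 0 );
    for my $l (@len) {
        $sum += $l;
        $max  = $l if $l > $max;
    }
    printf "average chain %.2f, longest chain %d\n", $sum / $buckets, $max;
    # average is ~4 (the load factor); the longest chain is worse, but rare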