PerlMonks  

Re^3: mathematical proof

by JavaFan (Canon)
on Feb 03, 2009 at 08:00 UTC (#740911)


in reply to Re^2: mathematical proof
in thread mathematical proof

For example consider a hash in the worst case scenario, where you have just done a hash split. Then every element has been moved once. Half of them were around for the previous hash split, so have been around twice.
That assumes a hash split only after Ω(n) new inserts. But then in the worst case scenario where all elements hash to the same bucket, searching for a key has to walk a list, making the search linear instead of O(1).

I know hashes have changed in the past so that the linked list hanging off each bucket cannot grow unbounded. But that means that, worst case, you either have hash splits after only o(n) inserts (lowercase o), or search times that aren't O(1).

Any "mathematical proof" will have to look at the C code and consider how often hash splits occur worst case, and how long the linked lists can be.
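To make the dependence on the split policy concrete, here is a rough Python sketch that only counts element moves. Both policies are assumptions made for illustration and are not taken from perl's C source: the first doubles the bucket array once the number of stored keys exceeds the number of buckets, the second (hypothetical) policy splits after every fixed number of inserts. In both, every stored key is assumed to move once per split, as in the quoted argument.

```python
def moves_with_doubling(n, initial_buckets=8):
    """Count moves when the bucket array doubles once it is over-full.

    Assumed policy (not perl's actual one): split when the number of
    stored keys exceeds the number of buckets; each split moves every
    key that is already stored.
    """
    buckets = initial_buckets
    moves = 0
    for stored in range(1, n + 1):
        if stored > buckets:
            moves += stored - 1  # keys already in the table are rehashed
            buckets *= 2
    return moves


def moves_with_fixed_interval(n, interval=8):
    """Count moves when a split happens after every `interval` inserts.

    Hypothetical policy standing in for "splits after o(n) inserts".
    """
    moves = 0
    for stored in range(1, n + 1):
        already = stored - 1  # keys in the table before this insert
        if already > 0 and already % interval == 0:
            moves += already
    return moves


print(moves_with_doubling(1024))        # 1016, below 2 * n
print(moves_with_fixed_interval(1024))  # 65024, grows like n * n
```

Under the doubling assumption the total stays below 2n, which is the amortized O(1) argument. Under the fixed-interval assumption the total grows quadratically, which is why the split policy matters for any worst case claim.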


Re^4: mathematical proof
by tilly (Archbishop) on Feb 03, 2009 at 14:39 UTC
    As soon as someone talks about hash inserts being O(1) I assume that they are talking about the average case performance when your hash algorithm is working. I'm sorry I didn't make that explicit. If you wish to qualify everything I said about hashes with "in the average case", do so because that is correct. But if you wish to mix an analysis of the average case with comments about a worst case type of scenario, that's wrong.

    Incidentally, if you look at the worst case performance of a hash that uses linked lists for the buckets, then a hash access is O(n), which means that building a hash with n elements cannot be better than O(n*n) in the worst case. That clearly loses to the array solution. Conditional logic to try and reduce the likelihood of worst case performance cannot change this fact unless you replace the linked list in the bucket with something else (like a btree).
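    The quadratic worst case can be seen in a toy chained hash that counts key comparisons. This is a minimal Python sketch, not Perl's implementation; the class ChainedHash and the helper build_cost are made up for the illustration, and the table never splits.

```python
class ChainedHash:
    """Toy hash table with list buckets that counts key comparisons."""

    def __init__(self, num_buckets, hash_func):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_func = hash_func
        self.comparisons = 0

    def insert(self, key, value):
        chain = self.buckets[self.hash_func(key) % len(self.buckets)]
        for entry in chain:
            # Walk the chain to check whether the key is already present.
            self.comparisons += 1
            if entry[0] == key:
                entry[1] = value
                return
        chain.append([key, value])


def build_cost(n, hash_func):
    """Comparisons needed to insert the keys 0..n-1 into n buckets."""
    table = ChainedHash(num_buckets=n, hash_func=hash_func)
    for key in range(n):
        table.insert(key, None)
    return table.comparisons


print(build_cost(1000, lambda k: 0))  # 499500, i.e. n * (n - 1) / 2
print(build_cost(1000, lambda k: k))  # 0, every key gets its own bucket
```

    With a hash function that sends every key to the same bucket, each insert walks the whole chain, so the build costs n*(n-1)/2 comparisons. With a perfectly spreading function the same build needs none.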

    Changing the structure of the buckets keeps the average case hash access O(1) and makes the worst case O(log(n)), but complicates the code and makes the constant for a hash access worse. People tend to notice the average case performance, and so do not make that change. Real example: when abuse of the hashing function showed up as a security threat in Perl, multiple options were considered. In the end, rather than making the worst case performance better, it was decided to randomize the hashing algorithm so that attackers cannot predict what data sets will cause performance problems.
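    The idea behind randomizing can be sketched in Python as follows. Keyed BLAKE2b from hashlib is only a stand-in here; Perl uses its own hash function, and the seed values, key names and helper functions below are all invented for the example. An attacker who knows the seed can brute-force keys that share a bucket, but the same keys spread out once the seed is different.

```python
import hashlib
from collections import Counter


def bucket_of(key, seed, num_buckets):
    """Bucket index from a keyed hash (BLAKE2b as a stand-in)."""
    digest = hashlib.blake2b(key.encode(), key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def colliding_keys(seed, num_buckets, count):
    """Brute-force `count` keys that all land in bucket 0 for a known seed."""
    found = []
    candidate = 0
    while len(found) < count:
        key = f"key{candidate}"
        if bucket_of(key, seed, num_buckets) == 0:
            found.append(key)
        candidate += 1
    return found


def longest_chain(keys, seed, num_buckets):
    """Size of the fullest bucket when `keys` are hashed with `seed`."""
    counts = Counter(bucket_of(k, seed, num_buckets) for k in keys)
    return max(counts.values())


attack = colliding_keys(b"known-seed", num_buckets=64, count=64)
print(longest_chain(attack, b"known-seed", 64))   # 64: every key shares one bucket
print(longest_chain(attack, b"secret-seed", 64))  # much shorter with another seed
```

    Note that this does nothing for the worst case itself: a bad data set still exists for every seed. It only makes that data set hard to predict without knowing the seed.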

      As soon as someone talks about hash inserts being O(1) I assume that they are talking about the average case performance when your hash algorithm is working.
      So do I (well, I try to avoid the term 'average' - in this case, I'd use 'expected'. 'amortized' is another term a layman may call 'average'). But I stop assuming that as soon as 'worst case' is mentioned. Or 'mathematical proof'.
        Why would you stop making that assumption when "mathematical proof" is mentioned? We can mathematically prove what happens in the average case as easily as we can analyze the worst case, and frequently do so. Furthermore, as I already demonstrated, programmers usually care more about the average case than the worst case.

        And yes, average can mean several different things. I was slightly sloppy about that, but not so sloppy that I think it would cause any real confusion. However I stay away from "expected" with a lay audience because I worry that laypeople are likely to misunderstand "expected" as "median". Instead I'd lean towards "amortized".

        Sorry, but "average case" is well recognized and accepted terminology.
