Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?

Re^4: mathematical proof

by tilly (Archbishop)
on Feb 03, 2009 at 14:39 UTC ( [id://741001]=note: print w/replies, xml ) Need Help??

in reply to Re^3: mathematical proof
in thread mathematical proof

As soon as someone talks about hash inserts being O(1) I assume that they are talking about the average case performance when your hash algorithm is working. I'm sorry I didn't make that explicit. If you wish to qualify everything I said about hashes with "in the average case", do so because that is correct. But if you wish to mix an analysis of the average case with comments about a worst case type of scenario, that's wrong.

Incidentally if you look at the worst case performance of a hash that uses linked lists for the buckets, then the hash access is O(n) which means that building a hash with n buckets cannot be better than O(n*n). Which clearly loses to the array solution. Conditional logic to try and reduce the likelyhood of worst case performance cannot improve this fact unless you replace the linked list in the bucket with something else (like a btree).

Changing the structure of the buckets will make the average case hash access O(1) and worst case O(log(n)) but complicates the code and makes the constant for a hash access worse. People tend to notice the average case performance, and so do not make that change. Real example: when abuse of the hashing function showed up as a security threat in Perl, multiple options were considered. In the end rather than making the worst case performance better, it was decided to randomize the hashing algorithm so that attackers cannot predict what data sets will cause performance problems.

Replies are listed 'Best First'.
Re^5: mathematical proof
by JavaFan (Canon) on Feb 03, 2009 at 15:01 UTC
    As soon as someone talks about hash inserts being O(1) I assume that they are talking about the average case performance when your hash algorithm is working.
    So do I (well, I try to avoid the term 'average' - in this case, I'd use 'expected'. 'amortized' is another term a layman may call 'average'). But I stop assuming that as soon as 'worst case' is mentioned. Or 'mathematical proof'.
      Why would you stop making that assumption when "mathematical proof" is mentioned? We can mathematically prove what happens in the average case as easily as we can analyze the worst case, and frequently do so. Furthermore, as I already demonstrated, programmers usually care more about the average case than the worst case.

      And yes, average can mean several different things. I was slightly sloppy about that, but not so sloppy that I think it would cause any real confusion. However I stay away from "expected" with a lay audience because I worry that laypeople are likely to misunderstand "expected" as "median". Instead I'd lean towards "amortized".

      Sorry, but "average case" is well recognized and accepted terminology.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://741001]
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (4)
As of 2024-06-25 10:49 GMT
Find Nodes?
    Voting Booth?

    No recent polls found

    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.