PerlMonks  

Re: Re: A short meditation about hash search performance

by demerphq (Chancellor)
on Nov 17, 2003 at 08:58 UTC ( #307608=note )


in reply to Re: A short meditation about hash search performance
in thread A short meditation about hash search performance

Abigail, I have two minor questions for you. First off, you speak of finding the correct bucket as occurring in constant time. Given that the time to calculate the bucket value depends on the length of the key, I don't quite see how this is correct. Or does this factor disappear because it averages to a constant time in normal use? I have a similar concern about the doubling of the buckets during insertion. My by now hazy recollection of big O() says that this behaviour is significant and should be included in the O() of hash insertion. Is this wrong? If it's not wrong, how would it be calculated? I haven't the foggiest how you would calculate the effect of a factor that comes into play so rarely. Or is it again that it averages to 0 and so can be left out of the equation?


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi


Replies are listed 'Best First'.
Re: A short meditation about hash search performance
by Abigail-II (Bishop) on Nov 17, 2003 at 09:34 UTC
    Yes, you are right that the length of a key plays a role in calculating the hash value. And it plays a role in comparing two keys as well. You can take this time into account, and say insertion/searching takes O (k), where k is the length of the key. There's no relationship between k and n (the number of keys in the hash) though, and usually we aren't interested in this factor. We just define that calculating a hash value can be done in constant time, and so can comparing two keys.
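To make the O (k) point concrete, here is a minimal sketch in Python. The function `toy_hash` and its multiply-by-33 scheme are illustrative assumptions, not Perl's actual hash function; the only point is that the work done depends on the key length k and not on how many keys the table holds.

```python
def toy_hash(key: str, num_buckets: int) -> tuple[int, int]:
    """Return (bucket index, number of per-character steps taken).

    Hypothetical multiplicative string hash, used for illustration only.
    """
    h = 0
    steps = 0
    for ch in key:
        # One multiply-add per character: the cost grows with len(key).
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
        steps += 1
    # Reducing to a bucket index is a single operation, whatever the table size.
    return h % num_buckets, steps


if __name__ == "__main__":
    for key in ("a", "hash", "a" * 50):
        bucket, steps = toy_hash(key, 8)
        print(f"key length {len(key):3d}: {steps:3d} steps, bucket {bucket}")
```

The step count equals the key length whether the table has 8 buckets or 8 million, which is why k is usually treated as a constant separate from n.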

    As for the doubling, this factors out (assuming it isn't possible to construct a set of keys such that even after repeated doubling, they keep hashing to the same values). The sketch of the proof is as follows: suppose we rebuild the hash after N inserts; that is, the hash is rebuilt when it contains N keys. Building a hash out of N keys will take O (N) time - this is O (1) per key. Now you also have to show that the next rebuild doesn't take place until another c * N keys have been inserted, for some constant c > 1. This means that if you rebuild a hash with N keys, for at least N / (1 + c) keys, this is the first time they are involved in a rebuild; for at least N / (1 + c)^2 keys, this is the second time; etc. If you do the math, you will see that there are some keys that have been charged O (log N) on rebuilds, but because there are so many more that have been charged less, it works out to O (1) amortized time per key. So, yes, a single insert can take O (N) time, but, starting from an empty hash, N inserts take O (N) time in total.
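The amortized argument can be checked with a small simulation. This is a sketch under stated assumptions: the table starts with 8 buckets and doubles whenever the key count exceeds the bucket count, and a rebuild is charged one unit per key moved. The name `count_rebuild_work`, the trigger and the initial size are all hypothetical and not taken from Perl's source.

```python
def count_rebuild_work(n_inserts: int, initial_buckets: int = 8) -> tuple[int, int]:
    """Simulate n_inserts into a doubling table.

    Returns (total keys moved across all rebuilds, keys moved in the
    largest single rebuild). No real hashing is done; only the rebuild
    cost is counted.
    """
    buckets = initial_buckets
    keys = 0
    total_moved = 0
    largest_rebuild = 0
    for _ in range(n_inserts):
        keys += 1
        if keys > buckets:
            # Assumed trigger: double the buckets and rehash every key.
            buckets *= 2
            total_moved += keys
            largest_rebuild = max(largest_rebuild, keys)
    return total_moved, largest_rebuild


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        total, largest = count_rebuild_work(n)
        print(f"N={n}: moved {total} keys in total, "
              f"largest rebuild {largest}, per insert {total / n:.2f}")
```

With these assumptions, 1000 inserts move 1023 keys in total, and the largest single rebuild moves 513 of them. One insert can be expensive, but the total stays within a small constant multiple of N.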

    Rebuilding after a bunch of inserts is actually a well-known technique for data structures. Often a data structure is only partially rebuilt (giving the technique its name: "partial rebuilding"). Rebuilding the entire data structure is just an extreme variant.

    Abigail
