PerlMonks  

Re^6: elsif chain vs. dispatch

by Marshall (Prior)
on Apr 27, 2009 at 21:02 UTC (node #760434)


in reply to Re^5: elsif chain vs. dispatch
in thread elsif chain vs. dispatch

If you let Perl grow the hash, this super-degenerate case will be detected and Perl will add bits to the hash index, using more bits of the hash value to pick a bucket. The number of buckets starts at 8, then 16, 32, 64, etc. The 9th entry to the same hash value with buckets = 8 would re-gen the entire hash. Now, I suppose that some case could be constructed where, at each bit addition, the same thing not only happens again but also becomes harder for earlier versions of Perl to detect!
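A toy model may make the growth rule described above more concrete. This is a minimal Python sketch under stated assumptions, not Perl's implementation: `ToyHash` and `MAX_CHAIN` are hypothetical names, the threshold of 8 is chosen only to match the "9th entry" example, and the rule a real table also has (growing as the total key count rises) is left out so the chain-length trigger can be seen on its own.

```python
# Toy chained hash table modelling the growth rule described above.
# Assumptions (hypothetical, not Perl's real hv.c): the table starts with
# 8 buckets, the bucket index is the low bits of the hash value, and the
# table doubles when any one chain grows past MAX_CHAIN entries.

MAX_CHAIN = 8  # hypothetical: the 9th entry in one bucket triggers a split


class ToyHash:
    def __init__(self, hash_fn=hash):
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(8)]
        self.splits = 0

    def index(self, key):
        # Power-of-two size, so doubling means looking at one more hash bit.
        return self.hash_fn(key) & (len(self.buckets) - 1)

    def split(self):
        # Double the bucket count and re-insert every entry ("re-gen").
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for chain in old:
            for k, v in chain:
                self.buckets[self.index(k)].append((k, v))
        self.splits += 1

    def store(self, key, value):
        chain = self.buckets[self.index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))
        if len(chain) > MAX_CHAIN:
            self.split()  # at most one split per insert

    def max_chain(self):
        return max(len(c) for c in self.buckets)


# Nine integer keys that all land in bucket 0 of an 8-bucket table
# (CPython hashes small non-negative ints to themselves).
h = ToyHash()
for k in range(0, 72, 8):  # 0, 8, ..., 64
    h.store(k, str(k))
print(len(h.buckets), h.splits, h.max_chain())  # 16 1 5
```

After the split, the keys divide by their fourth hash bit: 0, 16, 32, 48, 64 stay in bucket 0 and 8, 24, 40, 56 move to bucket 8.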

I think my general advice about checking these parameters (number of buckets used, total number of buckets, and total number of entries) is sound when dealing with very large or performance-sensitive hashes.
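As one illustration, here is a way to compute those three numbers, plus the longest chain and the average lookup cost, for a set of keys. It is a hypothetical Python helper (`bucket_stats` is not a Perl or library function) that assumes a power-of-two bucket array indexed by the low bits of the hash. Many entries crowded into few used buckets is the warning sign.

```python
# Sketch of the statistics suggested above, for a hypothetical
# power-of-two bucket array. Assumption: bucket = hash(key) & (n_buckets - 1);
# this mirrors the idea, not Perl's internals.

from collections import Counter


def bucket_stats(keys, n_buckets):
    counts = Counter(hash(k) & (n_buckets - 1) for k in set(keys))
    entries = sum(counts.values())
    return {
        "used": len(counts),          # buckets holding at least one key
        "total": n_buckets,           # buckets allocated
        "entries": entries,           # distinct keys stored
        "longest_chain": max(counts.values(), default=0),
        # Average comparisons for a successful lookup, assuming every key is
        # looked up equally often and each chain is scanned front to back.
        "avg_probe": (sum(c * (c + 1) / 2 for c in counts.values()) / entries
                      if entries else 0.0),
    }


spread = bucket_stats(range(16), 16)             # used=16, longest_chain=1, avg_probe=1.0
degenerate = bucket_stats(range(0, 256, 16), 16)  # used=1, longest_chain=16, avg_probe=8.5
```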


Re^7: elsif chain vs. dispatch
by ikegami (Pope) on Apr 27, 2009 at 21:16 UTC

    The 9th entry to same hash value with buckets =8 would re-gen the entire hash.

    That doesn't prevent the degenerate case, since you could end up with 9 entries in the same bucket of a 16-bucket hash after the split.
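A quick check of that point: going from 8 to 16 buckets only looks at one more bit of the hash, so keys whose hashes agree in their low 4 bits still share a bucket after the split. A minimal Python sketch, assuming bucket = hash & (buckets - 1) and CPython's identity hash for small non-negative ints; `bucket_of` is a hypothetical helper.

```python
# Keys whose hashes agree in the low 4 bits survive an 8 -> 16 split together.
# Assumption: bucket index = hash(key) & (n_buckets - 1).

def bucket_of(key, n_buckets):
    return hash(key) & (n_buckets - 1)


keys = [16 * i for i in range(9)]  # 0, 16, 32, ..., 128

before = {bucket_of(k, 8) for k in keys}
after = {bucket_of(k, 16) for k in keys}
print(before, after)  # {0} {0}: all 9 keys still share one bucket
```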

Re^7: elsif chain vs. dispatch
by JavaFan (Canon) on Apr 27, 2009 at 22:40 UTC
    But that would mean having N/4 keys hashing to the same bucket isn't detected. Which means the worst case is still Θ(N). In fact, if there's an ε > 0 such that it requires more than εN keys to be hashed to a single bucket before Perl reorders the hash, the worst case look up is still Θ(N).
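The extreme version of this argument can be sketched as follows: if every key agrees in more low hash bits than the table will ever use, all N keys share one chain and a lookup walks all of them. This Python sketch assumes a power-of-two table grown only to keep the load factor at or below 1, bucket = hash & (buckets - 1), and CPython's identity hash for these integers; `worst_case_probes` is a hypothetical name. Any detection rule that tolerates a constant fraction εN of keys in one bucket leaves the same Θ(N) bound.

```python
# Worst case for a low-bits bucket index: every key is a multiple of 2**20,
# so all keys agree in their low 20 bits and land in bucket 0 of any table
# with at most 2**20 buckets.

def worst_case_probes(n_keys):
    n_buckets = 8
    while n_buckets < n_keys:  # grow until load factor <= 1
        n_buckets *= 2
    keys = [k << 20 for k in range(n_keys)]
    chains = {}
    for k in keys:
        chains.setdefault(hash(k) & (n_buckets - 1), []).append(k)
    # Looking up the last key inserted walks the whole chain.
    target = keys[-1]
    chain = chains[hash(target) & (n_buckets - 1)]
    return chain.index(target) + 1


for n in (10, 100, 1000):
    print(n, worst_case_probes(n))  # probes grow linearly with n
```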
      Yes, if I understand your point correctly: There is no absolute guarantee that all keys won't hash to the same hash key until the keys are absolutely unique! Correct!

      However in a practical sense, I think that you are going to be hard pressed to come up with a realistic example for this user's input data.

      Of course there is a "trick" here. Even if the hash table has to compare, say, 16 things to get a result, it is still going to be very fast!

      This idea that say 256 things will hash into an identical hash table entry is unlikely. Now "very, very seldom" doesn't mean "never".

      But as the hash grows, the probability of this decreases exponentially.

        However in a practical sense, I think that you are going to be hard pressed to come up with a realistic example for this user's input data.

        Accidentally, sure. But intentionally, you have a DoS attack. That's why the fix is called a security fix.
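A common defense against this kind of attack is to mix a secret, per-process random seed into the hash, so an attacker cannot precompute a set of colliding keys. The Python sketch below illustrates only that general idea; it uses keyed BLAKE2b from the standard library for convenience, which is NOT the hash function Perl uses, and `seeded_hash` and `bucket` are hypothetical helpers.

```python
# Seeded hashing: the same key maps to unrelated values under different
# seeds, so "colliding" keys prepared against one seed are useless against
# another. Keyed BLAKE2b is used here purely for illustration.

import hashlib
import os

SEED = os.urandom(16)  # chosen once per process and never revealed


def seeded_hash(key: str, seed: bytes = SEED) -> int:
    digest = hashlib.blake2b(key.encode(), key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def bucket(key: str, n_buckets: int, seed: bytes = SEED) -> int:
    # Power-of-two table, indexed by the low bits of the seeded hash.
    return seeded_hash(key, seed) & (n_buckets - 1)
```

The design choice is that the seed, not the key alone, decides bucket placement; the trade-off is that hash order is no longer stable between runs.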

        Yes, if I understand your point correctly: There is no absolute guarantee that all keys won't hash to the same hash key until the keys are absolutely unique! Correct!

        You understood me utterly wrong. The claim was made that Perl detects when too many keys hash to the same bucket, and then expands the hash and reinserts the keys, spreading them over more buckets. I then pointed out that the mechanism as described still lets enough keys map to the same bucket that your lookup is no longer constant time.

        This idea that say 256 things will hash into an identical hash table entry is unlikely. Now "very, very seldom" doesn't mean "never".

        Yes, and? We were talking about a worst-case scenario, and a worst-case scenario can be anything that can happen at all.
