Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
The aspect of authoritative that I was talking about was your giving enough technical information which is hard for others to judge that it looked like you must know a lot about what you are talking about. It is independent of age. Of course I am doing the same - throwing out lots of technical information - but hopefully I actually understand it and will be able to answer any follow-up questions until you are comfortable that you understand it as well.

As for what is better than hashing, define better. For key/value pairs with a flat memory model, hashing gives the best average performance of any simple data structure that I know of. That is why people use them. If an obviously superior alternative existed, then everyone would use that and nobody would have heard of hashing. But hashing has bad worst-case performance, doesn't keep things in order (often a requirement), and blows your cache badly. If you care about worst-case performance, having things come out in order, or if memory cache effects matter to you, then alternatives can beat hashing. (My understanding from their docs is that Judy arrays can beat hashes because of doing a lot of work to pay attention to memory cache effects. Similarly for large data structures backed on disk a BTREE beats hashing because you hit disk far less.)

And finally I assure you that I am not wrong about point 3. A hashing algorithm that is working well appears to throw keys into buckets randomly. The resulting distribution of numbers of keys per bucket is a Poisson distribution, with lambda depending on the ratio of numbers of keys to buckets. If you add too many keys, Perl increases the number of buckets so that lambda stays in a fixed range.

The fly in the ointment is the phrase, appears to throw keys into buckets randomly. As we all know, computers don't do "random" very well, and besides which, when we see the same key later we had better go into the right bucket. There is therefore no way to guarantee that you won't have lots of keys in one bucket, and the rest of the buckets empty. The odds against this happening by accident are astronomical, but it is not impossible.

And that is the worst case scenario - that the hashing algorithm throws all keys into one bucket. Which happens (almost) never by accident, but which a malicious attacker can try to make happen intentionally. And if someone does that, then it does Perl absolutely no good to provide the hash with a million buckets to use - everything is going into just one and the well-designed hash implementation degrades into scanning a huge linked list.

Hence the defence of randomizing the hashing algorithm itself so that nobody can predict what hashing algorithm they are dealing with, and therefore what set of keys will give it problems...


In reply to Re: Re: Re: Re: Re: Slowness when inserting into pre-extended array by tilly
in thread Slowness when inserting into pre-extended array by ryangabbard

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Domain Nodelet?
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this?Last hourOther CB clients
    Other Users?
    Others exploiting the Monastery: (2)
    As of 2024-09-07 15:44 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      No recent polls found

      Notices?
      erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.