Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Re^5: Heap structure for lookup?

by hdb (Monsignor)
on May 27, 2015 at 10:44 UTC ( [id://1127977]=note: print w/replies, xml ) Need Help??

in reply to Re^4: Heap structure for lookup?
in thread Heap structure for lookup?

If you know the number of values upfront (or an upper barrier to the number) and if the values are roughly equally distributed across the full range (with some clustering of course), you could pre-allocate an array and then come up with a guess where a particular value belongs (the value divided by 2**64 or some algorithm based on the binary representation of that value). Then do some local sorting. If you made the array somewhat larger than needed you would have less clashes.

Replies are listed 'Best First'.
Re^6: Heap structure for lookup?
by BrowserUk (Patriarch) on May 27, 2015 at 11:42 UTC

    What I've come up with is a very simple & compact hash implementation:

    The insertion rate of 150e6 (into a fixed-size array of just over 200e6) is less that 1/4 microsecond per; or 37 second for them all; which is fine.

    If I push the number of inserts to 190e6, you can visibly see -- watching the trace of every 1e6th insertion -- the insertion rate slow down; but still the insertion rate is less than 1/2 a microsecond per.

    The lookups, currently done manually by feeding some of the values traced out back in to check they are found, and a few off-the-top-of-my-head other numbers to check they aren't, and the lookup time is around 20 microseconds per, including the perl->XS->perl transitions and conversions. Which is also very acceptable:

    C:\test>lookup64 -N=150e6 16536075677023473182 @ 171436696 ... 6917323807836132563 @ 155785214 4887574662696880337 @ 47194064 9190776599876911997 @ 197069741 5378457539553008893 @ 15322593 6479208648984222734 @ 29944101 7965142717433510515 @ 185179020 1980553766129900057 @ 1573360 4821546746934524968 @ 179443020 Inserting 150e6 U64s took 38.428204060 seconds 4821546746934524968 4821546746934524968 @ 179443020 in 0.000000000 s 1980553766129900057 1980553766129900057 @ 1573360 in 0.000020027 s 6479208648984222734 6479208648984222734 @ 29944101 in 0.000023127 s 4887574662696880337 4887574662696880337 @ 47194064 in 0.000015020 s 1 1 @ 0 in 0.000015974 s 5 5 @ 0 in 0.000000000 s 12345 12345 @ 0 in 0.000018120 s 1234567890987654321 1234567890987654321 @ 0 in 0.000018835 s 16536075677023473182 16536075677023473182 @ 171436696 in 0.000000000 s

    All together, I'm really surprised that such a simple hash implementation should work quite so well; even allowing for a 25% to 10% headroom in the array.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1127977]
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (3)
As of 2024-07-13 23:03 GMT
Find Nodes?
    Voting Booth?

    No recent polls found

    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.