Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^2: Bidirectional lookup algorithm? (try perfect hashing)

by oiskuu (Hermit)
on Jan 13, 2015 at 02:21 UTC ( [id://1113021]=note: print w/replies, xml ) Need Help??


in reply to Re: Bidirectional lookup algorithm? (try perfect hashing)
in thread Bidirectional lookup algorithm? (Updated: further info.)

Ok. Here's a crude demo in case someone is interested in doing benchmarking comparisons or whatever. The program is pure C, tested on linux 64-bit. Give two arguments: Nkeys and Filename. File should contain sym-value pairs, both unique in their domains. (Space-separated, one pair per line).

Update. Some additional notes regarding the program.

Portability is easily improved upon:
  • Replace mmap/mremap with malloc/realloc. This is trivial.
  • The test for direct vs indirect entries (ss offset) might be written as !(s->o >> 56), avoiding issues with endianness.
There are at least two bugs to fix:
  • Verify sym to val loop has no test for NULL; this will fault on a missing key.
  • The fetch_by_key should not proceed with a memcmp in case keylen > 8, but the slot is direct (ie <= 8).
Many further optimizations to consider:
  • Try to allocate ss strings so that cache-line boundaries aren't crossed.
  • Terminate strings with a high-bit: more compact, plus faster than using memchr.
  • Specific inlined implementations for 8-byte memcmp, etc.
  • Inplace reorder looks up every key twice. Place an (un)processed flag to halve this.
  • Choose a faster hash function for string keys, FNV perhaps.
  • A primitive hash function for values. Just a mul by constant?
There's also a wishlist for the CMPH library, but enough for now.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1113021]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (4)
As of 2025-07-08 14:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.