), I see. I read at Sleepycat that B-tree wins over hash, and then (if/when the prescaled B-tree size is exceeded) hash wins again, but that sounds like just doing it wrong. I bow to your experience.
My gut instinct was to leave all seeking and logic to a trigger in the DB, but say we just use Perl: how would we get more detailed profiling info to find the culprit? We don't have info about these objects, but how about the general case?
I thought about the cache digest bitmap in the Sourceforge squid docs, which uses 1 bit per object. This led me to imagine using a sequential memory map for 2 million objects, each entry of which contains the address of its parent and some other data. But I realized locality in the B-tree must beat this.
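To make the memory map idea concrete, here is a rough sketch of one way it could be laid out. It is in Python only for quick prototyping; the real thing would presumably live in Perl/Inline::C. The 8-byte record (4-byte parent slot, 2-byte access counter, 2 spare bytes) and the names ObjectMap, RECORD and NO_PARENT are my own assumptions, not anything from the squid docs.

```python
import struct

# Hypothetical fixed-width record: 4-byte parent slot, 2-byte access
# counter, 2 spare bytes. "<" means little-endian with no padding, so 8 bytes.
RECORD = struct.Struct("<IHH")
NO_PARENT = 0xFFFFFFFF  # assumed sentinel meaning "no parent"


class ObjectMap:
    """Sequential map: the object in slot i lives at byte offset i * 8."""

    def __init__(self, n_objects):
        # Pre-fill every slot with (NO_PARENT, 0, 0).
        self.buf = bytearray(RECORD.pack(NO_PARENT, 0, 0) * n_objects)

    def get(self, slot):
        # Returns (parent_slot, access_counter, spare).
        return RECORD.unpack_from(self.buf, slot * RECORD.size)

    def set_parent(self, slot, parent):
        _, count, spare = self.get(slot)
        RECORD.pack_into(self.buf, slot * RECORD.size, parent, count, spare)

    def touch(self, slot):
        # Bump the access counter, saturating at the 16-bit maximum.
        parent, count, spare = self.get(slot)
        count = min(count + 1, 0xFFFF)
        RECORD.pack_into(self.buf, slot * RECORD.size, parent, count, spare)

    def ancestors(self, slot):
        # Walk the parent chain until the sentinel is reached.
        parent = self.get(slot)[0]
        while parent != NO_PARENT:
            yield parent
            parent = self.get(parent)[0]
```

With that record size, 2 million objects come to 2,000,000 x 8 = 16,000,000 bytes. Each lookup is one multiply and one read, but following a parent chain can jump anywhere in the map, which is where I suspect the B-tree's locality wins.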
Any ideas as to performance if you:
1) maintained a memory map through an Inline::C function, where each entry includes the address of the parent, an access counter, and a couple of extra bytes? Say 16MB for the map plus an 8MB pool of free addresses. (Or is this moot due to the locality win in the B-tree?)
2) skip the memory map, but also
a) maintain a medium-size B-tree of popular object MD5s. These are found because a counter in each object is incremented on seek, and an object is added to the populars tree when its counter exceeds a pretuned threshold. Hopefully this tree is searched very frequently instead of the big tree.
b) maintain a small hash to cache the last X objects and cut churn.
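
Option 2 could be prototyped roughly as below, just to check the logic before writing any Perl. Big caveat: plain dicts stand in for both the big B-tree and the medium "populars" tree, so this shows only the promotion and caching flow and says nothing about real B-tree performance. The names TieredLookup, md5_key, threshold and recent_size are made up for illustration.

```python
import hashlib
from collections import OrderedDict


def md5_key(name):
    # Objects are keyed by the hex MD5 of some identifying string.
    return hashlib.md5(name.encode()).hexdigest()


class TieredLookup:
    def __init__(self, big_store, threshold=3, recent_size=4):
        self.big = big_store         # dict standing in for the big B-tree
        self.popular = {}            # dict standing in for the populars tree
        self.recent = OrderedDict()  # small hash of the last X objects (LRU)
        self.counts = {}             # per-object seek counter
        self.threshold = threshold   # the "pretuned threshold"
        self.recent_size = recent_size
        self.big_seeks = 0           # how often we fell through to the big tree

    def get(self, key):
        # Assumes stored objects are never None.
        obj = self.recent.get(key)
        if obj is not None:
            self.recent.move_to_end(key)  # refresh its position in the LRU
        else:
            obj = self.popular.get(key)
            if obj is None:
                self.big_seeks += 1
                obj = self.big.get(key)
                if obj is None:
                    return None
            self.recent[key] = obj
            if len(self.recent) > self.recent_size:
                self.recent.popitem(last=False)  # evict the oldest entry
        # Counter is incremented on every successful seek.
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] > self.threshold:
            self.popular[key] = obj  # promote to the populars tier
        return obj
```

Comparing big_seeks against the total number of lookups on a replay of real traffic would give a first idea of whether the two extra tiers pay for themselves. Whether they beat a single B-tree with a warm cache is something I would only trust a measurement on.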