Re^2: Array vs. Hash for sparsely integer-indexed databy puterboy (Scribe)
|on Jan 29, 2013 at 03:55 UTC||Need Help??|
First of all thanks to *everyone* for all the helpful advice.
Just some clarification. I am caching hard disk inodes since I need to look up hard links (it's to copy a highly hard-linked but structured tree of folders which is too slow to do using rsync and 'cp -a' since they (obviously) don't know the special structure).
So the key is an integer less than max inodes on the filesystem. The value is a 32-digit hex number expressed as an ascii string
Sometimes the list can be pretty dense if for example the tree being copied occupies the majority of the filesystem and if most of the inodes are used (or at least if they are allocated relatively contiguous); other times it could be o(10%) or less.
Each inode hash entry may be looked up anywhere from once to 10s or even thousands of times (presumably no more than maxhardlinks).
After doing some profiling, it seems that the speed of lookup is less important than memory usage since the rate limiting step is fstat'ing files which is significantly slower.
So, it seems that hashes are best since except in the densest of cases, they will require less storage.
Again, thanks for all the helpful advice. I learned a lot!