In reply to "Re^3: Store large hashes more efficiently", in thread "Store large hashes more efficiently"
I probably need to use 'Q' rather than 'V', in case the unsigned integer index exceeds what fits in 32 bits (about 4.29 billion) -- remember the entries are sparse.
Now for 1 million entries, I still get ~134 bytes per entry using 'V' and ~138 bytes/entry using 'Q' (which makes sense, since going from 'V' to 'Q' adds 4 bytes/entry). Without any packing of the hash value, I get 186 bytes/entry, which is 48 bytes/entry more than the 'Q' figure. That again makes sense, since in packing a 32-char hex string we go from 64 bytes of storage to 16.
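For reference, here is a minimal sketch of the packing being compared. The index and the hex digest are made-up sample values, and 'Q' assumes a perl built with 64-bit integer support.

```perl
use strict;
use warnings;

# Hypothetical sample data: one sparse integer index and one 32-char hex string.
my $index = 3_000_000_000;
my $hex   = 'd41d8cd98f00b204e9800998ecf8427e';

my $key_v = pack('V', $index);    # 32-bit unsigned, little-endian: 4 bytes
my $key_q = pack('Q', $index);    # 64-bit unsigned, native order: 8 bytes
my $value = pack('H32', $hex);    # 32 hex digits packed into 16 raw bytes

printf "V key: %d bytes, Q key: %d bytes, value: %d bytes\n",
    length($key_v), length($key_q), length($value);

# Round trip to confirm the packing loses nothing.
my $index_back = unpack('Q', $key_q);
my $hex_back   = unpack('H32', $value);
print "round trip ok\n" if $index_back == $index && $hex_back eq $hex;
```

These lengths are only the payload; they say nothing about what perl adds around each key and value.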
So I guess that is as good as I can do with packing. What surprises me is the overhead of the hash itself.
That is, it takes 134 bytes/entry to store 4 bytes of index and 16 bytes of value, which is an overhead of 114 bytes/entry.
However, looking at how Perl stores a plain scalar variable, it seems to take 48 bytes of overhead plus the size of the data itself. Now comparing 'size' with 'total_size' for the hash, I get that 'size' alone (which doesn't include the space to store the hash values) is 74 bytes/entry. Subtracting 4 bytes for the packed index and 48 bytes for a scalar leaves 22 bytes, which Perl presumably uses to perform the magic of hash storage and to point to the value stored under each key. 'total_size' then adds another 64 bytes/entry, which is explained by 48 bytes of scalar overhead plus the 16 bytes required to pack a 32-digit hex string.
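The 'size' and 'total_size' figures can be reproduced with something like the sketch below. It assumes the CPAN module Devel::Size is installed; the entry count and the generated keys and values are arbitrary stand-ins for the real data, and the per-entry numbers will vary with perl version and platform.

```perl
use strict;
use warnings;
use Devel::Size qw(size total_size);    # CPAN module, not part of core perl

my $n = 100_000;    # hypothetical entry count, smaller than the 1 million above
my %h;
for my $i (1 .. $n) {
    # Sparse 32-bit index packed to 4 bytes; the value is a 32-digit hex
    # string packed to 16 bytes, generated here just to have some data.
    $h{ pack('V', $i * 40_000) } = pack('H32', sprintf('%032x', $i));
}

my $size  = size(\%h);          # hash structure and keys, without the values
my $total = total_size(\%h);    # the same, plus the value scalars

printf "size:        %.1f bytes/entry\n", $size / $n;
printf "total_size:  %.1f bytes/entry\n", $total / $n;
printf "values only: %.1f bytes/entry\n", ($total - $size) / $n;
```

The difference between the two numbers is the part attributable to the value scalars.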
So the surprising thing to me is paying the scalar overhead twice: 48 bytes to store the scalar index and another 48 to store the scalar value. These 96 bytes (plus the 22 bytes for the hash magic and pointer) really dwarf the storage of the packed index + value itself (which is 4+16 = 20 bytes). But if my understanding of the 48-byte overhead per scalar is correct, then presumably there is no way to store a large hash more efficiently (and since the indices are very sparse, I can't use an array or other contiguous storage). Am I correct?
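As a sanity check, the per-entry figures quoted above can be tallied in a few lines. The hash key names are just labels chosen for this sketch; the numbers are the ones from the post.

```perl
use strict;
use warnings;

# Per-entry byte counts as quoted in the post.
my %bytes = (
    key_scalar_overhead   => 48,
    value_scalar_overhead => 48,
    hash_bookkeeping      => 22,
    packed_index          => 4,
    packed_value          => 16,
);

my $payload  = $bytes{packed_index} + $bytes{packed_value};
my $overhead = $bytes{key_scalar_overhead}
             + $bytes{value_scalar_overhead}
             + $bytes{hash_bookkeeping};
my $total    = $payload + $overhead;

printf "payload:  %d bytes\n", $payload;
printf "overhead: %d bytes (%.0f%% of the total)\n",
    $overhead, 100 * $overhead / $total;
printf "total:    %d bytes/entry\n", $total;
```

The components sum to 138 bytes/entry, which lines up with the 'Q' measurement; the 134 measured with 'V' is 4 bytes lower, so one of the components is presumably slightly overestimated.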