I probably need to use 'Q' rather than 'V', in case the unsigned integer index exceeds 4 billion (or so) -- remember the entries are sparse.

Now for 1 million entries, I still get ~134 bytes/entry using 'V' and ~138 bytes/entry using 'Q' (which makes sense, since I am adding 4 bytes/entry going from 'V' to 'Q'). Without any packing of the hash value, I get 186 bytes/entry, which adds 48 bytes/entry. That again makes sense, since in packing a 32-char hex string we go from 64 bytes of storage to 16.
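
For reference, a minimal sketch of the packing being compared here. The index and hex digest are made-up sample values, and 'Q' assumes a perl built with 64-bit integer support:

```perl
use strict;
use warnings;

# Hypothetical sample data: a sparse index and a 32-char hex digest.
my $index = 3_000_000_000;                       # still fits in 32 bits
my $hex   = 'd41d8cd98f00b204e9800998ecf8427e';  # 32 hex characters

my $key_v = pack 'V',   $index;  # 32-bit unsigned, little-endian
my $key_q = pack 'Q',   $index;  # 64-bit unsigned, native byte order
my $value = pack 'H32', $hex;    # 32 hex digits -> 16 raw bytes

printf "V key: %d bytes\n", length $key_v;   # 4
printf "Q key: %d bytes\n", length $key_q;   # 8
printf "value: %d bytes (hex string was %d)\n",
    length $value, length $hex;              # 16 (hex string was 32)

# Round trip through a hash to check nothing is lost.
my %h = ($key_q => $value);
for my $k (keys %h) {
    printf "%d => %s\n", unpack('Q', $k), unpack('H32', $h{$k});
}
```

These lengths count only the payload bytes; they say nothing about the per-entry overhead Perl adds on top.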

So, I guess that is as good as I can do with packing. What surprises me is the overhead of the hash itself: it takes 134 bytes/entry to store 4 bytes of index and 16 bytes of value, which is an overhead of 114 bytes/entry.

However, looking at how Perl seems to store just a plain scalar variable, it seems to take 48 bytes of overhead plus the size of the data element. Now looking at 'size' vs. 'total_size' for the hash, I get that 'size' alone (which doesn't include the space to store the hash values) is 74 bytes/entry. After subtracting 4 bytes for the packed index and 48 bytes for storing a scalar, that implies Perl is using an additional 22 bytes to perform the magic of hash storage and to point to the value stored under the hash index. The 'total_size' then adds another 64 bytes/entry, which is explained by 48 bytes of scalar overhead plus the 16 bytes required to pack a 32-char hex string.
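
A rough sketch of how per-entry figures like these can be measured, assuming 'size' and 'total_size' are the functions exported by the CPAN module Devel::Size (not a core module). The entry count, the index spacing, and the dummy values are made up for illustration, and the numbers printed will vary with Perl version and platform:

```perl
use strict;
use warnings;
use Devel::Size qw(size total_size);   # CPAN module, must be installed

my $n = 100_000;   # hypothetical entry count, smaller than the 1 million above
my %h;
for my $i (1 .. $n) {
    # Sparse 32-bit index packed to 4 bytes; dummy 16-byte packed value.
    $h{ pack 'V', $i * 4000 } = pack 'H32', sprintf('%032x', $i);
}

# size() counts the hash structure and keys; total_size() also follows the values.
my $size  = size(\%h);
my $total = total_size(\%h);
printf "size:       %.1f bytes/entry\n", $size  / $n;
printf "total_size: %.1f bytes/entry\n", $total / $n;

# A lone scalar holding the same 16 bytes, for comparison.
my $scalar = pack 'H32', '0' x 32;
printf "one scalar: %d bytes\n", total_size(\$scalar);
```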

So, I guess the surprising thing to me is that the scalar overhead is paid twice: 48 bytes to store the scalar index and another 48 to store the scalar value. These 96 bytes (plus the 22 bytes for the hash magic and pointer) really dwarf the storage of the packed index + value itself (which is 4+16 = 20 bytes). But if my understanding of the 48-byte overhead per scalar is correct, then presumably there is no way to store a large hash more efficiently (and since the indices are very sparse, I can't use an array or other contiguous storage). Am I correct?

In reply to Re^4: Store large hashes more efficiently by puterboy
in thread Store large hashes more efficiently by puterboy
