PerlMonks  

I probably need to use 'Q' (64-bit unsigned) rather than 'V' (32-bit unsigned), in case the integer index exceeds 2^32 - 1 (about 4.29 billion) -- remember the entries are sparse.

Now for 1 million entries, I still get ~134 bytes per entry using 'V' and ~138 bytes/entry using 'Q' (which makes sense, since I am adding 4 bytes/entry going from 'V' to 'Q'). Without any packing of the hash value, I get 186 bytes/entry, which is 48 bytes/entry more than the 'Q' figure. That again makes sense, since in packing a 32-char hex string we go from 64 bytes of storage to 16.
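To make the packing concrete, here is a minimal sketch of the two key formats and the packed value. The sample index and digest are made up for illustration, and 'Q' assumes a perl built with 64-bit integer support.

```perl
use strict;
use warnings;

# Hypothetical sample entry: a sparse integer index and a 32-char hex digest.
my $index  = 123_456_789;
my $digest = '0123456789abcdef0123456789abcdef';

my $key_v = pack('V', $index);     # 32-bit unsigned, little-endian
my $key_q = pack('Q', $index);     # 64-bit unsigned (needs 64-bit integer support)
my $value = pack('H32', $digest);  # 32 hex digits -> 16 raw bytes

printf "V key: %d bytes, Q key: %d bytes, packed value: %d bytes\n",
    length($key_v), length($key_q), length($value);

# The packed value round-trips back to the original hex string.
my $restored = unpack('H32', $value);
print "round trip ok\n" if $restored eq $digest;

# An index that needs more than 32 bits does not survive 'V', but does survive 'Q'.
my $big   = 5_000_000_000;
my $via_v = unpack('V', pack('V', $big));
my $via_q = unpack('Q', pack('Q', $big));
print "V keeps big index: ", ($via_v == $big ? "yes" : "no"), "\n";
print "Q keeps big index: ", ($via_q == $big ? "yes" : "no"), "\n";
```

The 4 extra bytes per key are the whole cost of moving from 'V' to 'Q', which matches the difference between the two per-entry figures above.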

So, I guess that is as good as I can do with packing. What surprises me is the overhead of the hash itself: it takes 134 bytes/entry to store 4 bytes of index and 16 bytes of value, which is an overhead of 114 bytes/entry.

However, looking at how Perl seems to store just a plain scalar variable, it seems to take 48 bytes of overhead plus the size of the data element. Now looking at 'size' vs. 'total_size' for the hash, I get that 'size' alone (which doesn't include the space to store the hash values) is 74 bytes per entry. After subtracting 4 bytes for the packed index and 48 bytes for storing a scalar, that implies Perl is using an additional 22 bytes to perform the magic of hash storage and to point to the value stored under the hash index. The 'total_size' then adds a further 64 bytes, which is explained by 48 bytes of scalar overhead plus the 16 bytes required to pack a 32-char hex string.
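For anyone who wants to reproduce the per-entry figures, a rough sketch follows. It assumes the 'size' and 'total_size' numbers come from the CPAN module Devel::Size; the entry count and the way the sparse indices are generated are made up for illustration. The exact numbers will vary with the perl version and build, so none are shown here.

```perl
use strict;
use warnings;

# Assumption: Devel::Size (CPAN, not core) is the source of 'size' and 'total_size'.
my $have_devel_size = eval { require Devel::Size; 1 };

my $entries = 10_000;   # hypothetical; far fewer than the 1 million in the post
my %h;
for my $i (1 .. $entries) {
    my $index = $i * 1009;                        # made-up sparse index, fits in 32 bits
    my $hex   = sprintf('%032x', $i);             # stand-in for a 32-char hex digest
    $h{ pack('V', $index) } = pack('H32', $hex);  # 4-byte key, 16-byte value
}

if ($have_devel_size) {
    my $size  = Devel::Size::size(\%h);        # hash structure only, not the stored values
    my $total = Devel::Size::total_size(\%h);  # structure plus everything it refers to
    printf "size:       %.1f bytes/entry\n", $size / $entries;
    printf "total_size: %.1f bytes/entry\n", $total / $entries;
    printf "values:     %.1f bytes/entry\n", ($total - $size) / $entries;
}
else {
    print "Devel::Size is not installed; skipping measurement\n";
}
```

The difference between the two calls is the part attributed above to the value scalars.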

So, I guess the surprising thing to me is that the 48-byte scalar overhead is paid twice: 48 bytes to store the scalar index and another 48 to store the scalar value. These 96 bytes (plus the 22 bytes for the hash magic & pointer) really dwarf the storage of the packed index + value itself (which is 4+16 = 20 bytes). But if my understanding of the 48-byte overhead per scalar is correct, then presumably there is no way to store a large hash more efficiently (and since the indices are very sparse, I can't use an array or other contiguous storage). Am I correct?

In reply to Re^4: Store large hashes more efficiently by puterboy
in thread Store large hashes more efficiently by puterboy
