Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
Very clever!

If I am understanding this correctly, basically you are masking the 20 LSB so that every 32 bit integer with the same 12 MSB will get packed into the same hash value string, meaning that there could be up to 2^12 = 4096 entries per hash key.

However, since I am packing inodes and since inodes are more-or-less sequentially assigned, it would seem that until 2^20 inodes have been assigned, there is no packing. So, wouldn't it be better to mask the MSB, rather than the LSB since the LSB would be relatively randomly uniformly assigned in most disk usage cases. i.e., wouldn't it be better to do something like:
my $key = $i & 0xfffff000;
or even maybe:
my $key = ($i & 0xfffff000) >> 3;
Indeed, your masking may partially explain why for 10e6 indexes, it takes 260MB, while doubling to 20e6 only increases to 426MB.

Also, since there are at most 2^12 duplicates, couldn't you save 2 bytes by packing just the parts that are not masked by fffff, so that you could pack it in a 'v', rather than a 'V'. i.e,. in my masking scheme:
$lookup[ $key ] .= pack 'va16', $i & 0xfff, $md5;

I am no perl monk, so I may be missing something of course...

Finally, it might be interesting to play with masking different amount of bits to see the space-saving vs. lookup time tradeoffs for different degrees of sparseness.

In reply to Re^2: Store larg hashes more efficiently (10e6 md5s in 260MB at 4Ás per lookup) by puterboy
in thread Store large hashes more efficiently by puterboy

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others chilling in the Monastery: (6)
    As of 2014-07-23 01:45 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      My favorite superfluous repetitious redundant duplicative phrase is:









      Results (131 votes), past polls