Very clever!

If I am understanding this correctly, you are basically keeping only the 20 LSB as the key (masking with 0xfffff), so that every 32-bit integer with the same 20 LSB, i.e., differing only in its 12 MSB, gets packed into the same value string. That means there could be up to 2^12 = 4096 entries per key.
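To check my reading, here is a minimal sketch of the scheme as I understand it. It is reconstructed from the description above, not copied from your code, and the sub names store_lsb/fetch_lsb are made up: the key keeps the 20 LSB, and each slot is a string of 20-byte 'Va16' records (32-bit inode plus binary md5) that a lookup scans linearly.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Assumed shape of the scheme (hypothetical names): the slot index keeps the
# 20 LSB, so integers differing only in their 12 MSB share one slot.
# Each record is 'V' (32-bit inode) + 'a16' (binary md5) = 20 bytes.
my @lookup;

sub store_lsb {
    my ($i, $md5) = @_;
    $lookup[ $i & 0xfffff ] .= pack 'Va16', $i, $md5;
}

sub fetch_lsb {
    my ($i) = @_;
    my $slot = $lookup[ $i & 0xfffff ];
    return unless defined $slot;
    # Linear scan over the packed 20-byte records in this slot
    for (my $off = 0; $off < length $slot; $off += 20) {
        my ($inode, $md5) = unpack 'Va16', substr($slot, $off, 20);
        return $md5 if $inode == $i;
    }
    return;
}

store_lsb($_, md5($_)) for 1 .. 1000, 0x100001, 0x200001;

# 1, 0x100001 and 0x200001 have the same 20 LSB, so they share slot 1
printf "slot 1 holds %d records\n", length($lookup[1]) / 20;
```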

However, since I am packing inodes, and inodes are assigned more or less sequentially, it seems that there is no packing at all until 2^20 inodes have been assigned. So wouldn't it be better to key on the MSB rather than the LSB? In most disk-usage cases the LSB are spread roughly uniformly while the MSB change slowly, so keying on the MSB would pack runs of consecutive inodes into the same string. I.e., wouldn't it be better to do something like:
my $key = $i & 0xfffff000;          # keep the 20 MSB, zero the 12 LSB
or even maybe:
my $key = ($i & 0xfffff000) >> 12;  # same grouping, compacted so it can index an array directly
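A quick way to see the difference for sequentially assigned inodes is to count how many slots a run of them occupies under each key. This is a hypothetical sketch: key_lsb, key_msb and distinct_slots are illustrative names, and inodes 0..9999 stand in for a freshly populated disk.

```perl
use strict;
use warnings;

# Hypothetical key functions for comparison
sub key_lsb { $_[0] & 0xfffff }   # keep the 20 LSB (the scheme as described above)
sub key_msb { $_[0] >> 12 }       # keep the 20 MSB, shifted down to a compact index

# Count how many distinct slots a run of sequential inodes lands in
sub distinct_slots {
    my ($keyfn, $first, $count) = @_;
    my %seen;
    $seen{ $keyfn->($_) } = 1 for $first .. $first + $count - 1;
    return scalar keys %seen;
}

my $n = 10_000;
printf "LSB key: %d slots, MSB key: %d slots\n",
    distinct_slots(\&key_lsb, 0, $n), distinct_slots(\&key_msb, 0, $n);
```

For inodes 0..9999 the LSB key gives every inode its own slot, while the MSB key needs only 3 slots (4096 inodes each).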
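Indeed, your masking may partially explain why 10e6 entries take 260MB while doubling to 20e6 only increases that to 426MB: the number of slots is capped at 2^20, so the per-slot overhead stays fixed and only the packed payload grows.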

Also, since a key can hold at most 2^12 entries, couldn't you save 2 bytes per entry by packing only the bits not covered by the 0xfffff000 mask, so that they fit in a 'v' (16-bit) rather than a 'V' (32-bit)? I.e., in my masking scheme:
$lookup[ $key ] .= pack 'va16', $i & 0xfff, $md5;
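A rough sketch of store and lookup under that scheme, assuming the key is $i >> 12 so it can index an array directly (store_msb, fetch_msb and REC are hypothetical names). Because the slot already encodes the 20 MSB, a record only needs the low 12 bits to be identified, so each record is 18 bytes instead of 20.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Record = 'v' (low 12 bits, stored in 16) + 'a16' (binary md5)
use constant REC => 18;    # versus 4 + 16 = 20 bytes with 'V'

my @lookup;

sub store_msb {
    my ($i, $md5) = @_;
    # The slot index carries the 20 MSB, so only the 12 LSB are stored
    $lookup[ $i >> 12 ] .= pack 'va16', $i & 0xfff, $md5;
}

sub fetch_msb {
    my ($i) = @_;
    my $slot = $lookup[ $i >> 12 ];
    return unless defined $slot;
    my $low = $i & 0xfff;
    # Linear scan over the 18-byte records, matching on the low bits only
    for (my $off = 0; $off < length $slot; $off += REC) {
        my ($lo, $md5) = unpack 'va16', substr($slot, $off, REC);
        return $md5 if $lo == $low;
    }
    return;
}

store_msb($_, md5($_)) for 1 .. 10_000;

my $bytes = 0;
$bytes += length($_) for grep { defined } @lookup;
printf "%d records in %d bytes\n", 10_000, $bytes;
```

The trade-off is that a slot can now hold up to 4096 records, so a lookup may have to scan up to 4096 × 18 bytes.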

I am no perl monk, so I may be missing something of course...

Finally, it might be interesting to play with masking different numbers of bits to see the space-saving vs. lookup-time trade-offs for different degrees of sparseness.
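Something like the following hypothetical sweep could be a starting point (slot_stats is a made-up name). For a run of sequential inodes it reports how many slots each shift width uses (fewer slots means less per-scalar overhead) and the worst-case number of records a lookup would scan. It only counts records; real memory use and timings would still need measuring, e.g. with Devel::Size and Benchmark. Shifts up to 16 keep the low bits small enough for a 'v'.

```perl
use strict;
use warnings;
use List::Util qw(max);

# For inodes $first .. $first+$count-1 keyed by ($i >> $shift), return the
# number of slots used and the largest number of records in any one slot.
sub slot_stats {
    my ($shift, $first, $count) = @_;
    my %per_slot;
    $per_slot{ $_ >> $shift }++ for $first .. $first + $count - 1;
    return (scalar keys %per_slot, max values %per_slot);
}

for my $shift (8, 10, 12, 14, 16) {
    my ($slots, $worst) = slot_stats($shift, 0, 100_000);
    printf "shift %2d: %6d slots, up to %5d records scanned per lookup\n",
        $shift, $slots, $worst;
}
```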

In reply to Re^2: Store larg hashes more efficiently (10e6 md5s in 260MB at 4µs per lookup) by puterboy
in thread Store large hashes more efficiently by puterboy
