I thought of the overhead of building the list, but the difference was more than I would think that that takes into account. But I did not measure it. However I just used Devel::Size
and it reported 32083260 as the size of the hash and 44083260 as the total_size. (Regardless of how I built that hash.) My guess is that the rest is extra buffering on the part of various memory requests. Part of which is going to be the OS buffering Perl's mallocs, which is likely to be highly OS dependent, and possibly order of request dependent as well.
On the implementation, note that the performance can be improved both by moving from tied data structures to OO access and again (probably more) by using documented tuning parameters. Additionally with a dataset this small, it may be acceptable to switch back to a hash stored in memory (since BerkeleyDB's hashing takes less memory than Perl's). Furthermore the fact that it is general-purpose makes me more willing to take that approach rather than constructing a special-purpose solution that is possibly fragile.
And on caching. I agree heartily that optimizing for perfection is only going to be a niche exercise. I disagree that optimizing for better use of multi-level memory is unimportant though. I predict that you will see programmers over time being pushed towards data structures that are readily tuned to local memory configurations, dynamically if possible. You could view the popularity of databases as a step in this direction. A far bigger step is Judy arrays. Sure, it is very complicated to implement, but you only need to implement once. (Well, Perl would need to reimplement for licensing reasons...) And from the end programmer perspective, it is just a question of replacing the black box of a hash by the black box of another associative array implementation and voila! Things run faster!
A few more years of increasing discrepancies between access times, and the difference will become bigger still. Eventually it will be big enough that common scripting languges will make the switch. Most programmers will never need to understand what actually happened, any more than we really understand most of the ongoing changes that make Moore's law tick in the first place.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.
| & || & |
| < || < |
| > || > |
| [ || [ |
| ] || ] ||