I thought of the overhead of building the list, but the difference was more than I would think that that takes into account. But I did not measure it. However I just used Devel::Size
and it reported 32083260 as the size of the hash and 44083260 as the total_size. (Regardless of how I built that hash.) My guess is that the rest is extra buffering on the part of various memory requests. Part of which is going to be the OS buffering Perl's mallocs, which is likely to be highly OS dependent, and possibly order of request dependent as well.
On the implementation, note that the performance can be improved both by moving from tied data structures to OO access and again (probably more) by using documented tuning parameters. Additionally with a dataset this small, it may be acceptable to switch back to a hash stored in memory (since BerkeleyDB's hashing takes less memory than Perl's). Furthermore the fact that it is general-purpose makes me more willing to take that approach rather than constructing a special-purpose solution that is possibly fragile.
And on caching. I agree heartily that optimizing for perfection is only going to be a niche exercise. I disagree that optimizing for better use of multi-level memory is unimportant though. I predict that you will see programmers over time being pushed towards data structures that are readily tuned to local memory configurations, dynamically if possible. You could view the popularity of databases as a step in this direction. A far bigger step is Judy arrays. Sure, it is very complicated to implement, but you only need to implement once. (Well, Perl would need to reimplement for licensing reasons...) And from the end programmer perspective, it is just a question of replacing the black box of a hash by the black box of another associative array implementation and voila! Things run faster!
A few more years of increasing discrepancies between access times, and the difference will become bigger still. Eventually it will be big enough that common scripting languges will make the switch. Most programmers will never need to understand what actually happened, any more than we really understand most of the ongoing changes that make Moore's law tick in the first place.