For many applications, that's a false economy. If you're going to reuse the hash container, then it seems to me to be cheaper to clear the hash (%hash = ()) than to destroy the hash container and then recreate it (undef %hash):

$ perl
Checking container differences between delete and clear:
Hash size (initial): 0/8
Hash size (after fill): 62/64
Hash size (after clear): 0/64
Hash size (after delete): 0/8

             Rate    delete     reuse overwrite
delete    38049/s        --       -3%      -39%
reuse     39358/s        3%        --      -37%
overwrite 62375/s       64%       58%        --

Overwriting the hash is obviously the fastest, as you needn't clear or destroy the container. Of course for many applications you'd have the added headache of ensuring that old data and current data don't mix.
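To make the "old data and current data don't mix" concern concrete, here is a minimal sketch. The two data sets and the helper names (load, keys_after_overwrite, keys_after_clear) are made up for illustration and are not part of the benchmark:

```perl
use strict;
use warnings;

# Hypothetical data sets, invented for this illustration.
my %first  = (a => 1,  b => 2, c => 3);
my %second = (a => 10, b => 20);

# Copy every pair of %$src into %$dst with a hash slice.
sub load {
    my ($dst, $src) = @_;
    @{$dst}{ keys %$src } = values %$src;
}

# Overwrite: the second load lands on top of the first,
# so the key 'c' from the first load is still there afterwards.
sub keys_after_overwrite {
    my %h;
    load(\%h, \%first);
    load(\%h, \%second);
    return join ',', sort keys %h;
}

# Clear, then reload: nothing from the first load survives.
sub keys_after_clear {
    my %h;
    load(\%h, \%first);
    %h = ();
    load(\%h, \%second);
    return join ',', sort keys %h;
}

print "overwrite: ", keys_after_overwrite(), "\n";   # overwrite: a,b,c
print "clear:     ", keys_after_clear(), "\n";       # clear:     a,b
```

The benchmark doesn't hit this problem because it fills the hash with the same keys every time; it only bites when successive data sets have different keys.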

Clearing the hash container allows you to reuse the hash without mixing old and current data, but might appear to be slower than simply deleting the hash container.

[Original text, superseded by the following paragraph:] Deleting the hash container might appear to be faster until you also account for the time it takes to recreate the hash container when you use it. It might matter more than it appears, though: I'd expect that clearing the hash may leave the container at the same size, so re-using the hash when there are many keys may be significantly faster than destroying/recreating it because perl could avoid the multiple container resize operations as it adds the keys. I was going to test that, but for some reason, I don't see how to make perl give the "used/total" buckets value for a hash any longer. (Funny, when I was a perl novice, I was getting that frequently, but now that I want it, I can't seem to make it happen. I guess I'll have to hit the documentation and see if I can suss it out. If so, I'll try to remember to update this node.)

Deleting the hash container might appear to be faster until you also account for the time it takes to recreate the hash container when you use it again. It matters a little more than it appears, though: clearing the hash leaves the container the same size (64 buckets in the run above, versus 8 after undef), so re-using the cleared hash is slightly faster than destroying/recreating it because perl can avoid many of the container resize operations as it adds the keys. On the positive side, though, destroying the container may allow your application to reclaim some memory in the event that some datasets have significantly more keys than are ordinarily needed. (Although I expect that would be as insignificant as the savings from the resizes just mentioned.)

At least that's how I see it... I'm providing the benchmark so you can point out what I may be missing...

$ cat
use strict;
use warnings;
use Benchmark ':all';
use Hash::Util 'bucket_stats';

my @some_keys = ('A' .. 'Z', 'a' .. 'z', '0' .. '9');

print "Checking container differences between delete and clear:\n";
my %gh;
print "Hash size (initial): ", old_hash_stats(\%gh), "\n";
fill_hash(\%gh);
print "Hash size (after fill): ", old_hash_stats(\%gh), "\n";
%gh=();
print "Hash size (after clear): ", old_hash_stats(\%gh), "\n";
undef %gh;
print "Hash size (after delete): ", old_hash_stats(\%gh), "\n";
print "\n";

# bucket_stats returns (keys, buckets, used buckets, ...), so the
# string built here is "key count/bucket count".
sub old_hash_stats {
    my $hr = shift;
    my @hash_stats = bucket_stats($hr);
    return "$hash_stats[0]/$hash_stats[1]";
}

# Set every key in @some_keys to 0 with a single hash slice.
sub fill_hash {
    my $hr = shift;
    @{$hr}{@some_keys} = (0) x @some_keys;
}

cmpthese(500000, {
    'overwrite' => sub { my %h; fill_hash(\%h); fill_hash(\%h); },
    'delete'    => sub { my %h; fill_hash(\%h); undef %h; fill_hash(\%h); },
    'reuse'     => sub { my %h; fill_hash(\%h); %h = (); fill_hash(\%h); },
});

Update: It seems that the old behavior of scalar(%h) changed in version 5.25.3 from displaying "buckets used/bucket count" to simply the number of keys. (A poor idea, in my opinion.) Anyway, with the Hash::Util function bucket_stats we can still get the information. I've edited the text and benchmark accordingly, and rearranged things a little for readability.
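For anyone who wants the old-style string back, here is a minimal sketch of one way to rebuild it from bucket_stats. It assumes the first three values returned are the key count, the bucket count, and the number of buckets in use, which is how I read the Hash::Util documentation; the helper name used_over_total is made up:

```perl
use strict;
use warnings;
use Hash::Util 'bucket_stats';

# Rebuild an old-style "used/total" string for a hash reference.
# Assumes bucket_stats returns (keys, buckets, used buckets, ...).
sub used_over_total {
    my ($hr) = @_;
    my ($keys, $buckets, $used) = bucket_stats($hr);
    return "$used/$buckets";
}

my %h;
@h{ 'A' .. 'Z' } = (0) x 26;

printf "keys:               %d\n", scalar(keys %h);
printf "used/total buckets: %s\n", used_over_total(\%h);
```

The used count depends on how the keys happen to hash, so it can change from run to run; only the key count and the total bucket count are stable.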

Update 2: Added the bit about how destroying the container allows you to reclaim memory, as a possible benefit.


When your only tool is a hammer, all problems look like your thumb.

In reply to Re: '%hash = ()' is slower than 'undef %hash' by roboticus
in thread '%hash = ()' is slower than 'undef %hash' by rsFalse
