Re^4: What is the most efficient way to see if a hash is empty?

The least bias would be

%a = 1..2e6; %b = 1..2e6;
cmpthese -1, { 
    A=>q[ scalar keys %a,     ++$i for 1 .. 10000 ], 
    B=>q[ scalar keys %b and  ++$i for 1 .. 10000 ],
    C=>q[ my $x = keys %a,    ++$i for 1 .. 10000 ],
    D=>q[ my $x = keys %b and ++$i for 1 .. 10000 ],
}
__END__
   Rate    D    C    A    B
D 280/s   --  -4% -31% -31%
C 290/s   4%   -- -28% -28%
A 404/s  44%  39%   --  -1%
B 406/s  45%  40%   1%   --
[download]

Testing with empty %b gives

%a = 1..2e6; %b = ();
cmpthese -1, {
    A=>q[ scalar keys %a, ++$i for 1 .. 10000 ],
    B=>q[ scalar keys %b, ++$i for 1 .. 10000 ],
    C=>q[ my $x = keys %a, ++$i for 1 .. 10000 ],
    D=>q[ my $x = keys %b, ++$i for 1 .. 10000 ],
}
__END__
   Rate    C    D    A    B
C 288/s   --  -2% -30% -31%
D 293/s   2%   -- -28% -29%
A 410/s  43%  40%   --  -1%
B 415/s  44%  41%   1%   --
[download]

from which I would conclude that there's a minimal difference between testing an empty and a full hash.

update: moritz code using these instead of sub {die}:

use strict;
use warnings;
use Benchmark qw(cmpthese);

for my $len (2..6) {
    our (%full, %empty);

    my $a = 'A';
    $full{$a++} = $a while length($a) < $len;

    print "\nLength $len with ", scalar(%full), "items\n";
    cmpthese -1, {
        full    => q[scalar keys %full,  ++$i for 1 .. 10000],
        empty   => q[scalar keys %empty, ++$i for 1 .. 10000],
    }
}
__END__

Length 2 with 22/32items
       Rate empty  full
empty 350/s    --  -15%
full  411/s   17%    --

Length 3 with 510/1024items
       Rate  full empty
full  406/s    --   -2%
empty 415/s    2%    --

Length 4 with 13961/32768items
       Rate  full empty
full  409/s    --   -1%
empty 414/s    1%    --

Length 5 with 310721/524288items
       Rate empty  full
empty 414/s    --    0%
full  414/s    0%    --

Length 6 with 8655839/16777216items
       Rate empty  full
empty 407/s    --   -4%
full  426/s    5%    --
[download]

But! If I introduce a third test condition which also tests for %full, I get:

...
    cmpthese -1, {
        full    => q[scalar keys %full,  ++$i for 1 .. 10000],
        empty   => q[scalar keys %empty, ++$i for 1 .. 10000],
        fake    => q[scalar keys %full,  ++$i for 1 .. 10000],
    }
}
__END__

Length 2 with 22/32items
       Rate  fake empty  full
fake  403/s    --   -2%   -3%
empty 411/s    2%    --   -2%
full  417/s    4%    2%    --

Length 3 with 510/1024items
       Rate  fake  full empty
fake  415/s    --   -1%   -1%
full  418/s    1%    --   -0%
empty 419/s    1%    0%    --

Length 4 with 13961/32768items
       Rate  full empty  fake
full  406/s    --   -2%   -2%
empty 414/s    2%    --    0%
fake  414/s    2%    0%    --

Length 5 with 310721/524288items
       Rate  fake  full empty
fake  406/s    --   -0%   -1%
full  407/s    0%    --   -1%
empty 411/s    1%    1%    --

Length 6 with 8655839/16777216items
       Rate  fake  full empty
fake  400/s    --   -3%   -3%
full  411/s    3%    --   -0%
empty 411/s    3%    0%    --
[download]

using a sub instead of a quoted string

...
    cmpthese -1, {
        full    => sub{scalar keys %full,  ++$i },
        empty   => sub{scalar keys %empty, ++$i },
        fake    => sub{scalar keys %full,  ++$i },
    }
}
__END__

Length 2 with 22/32items
           Rate empty  full  fake
empty 4955017/s    --   -4%   -5%
full  5167154/s    4%    --   -1%
fake  5215901/s    5%    1%    --

Length 3 with 510/1024items
           Rate  full empty  fake
full  5038923/s    --   -4%   -5%
empty 5265576/s    4%    --   -1%
fake  5293291/s    5%    1%    --

Length 4 with 13961/32768items
           Rate empty  full  fake
empty 4633858/s    --   -1%   -5%
full  4677165/s    1%    --   -4%
fake  4858803/s    5%    4%    --

Length 5 with 310721/524288items
           Rate  full  fake empty
full  4812084/s    --   -5%   -7%
fake  5041231/s    5%    --   -3%
empty 5193418/s    8%    3%    --

Length 6 with 8655839/16777216items
           Rate  fake  full empty
fake  4689731/s    --   -4%  -16%
full  4906438/s    5%    --  -12%
empty 5604762/s   20%   14%    --
[download]

Hmm... let's use moritz's code with strings instead of hashes:

for my $len (2..6) {
    our ($full, $empty);

    my $a = 'A';
    $full .= $a while length($full) < $len;

    print "\nLength $len with ", scalar($full), " items\n";
    cmpthese -1, {
        full  => sub { die unless $full },
        empty => sub { die if $empty    },
        fake  => sub { die unless $full },
    }
}
__END__
Length 2 with AA items
            Rate empty  full  fake
empty 29399327/s    --   -5%  -35%
full  31036076/s    6%    --  -31%
fake  44964665/s   53%   45%    --
...
[download]

It looks like the iteration/sub call overhead is shadowing the scalar %hash lookup, and moritz has been testing something else...

I'm re-reading No More Meaningless Benchmarks! ;-)

Comment on Re^4: What is the most efficient way to see if a hash is empty? Select or Download Code

Replies are listed 'Best First'.

Re^5: What is the most efficient way to see if a hash is empty?
by BrowserUk (Patriarch) on Apr 28, 2009 at 19:59 UTC

The trouble is, the comma operator does not prevent the "Useless use of ... in a void context:

C:\test>perl -wle"substr( $i, 0, 1 ),  ++$i for 1 .. 100"
Useless use of substr in void context at -e line 1.
Use of uninitialized value $i in substr at -e line 1.
[download]

And I'm never sure whether that means it will always be optimised away as with:

C:\test>perl -MO=Deparse -wle"$x = 1,2,3"
Useless use of a constant in void context at -e line 1.
Useless use of a constant in void context at -e line 1.
Name "main::x" used only once: possible typo at -e line 1.
BEGIN { $^W = 1; }
BEGIN { $/ = "\n"; $\ = "\n"; }
$x = 1, '???', '???';
-e syntax OK
[download]

Or not? And with Benchmark not enabling warnings when it compiles stringified test code, it is very easy to find yourself benchmarking empty statements and drawing conclusions on that basis.

And as you've demonstrated with your version of moritz benchmark, when attempting to benchmark micro-optimisations like the OPs question, the overhead of calling two levels of sub (benchmark wraps the user supplied code in it's own wrapper internally), can have a significant effect upon the perceived results.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

[reply]
[d/l]
[select]


Come for the quick hacks, stay for the epiphanies.
	PerlMonks