ELISHEVA has asked for the wisdom of the Perl Monks concerning the following question:
Is there a more efficient and self-documenting way to see if a hash is empty than scalar(@{[%h]}) (or scalar(keys %h))?
Neither of these expressions seem to be particularly clear ways of saying "is this empty?". The first especially requires a fairly good understanding of hashes and references to even guess what it is doing. I can't seem to find any function or sub with a clearer-than-day name like is_hash_empty(%h). Am I missing something?
Additionally, both incantations seem like overkill. Checking for emptiness only requires a simple one-step operation to see if at least one key exists. On the other hand [%h] and keys %h copy elements to a new array and are O(N) rather than O(1)... unless, of course, Perl is doing some clever optimizations when it sees scalar(...).
Is it? Does Perl know how to optimize expressions like if (scalar(keys %h)) or scalar(@{[%h]}) && ... so that only one element is tested (or only a stored element count is extracted)? And if so, how sensitive is that optimization to variations in syntax, e.g. are things like scalar(foo()) optimized? And if so, how does "Perl" know it needs to optimize?
And if not, what alternative do you recommend? Using a single call to each(...) does not appear to be an option. Calls to each do not reset, so any subsequent sub that tried to walk through hash elements using each would start on the second key rather than the first. Probably not a good thing.
In the CB, some wise monk (my apologies - I can't remember who) - suggested that scalar(%h) might do it - it returns 0 when the array is empty. Is this always true? Is this an O(1) check for emptiness?
For very small hashes having an efficient way to test for emptiness is probably not an issue. However copying an entire hash could end up being very expensive if the hash is either empty or *very* large. And even with smaller hashes repeated O(N) tests add up quickly potentially turning an O(N) problem into O(N^2). So, if there is an O(1) way to test for emptiness in a hash, I really would like to know it.
Many thanks in advance, beth
Re: What is the most efficient way to see if a hash is empty?
by moritz (Cardinal) on Apr 28, 2009 at 08:14 UTC
|
What's wrong with if (%hash) { .. }?
Also note that scalar %hash can be evaluated in boolean context correctly (and doesn't traverse the hash internally), see perldata (iirc) | [reply] [d/l] [select] |
|
From perldata: "If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash."..
Am I right to assume that if the hash is non-empty if (%hash) and scalar(%hash) are simply returning a stored number of buckets (no loops to count buckets)? If so, that would be exactly want I wanted - O(1). I missed that paragraph. Thanks.
Best, beth
Update: just to be clear - this is meant to be an internals question, not a "trust the docs" question. perlguts doesn't go that deep into the internals of hash implementation.
| [reply] [d/l] [select] |
|
use strict;
use warnings;
use Benchmark qw(cmpthese);
for my $len (2..6) {
our (%full, %empty);
my $a = 'A';
$full{$a++} = $a while length($a) < $len;
print "\nLength $len with ", scalar(%full), "items\n";
cmpthese -1, {
full => sub {die unless %full},
empty => sub {die if %empty},
}
}
__END__
Length 2 with 22/32items
Rate full empty
full 1286220/s -- -77%
empty 5481192/s 326% --
Length 3 with 510/1024items
Rate full empty
full 1052183/s -- -82%
empty 5748771/s 446% --
Length 4 with 13961/32768items
Rate full empty
full 998734/s -- -84%
empty 6056132/s 506% --
Length 5 with 310721/524288items
Rate full empty
full 963764/s -- -83%
empty 5681139/s 489% --
Length 6 with 8655839/16777216items
Rate full empty
full 849541/s -- -86%
empty 6124859/s 621% --
Note that the number of hash items grows exponentially, while the number of iterations decreases linearly at best. | [reply] [d/l] |
|
C:\> perl -MDevel::Peek -e"Dump { }
SV = RV(0x183cfc8) at 0x226074
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x225f84
SV = PVHV(0x22b504) at 0x225f84
REFCNT = 1
FLAGS = (SHAREKEYS)
IV = 0 <<<------------------------------------------
NV = 0
ARRAY = 0x0
KEYS = 0 <<<------------------------------------------
FILL = 0
MAX = 7
RITER = -1
EITER = 0x0
C:\>perl -MDevel::Peek -e"Dump { 1..10 }
SV = RV(0x183cfc8) at 0x182f4f4
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x225f94
SV = PVHV(0x22b514) at 0x225f94
REFCNT = 1
FLAGS = (SHAREKEYS)
IV = 5 <<<------------------------------------------
NV = 0
ARRAY = 0x1825554 (0:3, 1:5)
hash quality = 150.0%
KEYS = 5 <<<------------------------------------------
FILL = 5
MAX = 7
RITER = -1
EITER = 0x0
Elt "1" HASH = 0x806b80c9
SV = IV(0x1828ce0) at 0x226084
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 2
Elt "3" HASH = 0xa400c7f3
SV = IV(0x1828d04) at 0x2260b4
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 4
Elt "7" HASH = 0xecc9d984
SV = IV(0x1828d14) at 0x2260f0
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 8
| [reply] [d/l] |
|
|
|
|
|
No, the documentation is all lies :)
C:\>perl -e"die scalar %hash
0 at -e line 1.
C:\>perl -e"die scalar %ENV
26/64 at -e line 1.
| [reply] [d/l] |
Re: What is the most efficient way to see if a hash is empty?
by BrowserUk (Patriarch) on Apr 28, 2009 at 08:29 UTC
|
[0] Perl> %a = 1..1e6;;
[0] Perl> %b = ();;
[0] Perl> cmpthese -1, {
A=>q[ scalar keys %a and ++$i for 1 .. 1000 ],
B=>q[ scalar keys %b and ++$j for 1 .. 1000 ]
};;
Rate A B
A 8354/s -- -15%
B 9845/s 18% --
[0] Perl> %b = (1..2e6);;
[0] Perl> cmpthese -1, {
A=>q[ scalar keys %a and ++$i for 1 .. 1000 ],
B=>q[ scalar keys %b and ++$j for 1 .. 1000 ]
};;
Rate A B
A 821/s -- -0%
B 824/s 0% --
[0] Perl> %a = ();;
[0] Perl> cmpthese -1, {
A=>q[ scalar keys %a and ++$i for 1 .. 1000 ],
B=>q[ scalar keys %b and ++$j for 1 .. 1000 ]
};;
Rate B A
B 815/s -- -18%
A 998/s 23% --
[0] Perl> %a = (1,2);;
[0] Perl> cmpthese -1, {
A=>q[ scalar keys %a and ++$i for 1 .. 1000 ],
B=>q[ scalar keys %b and ++$j for 1 .. 1000 ]
};;
Rate A B
A 843/s -- -1%
B 849/s 1% --
[0] Perl> cmpthese -1, {
A=>q[ scalar keys %a and ++$i for 1 .. 1000 ],
B=>q[ scalar keys %b and ++$j for 1 .. 1000 ]
};;
Rate B A
B 804/s -- -3%
A 831/s 3% --
I'll let you draw your own conclusions, but mine are that if the hash is completely empty it is slightly faster than if it has any keys. But whether it has 1 key or 1 million make no difference at all.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
| [reply] [d/l] |
|
[0] Perl> %a = 1..2e6; %b = 1..2e6;;
[0] Perl> cmpthese -1, {
A=>q[ scalar keys %a and ++$i for 1 .. 10000 ],
B=>q[ scalar keys %b for 1 .. 10000 ]
};;
Rate A B
A 804/s -- -26%
B 1087/s 35% --
A problem I've been bitten by with other useless statements in void contexts.
And because it is far cheaper than an assignment:
[0] Perl> cmpthese -1, {
A=>q[ scalar keys %a and ++$i for 1 .. 10000 ],
B=>q[ my $x = scalar keys %b for 1 .. 10000 ]
};;
Rate B A
B 658/s -- -19%
A 812/s 24% --
and therefore less of a bias to the test.
Do you have a better way to deal with these issues?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
|
Re: What is the most efficient way to see if a hash is empty?
by planetscape (Chancellor) on Apr 28, 2009 at 22:34 UTC
|
| [reply] |
Re: What is the most efficient way to see if a hash is empty?
by Anonymous Monk on Apr 28, 2009 at 18:26 UTC
|
if (each %hash) {
}
seems to have the most stable performance when comparing a mix of empty, filled and mega filled hashes on my computer.
| [reply] |
|
Here's why each is not a good idea:
Suppose you were to use each? Each call to each remembers its position within a hash. Only the first call would actually check to see if the hash was empty. The second and later calls would only be checking to see if there was a next element. So if statement #2 would return false if empty or if there were only one element. If statement #3 would return false there were 0,1, or 2 elements.
But the problems run even deeper. Any function inside the conditional block will miss the first key-value pair if it tries to use a while each loop . Bugs like this are hard to track down because the confused while loop may not be in your own code - it could be in a third party subroutine! Code like this:
use strict;
use warnings;
sub printAll;
my %h=(a=>1, b=>2, c=>3);
my @aKeys = keys %h;
print "full set of keys: <@aKeys>\n";
print "print all key-value pairs:\n";
printAll(\%h) if (each %h);
# subroutine using while each loop
# possibly buried somewhere deep in a third party module
sub printAll {
my $h=shift;
while (my ($k, $v) = each(%$h)) {
print "$k, $v\n";
}
}
outputs
full set of keys: <c a b>
print all key-value pairs:
a, 1
b, 2
#whoops - no c, 3 printed out!!!
Best, beth | [reply] [d/l] [select] |
|
| [reply] |
|
| [reply] |
|
|