Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

What is the most efficient way to see if a hash is empty?

by ELISHEVA (Prior)
on Apr 28, 2009 at 08:08 UTC ( #760534=perlquestion: print w/ replies, xml ) Need Help??
ELISHEVA has asked for the wisdom of the Perl Monks concerning the following question:

Is there a more efficient and self-documenting way to see if a hash is empty than scalar(@{[%h]}) (or scalar(keys %h))?

Neither of these expressions seem to be particularly clear ways of saying "is this empty?". The first especially requires a fairly good understanding of hashes and references to even guess what it is doing. I can't seem to find any function or sub with a clearer-than-day name like is_hash_empty(%h). Am I missing something?

Additionally, both incantations seem like overkill. Checking for emptiness only requires a simple one-step operation to see if at least one key exists. On the other hand [%h] and keys %h copy elements to a new array and are O(N) rather than O(1)... unless, of course, Perl is doing some clever optimizations when it sees scalar(...).

Is it? Does Perl know how to optimize expressions like if (scalar(keys %h)) or scalar(@{[%h]}) && ... so that only one element is tested (or only a stored element count is extracted)? And if so, how sensitive is that optimization to variations in syntax, e.g. are things like scalar(foo()) optimized? And if so, how does "Perl" know it needs to optimize?

And if not, what alternative do you recommend? Using a single call to each(...) does not appear to be an option. Calls to each do not reset, so any subsequent sub that tried to walk through hash elements using each would start on the second key rather than the first. Probably not a good thing.

In the CB, some wise monk (my apologies - I can't remember who) - suggested that scalar(%h) might do it - it returns 0 when the array is empty. Is this always true? Is this an O(1) check for emptiness?

For very small hashes having an efficient way to test for emptiness is probably not an issue. However copying an entire hash could end up being very expensive if the hash is either empty or *very* large. And even with smaller hashes repeated O(N) tests add up quickly potentially turning an O(N) problem into O(N^2). So, if there is an O(1) way to test for emptiness in a hash, I really would like to know it.

Many thanks in advance, beth

Comment on What is the most efficient way to see if a hash is empty?
Select or Download Code
Re: What is the most efficient way to see if a hash is empty?
by moritz (Cardinal) on Apr 28, 2009 at 08:14 UTC
    What's wrong with if (%hash) { .. }?

    Also note that scalar %hash can be evaluated in boolean context correctly (and doesn't traverse the hash internally), see perldata (iirc)

      From perldata: "If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash."..

      Am I right to assume that if the hash is non-empty if (%hash) and scalar(%hash) are simply returning a stored number of buckets (no loops to count buckets)? If so, that would be exactly want I wanted - O(1). I missed that paragraph. Thanks.

      Best, beth

      Update: just to be clear - this is meant to be an internals question, not a "trust the docs" question. perlguts doesn't go that deep into the internals of hash implementation.

        No, the documentation is all lies :)
        C:\>perl -e"die scalar %hash 0 at -e line 1. C:\>perl -e"die scalar %ENV 26/64 at -e line 1.
        That's what I expect from common sense, yes.

        A quick benchmark seems to confirm this; asking an empty hash is much faster than asking a filled one, but the number of keys in the full hash seems to have no significant influence on the speed:

        use strict; use warnings; use Benchmark qw(cmpthese); for my $len (2..6) { our (%full, %empty); my $a = 'A'; $full{$a++} = $a while length($a) < $len; print "\nLength $len with ", scalar(%full), "items\n"; cmpthese -1, { full => sub {die unless %full}, empty => sub {die if %empty}, } } __END__ Length 2 with 22/32items Rate full empty full 1286220/s -- -77% empty 5481192/s 326% -- Length 3 with 510/1024items Rate full empty full 1052183/s -- -82% empty 5748771/s 446% -- Length 4 with 13961/32768items Rate full empty full 998734/s -- -84% empty 6056132/s 506% -- Length 5 with 310721/524288items Rate full empty full 963764/s -- -83% empty 5681139/s 489% -- Length 6 with 8655839/16777216items Rate full empty full 849541/s -- -86% empty 6124859/s 621% --

        Note that the number of hash items grows exponentially, while the number of iterations decreases linearly at best.

        Devel::Peek and a quick perusal of the sources seems back up the idea that its a lookup
        C:\> perl -MDevel::Peek -e"Dump { } SV = RV(0x183cfc8) at 0x226074 REFCNT = 1 FLAGS = (TEMP,ROK) RV = 0x225f84 SV = PVHV(0x22b504) at 0x225f84 REFCNT = 1 FLAGS = (SHAREKEYS) IV = 0 <<<------------------------------------------ NV = 0 ARRAY = 0x0 KEYS = 0 <<<------------------------------------------ FILL = 0 MAX = 7 RITER = -1 EITER = 0x0 C:\>perl -MDevel::Peek -e"Dump { 1..10 } SV = RV(0x183cfc8) at 0x182f4f4 REFCNT = 1 FLAGS = (TEMP,ROK) RV = 0x225f94 SV = PVHV(0x22b514) at 0x225f94 REFCNT = 1 FLAGS = (SHAREKEYS) IV = 5 <<<------------------------------------------ NV = 0 ARRAY = 0x1825554 (0:3, 1:5) hash quality = 150.0% KEYS = 5 <<<------------------------------------------ FILL = 5 MAX = 7 RITER = -1 EITER = 0x0 Elt "1" HASH = 0x806b80c9 SV = IV(0x1828ce0) at 0x226084 REFCNT = 1 FLAGS = (IOK,pIOK) IV = 2 Elt "3" HASH = 0xa400c7f3 SV = IV(0x1828d04) at 0x2260b4 REFCNT = 1 FLAGS = (IOK,pIOK) IV = 4 Elt "7" HASH = 0xecc9d984 SV = IV(0x1828d14) at 0x2260f0 REFCNT = 1 FLAGS = (IOK,pIOK) IV = 8
Re: What is the most efficient way to see if a hash is empty?
by BrowserUk (Pope) on Apr 28, 2009 at 08:29 UTC

    Hm. scalar keys %hash does not seem to be O(N):

    [0] Perl> %a = 1..1e6;; [0] Perl> %b = ();; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate A B A 8354/s -- -15% B 9845/s 18% -- [0] Perl> %b = (1..2e6);; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate A B A 821/s -- -0% B 824/s 0% -- [0] Perl> %a = ();; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate B A B 815/s -- -18% A 998/s 23% -- [0] Perl> %a = (1,2);; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate A B A 843/s -- -1% B 849/s 1% -- [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 1000 ], B=>q[ scalar keys %b and ++$j for 1 .. 1000 ] };; Rate B A B 804/s -- -3% A 831/s 3% --

    I'll let you draw your own conclusions, but mine are that if the hash is completely empty it is slightly faster than if it has any keys. But whether it has 1 key or 1 million make no difference at all.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I'll let you draw your own conclusions

      I would conclude that your benchmark may be flawed since for empty hashes the $j++ operation is skipped. Maybe that's the reason for the difference?

        I put the and ++$i there to stop the scalar keys %hash being optimised away for being in a void context:

        [0] Perl> %a = 1..2e6; %b = 1..2e6;; [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 10000 ], B=>q[ scalar keys %b for 1 .. 10000 ] };; Rate A B A 804/s -- -26% B 1087/s 35% --

        A problem I've been bitten by with other useless statements in void contexts.

        And because it is far cheaper than an assignment:

        [0] Perl> cmpthese -1, { A=>q[ scalar keys %a and ++$i for 1 .. 10000 ], B=>q[ my $x = scalar keys %b for 1 .. 10000 ] };; Rate B A B 658/s -- -19% A 812/s 24% --

        and therefore less of a bias to the test.

        Do you have a better way to deal with these issues?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: What is the most efficient way to see if a hash is empty?
by Anonymous Monk on Apr 28, 2009 at 18:26 UTC
    if (each %hash) { } seems to have the most stable performance when comparing a mix of empty, filled and mega filled hashes on my computer.

      That's what I suggested trying in the CB to ELISHEVA last night. :)

      Here's why each is not a good idea:

      Suppose you were to use each? Each call to each remembers its position within a hash. Only the first call would actually check to see if the hash was empty. The second and later calls would only be checking to see if there was a next element. So if statement #2 would return false if empty or if there were only one element. If statement #3 would return false there were 0,1, or 2 elements.

      But the problems run even deeper. Any function inside the conditional block will miss the first key-value pair if it tries to use a while each loop . Bugs like this are hard to track down because the confused while loop may not be in your own code - it could be in a third party subroutine! Code like this:

      use strict; use warnings; sub printAll; my %h=(a=>1, b=>2, c=>3); my @aKeys = keys %h; print "full set of keys: <@aKeys>\n"; print "print all key-value pairs:\n"; printAll(\%h) if (each %h); # subroutine using while each loop # possibly buried somewhere deep in a third party module sub printAll { my $h=shift; while (my ($k, $v) = each(%$h)) { print "$k, $v\n"; } }

      outputs

      full set of keys: <c a b> print all key-value pairs: a, 1 b, 2 #whoops - no c, 3 printed out!!!

      Best, beth

        Good call++.

Re: What is the most efficient way to see if a hash is empty?
by planetscape (Canon) on Apr 28, 2009 at 22:34 UTC
    In the CB, some wise monk (my apologies - I can't remember who) - suggested that scalar(%h) might do it

    I believe that very wise Monk was our own CountZero. :-)

    HTH,

    planetscape

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://760534]
Approved by moritz
Front-paged by citromatik
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (6)
As of 2014-09-16 11:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (10 votes), past polls