Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

performance of counting keys in a big hash

by xufengnju (Initiate)
on May 22, 2013 at 08:58 UTC ( #1034706=perlquestion: print w/replies, xml ) Need Help??
xufengnju has asked for the wisdom of the Perl Monks concerning the following question:

when we use scalar keys %MyHash to get count of keys in a hash, does perl actually go through every key? I mean if it is a big hash having more than 100,000 keys, does it slow down?
  • Comment on performance of counting keys in a big hash

Replies are listed 'Best First'.
Re: performance of counting keys in a big hash
by demerphq (Chancellor) on May 22, 2013 at 09:14 UTC

    Perl stores the number of keys in the hash as an independent property. So accessing this data is always O(1).

    Note that this is not true of code that does if (%hash){} or otherwise inspects the *fill* of a hash using scalar %hash (note the omission of the keys() function). Such code, depending on the version, will be inefficient. In later perls the if() case is special cased to not use "fill", and in 5.20 scalar %hash will likely be equivalent to scalar keys %hash.


Re: performance of counting keys in a big hash
by hdb (Prior) on May 22, 2013 at 09:16 UTC

    The documentation of keys says: "In scalar context, returns the number of keys or indices." Based on this I would say that keys has a shortcut to return the number of keys w/o building an array first. A little experiment seems to confirm this:

    use strict; use warnings; use Time::HiRes qw/time/; my %hash; my $n = 1000000; @hash{1..$n} = 1 x $n; my $s = time(); my @k = keys %hash; print scalar @k, ": "; print time()-$s, "\n"; $s = time(); print scalar keys %hash, ": "; print time()-$s, "\n";

    The second version is vastly faster.

      A good piece of research, however the possibility exists that the two results are related and generating the first one also pre-generates the second one. I suggest that tests like these are best done in separate methods on separate data just to be sure.

      #!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all) ; my $n = 100000; our %hash1; @hash1{1..$n} = 1 x $n; my $n2 = 100000; our %hash2; @hash2{1..$n2} = 1 x $n2; my $n3 = 1000000; our %hash3; @hash3{1..$n3} = 1 x $n3; cmpthese( 1000, { 'Method 1' => sub { my @k = keys %hash1; my $v = scalar @k; }, '100K' => sub { my $v = scalar keys %hash2; }, '1 Mill' => sub { my $v = scalar keys %hash3; } } );
      Rate Method 1 1 Mill + 100K Method 1 15.0/s -- -100% + -100% 1 Mill 999999999999999872/s 6645700000000000000% -- + 0% 100K 999999999999999872/s 6645700000000000000% 0% + --
      Note result appears to be same for 100k and 1Million entries in hash.
      If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

        This is a valuable comment, I had not thought of that. I have reversed the order and get the same results. Does this provide enough proof or could there be a caveat as well?

        To be really sure one would have to run separate scripts?

      Use the force Luke!

      /* hash structure: */ /* This structure must match the beginning of struct xpvmg in sv.h. */ struct xpvhv { HV* xmg_stash; /* class package */ union _xmgu xmg_u; STRLEN xhv_keys; /* total keys, including placeholders +*/ STRLEN xhv_max; /* subscript of last element of xhv_ar +ray */ };


      /* the number of keys (including any placeholders) */ #define XHvTOTALKEYS(xhv) ((xhv)->xhv_keys) /* * HvKEYS gets the number of keys that actually exist(), and is provid +ed * for backwards compatibility with old XS code. The core uses HvUSEDK +EYS * (keys, excluding placeholders) and HvTOTALKEYS (including placehold +ers) */ #define HvKEYS(hv) HvUSEDKEYS(hv) #define HvUSEDKEYS(hv) (HvTOTALKEYS(hv) - HvPLACEHOLDERS_get( +hv)) #define HvTOTALKEYS(hv) XHvTOTALKEYS((XPVHV*) SvANY(hv)) #define HvPLACEHOLDERS(hv) (*Perl_hv_placeholders_p(aTHX_ MUTABLE +_HV(hv))) #define HvPLACEHOLDERS_get(hv) (SvMAGIC(hv) ? Perl_hv_placeholders_ge +t(aTHX_ (const HV *)hv) : 0)

      HvKEYS(hv) is the thing that implements scalar keys %hash


        I think I will have to carry you around the swamps of some unknown planet for much longer before I will ready to dive into Perl's source itself to answer questions in this community...

        Dammit Jim, I'm a doctor, not a C programmer! :-) ok it's a little cross genre, but what the hell

        If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1034706]
Approved by vinoth.ree
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (3)
As of 2017-02-24 01:59 GMT
Find Nodes?
    Voting Booth?
    Before electricity was invented, what was the Electric Eel called?

    Results (351 votes). Check out past polls.