performance of counting keys in a big hash

xufengnju has asked for the wisdom of the Perl Monks concerning the following question:

when we use scalar keys %MyHash to get count of keys in a hash, does perl actually go through every key? I mean if it is a big hash having more than 100,000 keys, does it slow down?

Comment on performance of counting keys in a big hash

Replies are listed 'Best First'.
Re: performance of counting keys in a big hash by demerphq (Chancellor) on May 22, 2013 at 09:14 UTC
Perl stores the number of keys in the hash as an independent property. So accessing this data is always O(1). Note that this is not true of code that does `if (%hash){}` or otherwise inspects the fill of a hash using `scalar %hash` (note the omission of the keys() function). Such code, depending on the version, will be inefficient. In later perls the if() case is special cased to not use "fill", and in 5.20 `scalar %hash` will likely be equivalent to `scalar keys %hash`. --- $world=~s/war/peace/g	[reply] [d/l] [select]
Re: performance of counting keys in a big hash by hdb (Monsignor) on May 22, 2013 at 09:16 UTC
The documentation of keys says: "In scalar context, returns the number of keys or indices." Based on this I would say that keys has a shortcut to return the number of keys w/o building an array first. A little experiment seems to confirm this: `use strict; use warnings; use Time::HiRes qw/time/; my %hash; my $n = 1000000; @hash{1..$n} = 1 x $n; my $s = time(); my @k = keys %hash; print scalar @k, ": "; print time()-$s, "\n"; $s = time(); print scalar keys %hash, ": "; print time()-$s, "\n";` [download] The second version is vastly faster.	[reply] [d/l]
Re^2: performance of counting keys in a big hash by space_monk (Chaplain) on May 22, 2013 at 11:54 UTC
A good piece of research, however the possibility exists that the two results are related and generating the first one also pre-generates the second one. I suggest that tests like these are best done in separate methods on separate data just to be sure. `#!/usr/bin/perl use strict; use warnings; use Benchmark qw(:all) ; my $n = 100000; our %hash1; @hash1{1..$n} = 1 x $n; my $n2 = 100000; our %hash2; @hash2{1..$n2} = 1 x $n2; my $n3 = 1000000; our %hash3; @hash3{1..$n3} = 1 x $n3; cmpthese( 1000, { 'Method 1' => sub { my @k = keys %hash1; my $v = scalar @k; }, '100K' => sub { my $v = scalar keys %hash2; }, '1 Mill' => sub { my $v = scalar keys %hash3; } } );` [download] Results: `Rate Method 1 1 Mill + 100K Method 1 15.0/s -- -100% + -100% 1 Mill 999999999999999872/s 6645700000000000000% -- + 0% 100K 999999999999999872/s 6645700000000000000% 0% + --` [download] Note result appears to be same for 100k and 1Million entries in hash. If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)	[reply] [d/l] [select]
Re^3: performance of counting keys in a big hash by hdb (Monsignor) on May 22, 2013 at 12:13 UTC
This is a valuable comment, I had not thought of that. I have reversed the order and get the same results. Does this provide enough proof or could there be a caveat as well? To be really sure one would have to run separate scripts?	[reply]
Re^4: performance of counting keys in a big hash by space_monk (Chaplain) on May 22, 2013 at 12:19 UTC
Re^2: performance of counting keys in a big hash by demerphq (Chancellor) on May 22, 2013 at 13:16 UTC
Use the force Luke! `/* hash structure: / / This structure must match the beginning of struct xpvmg in sv.h. / struct xpvhv { HV xmg_stash; /* class package / union _xmgu xmg_u; STRLEN xhv_keys; / total keys, including placeholders +/ STRLEN xhv_max; / subscript of last element of xhv_ar +ray / };` [download] and / the number of keys (including any placeholders) / #define XHvTOTALKEYS(xhv) ((xhv)->xhv_keys) / * HvKEYS gets the number of keys that actually exist(), and is provid +ed * for backwards compatibility with old XS code. The core uses HvUSEDK +EYS * (keys, excluding placeholders) and HvTOTALKEYS (including placehold +ers) / #define HvKEYS(hv) HvUSEDKEYS(hv) #define HvUSEDKEYS(hv) (HvTOTALKEYS(hv) - HvPLACEHOLDERS_get( +hv)) #define HvTOTALKEYS(hv) XHvTOTALKEYS((XPVHV) SvANY(hv)) #define HvPLACEHOLDERS(hv) (Perl_hv_placeholders_p(aTHX_ MUTABLE +_HV(hv))) #define HvPLACEHOLDERS_get(hv) (SvMAGIC(hv) ? Perl_hv_placeholders_ge +t(aTHX_ (const HV )hv) : 0) [download] HvKEYS(hv) is the thing that implements `scalar keys %hash` --- $world=~s/war/peace/g	[reply] [d/l] [select]
Re^3: performance of counting keys in a big hash by hdb (Monsignor) on May 22, 2013 at 13:23 UTC
I think I will have to carry you around the swamps of some unknown planet for much longer before I will ready to dive into Perl's source itself to answer questions in this community...	[reply]
Re^4: performance of counting keys in a big hash by demerphq (Chancellor) on May 22, 2013 at 13:44 UTC
Re^3: performance of counting keys in a big hash by space_monk (Chaplain) on May 22, 2013 at 14:55 UTC
Dammit Jim, I'm a doctor, not a C programmer! :-) _{ok it's a little cross genre, but what the hell} If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)	[reply]

Back to Seekers of Perl Wisdom