Your skill will accomplishwhat the force of many cannot PerlMonks

How to do popcount (aka Hamming weight) in Perl

by eyepopslikeamosquito (Chancellor)
 on Sep 24, 2017 at 10:23 UTC Need Help??
eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

In More Betterer Game of Life, I needed a popcount (aka Hamming weight) function to sum the one bits in a 64-bit value.

I started with the basic popcount1 below, scraped with little thought from the Hamming_weight wikipedia page.

I'd like to improve that, hence this node.

```use strict;
use warnings;
use Benchmark qw(timethese);

sub popcount1 {
my \$x = shift;
my \$count;
for (\$count = 0; \$x; ++\$count) { \$x &= \$x - 1 }
return \$count;
}

sub popcount2 {
return sprintf('%b', shift) =~ tr/1//;
}

# Update: unpack('%b*', pack('J', shift)) is better (see Rosetta ref.
+below)
# This works but is slower: unpack('b*', pack('J', shift)) =~ tr/1//;
sub popcount3 {
return unpack('%32b*', pack('Q', shift));
}

my \$start = 2**32 - 42;
my \$end   = \$start + 1000000;

print "sanity test for correctness\n";
for my \$i (0 .. 256, \$start .. \$end) {
my \$one = popcount1(\$i);
my \$two = popcount2(\$i);
my \$three = popcount3(\$i);
# print "\$i: \$one \$two \$three\n";
\$one == \$two or die;
\$one == \$three or die;
}

timethese 50, {
One   => sub { popcount1(\$_) for \$start .. \$end },
Two   => sub { popcount2(\$_) for \$start .. \$end },
Three => sub { popcount3(\$_) for \$start .. \$end },
};

Running the above program on my machine produced:

```sanity test for correctness
Benchmark: timing 50 iterations of One, Three, Two...
One: 29 wallclock secs (28.41 usr +  0.00 sys = 28.41 CPU) @  1.76/
+s (n=50)
Three: 10 wallclock secs (10.00 usr +  0.00 sys = 10.00 CPU) @  5.00/
+s (n=50)
Two: 10 wallclock secs (10.03 usr +  0.00 sys = 10.03 CPU) @  4.98/
+s (n=50)

Improvements welcome.

References

Updated: Added more references (thanks oiskuu)

Replies are listed 'Best First'.
Re: How to do popcount (aka Hamming weight) in Perl
by BrowserUk (Pope) on Sep 24, 2017 at 14:51 UTC

This (labelled "six" ) is the fastest C/Perl method I found when I was looking a (long) while ago) that works on chips that don't have a popcnt instruction:

```#! perl -slw
use strict; package SparseBitVector; use Config;
use Inline C => Config => BUILD_NOISY => 1;
use Inline C => <<'END_C',  NAME => 'Junk', CLEAN_AFTER_BUILD =>0;

typedef unsigned __int64 U64;
typedef          __int64 I64;

unsigned popcnt( SV *sv ) {
U64 x = (U64) SvUVX( sv );
x -=( x >> 1 ) & 0x5555555555555555ULL;
x = ( x & 0x3333333333333333ULL ) + ( ( x >> 2 ) & 0x3333333333333
+333ULL );
x = ( x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
return (unsigned)( ( x * 0x0101010101010101ull ) >> 56 );
}

END_C

use Benchmark qw( cmpthese );

sub popcount1 {
my \$x = shift;
my \$count;
for (\$count = 0; \$x; ++\$count) { \$x &= \$x - 1 }
return \$count;
}

sub popcount2 {
return sprintf('%b', shift) =~ tr/1//;
}

sub popcount3 {
return unpack('%32b*', pack('Q', shift));
}

sub popcount4 {
return unpack( '%32b*', pack('Q', \$_[0] ) );
}

my \$start = 2**32 - 42;
my \$end   = \$start + 1000000;

print "sanity test for correctness\n";
for my \$i (0 .. 256, \$start .. \$end) {
my \$two = popcount2(\$i);
my \$three = popcount3(\$i);
my \$four = popcount4( \$i );
my \$six = popcnt( \$i );
\$three == \$two or print "\$i: \$two \$three\n" and die;
\$four == \$six and \$three == \$six or print "\$i: 2:\$two 3:\$three 6:\$s
+ix\n" and die;
}

cmpthese -3, {
#   One   => sub { popcount1(\$_) for \$start .. \$end },
Two   => sub { popcount2(\$_) for \$start .. \$end },
Three => sub { popcount3(\$_) for \$start .. \$end },
Four  => sub { popcount4(\$_) for \$start .. \$end },
Six   => sub { popcnt(   \$_) for \$start .. \$end },
};

Outputs:

```C:\test>1199987.pl
sanity test for correctness

Rate   Two Three  Four   Six
Two   1.02/s    --   -4%   -5%  -58%
Three 1.06/s    4%    --   -1%  -57%
Four  1.08/s    5%    1%    --  -56%
Six   2.45/s  140%  131%  128%    --

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit

If needed, Inline C code supporting 64 and 32 bits.

```use strict;
use warnings;

use Inline 'C' => Config => BUILD_NOISY => 1, CLEAN_AFTER_BUILD => 0;
use Inline 'C' => <<'END_C';

#include <stdint.h>

unsigned popcnt2( SV *sv ) {
unsigned count = 0;

#ifdef __LP64__
static const uint64_t m1  = UINT64_C(0x5555555555555555);
static const uint64_t m2  = UINT64_C(0x3333333333333333);
static const uint64_t m4  = UINT64_C(0x0f0f0f0f0f0f0f0f);
static const uint64_t h01 = UINT64_C(0x0101010101010101);

uint64_t x = (uint64_t) SvUVX( sv );

x =  x       - ((x >> 1)  & m1);
x = (x & m2) + ((x >> 2)  & m2);
x = (x       +  (x >> 4)) & m4;

count += (unsigned) ((x * h01) >> 56);
#else
static const uint32_t m1  = UINT32_C(0x55555555);
static const uint32_t m2  = UINT32_C(0x33333333);
static const uint32_t m4  = UINT32_C(0x0f0f0f0f);
static const uint32_t h01 = UINT32_C(0x01010101);

uint32_t x = (uint32_t) SvUVX( sv );

x =  x       - ((x >> 1)  & m1);
x = (x & m2) + ((x >> 2)  & m2);
x = (x       +  (x >> 4)) & m4;

count += (unsigned) ((x * h01) >> 24);
#endif

return count;
}

END_C

use Benchmark qw( cmpthese );

my \$start = 2 ** 32 - 42;
my \$end   = \$start + 1000000;

cmpthese -3, {
Six2 => sub { popcnt2(\$_) for \$start .. \$end },
};

Regards, Mario

Re: How to do popcount (aka Hamming weight) in Perl
by Athanasius (Bishop) on Sep 24, 2017 at 12:54 UTC

Hello eyepopslikeamosquito,

The CPAN module Bit::Fast has a function popcountl, implemented in C, which operates on 64-bit integers and is claimed to be fast.

Disclaimer: I haven’t been able to test it because — although I have a 64-bit OS and a 64-bit Perl with USE_64_BIT_INT defined — my Perl was apparently built with LONGSIZE defined as 4, not 8, so (with use Config) \$Config{longsize} == 4 and popcountl doesn’t get built. I confess I don’t understand the issues involved here — maybe just a configuration problem?

Anyway, hope that’s of interest,

 Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,

Re: How to do popcount (aka Hamming weight) in Perl
by marioroy (Priest) on Sep 24, 2017 at 16:12 UTC

Update 2: Thanks Dana, the popcnt function moved to util.h in Math::Prime::Util v0.62.

Update 1: Added links to popcount.cpp and popcnt. Dana replaced popcnt with mpu_popcount_string in Math::Prime::Util v0.62.

Found in mce-sandbox/src/bits.h and used here, I received help by reading popcount.cpp from primesieve.org and util.c (popcnt) from Math::Prime::Util <= v0.61. Although the following is tailored for counting set bits inside a string, I'm sharing the code to show-case 64bits and 32bits support inside a function, determined by the __LP64__ pragma.

```#ifndef BITS_H
#define BITS_H

#include <stdint.h>

typedef unsigned char byte_t;

static const int popcnt_byte[256] = {
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8
};

static uint64_t popcount(const byte_t *bytearray, uint64_t size)
{
uint64_t asize, i, count = 0;

if (bytearray == 0 || size == 0)
return count;

if (size > 8) {
#ifdef __LP64__
static const uint64_t m1  = UINT64_C(0x5555555555555555);
static const uint64_t m2  = UINT64_C(0x3333333333333333);
static const uint64_t m4  = UINT64_C(0x0f0f0f0f0f0f0f0f);
static const uint64_t h01 = UINT64_C(0x0101010101010101);

const uint64_t *a = (uint64_t *) bytearray;

asize = (size + 7) / 8 - 1;

for (i = 0; i < asize; i++) {
uint64_t b = a[i];
b =  b       - ((b >> 1)  & m1);
b = (b & m2) + ((b >> 2)  & m2);
b = (b       +  (b >> 4)) & m4;
count += (b * h01) >> 56;
}

i = asize * 8;

#else
static const uint32_t m1  = UINT32_C(0x55555555);
static const uint32_t m2  = UINT32_C(0x33333333);
static const uint32_t m4  = UINT32_C(0x0f0f0f0f);
static const uint32_t h01 = UINT32_C(0x01010101);

const uint32_t *a = (uint32_t *) bytearray;

asize = (size + 3) / 4 - 1;

for (i = 0; i < asize; i++) {
uint32_t b = a[i];
b =  b       - ((b >> 1)  & m1);
b = (b & m2) + ((b >> 2)  & m2);
b = (b       +  (b >> 4)) & m4;
count += (b * h01) >> 24;
}

i = asize * 4;

#endif
}
else
i = 0;

for (; i < size; i++)
count += popcnt_byte[bytearray[i]];

return count;
}

#endif

Regards, Mario

Dana replaced popcnt with mpu_popcount_string in Math::Prime::Util v0.62.

I moved the native popcnt to util.h. mpu_popcount_string is a new function to handle bigints or sufficiently magic input without using a bigint library. It's faster than getting Math::BigInt involved until the numbers get over 500 digits. It should be optimized, but it's not exactly the common case.

It's also useful for making sure we don't get really slow for 64-bit numbers on a 32-bit Perl. Not as fast as forcing 64-bit code, but that's further narrowing down the space -- people on 64-bit machines who install a 32-bit Perl.

Using Math::BigInt to handle a long digit string would be something like:

```use Math::BigInt;
my \$n = "16" x 1000;
say 0 + (Math::BigInt->new("\$n")->as_bin() =~ tr/1//);
where only the last part is needed if it's already a bigint. Math::GMPz has Rmpz_popcount which is ridiculously fast if the input is already a Math::GMPz object.

Re: How to do popcount (aka Hamming weight) in Perl
by LanX (Archbishop) on Sep 24, 2017 at 10:37 UTC
> Improvements welcome.

Performance wise?

I think a pre computed lookup table is the way I'd go.

An array with 2**16 entries should be a good compromise between time and memory complexity.

(Since Perl doesn't fit into a CPU's line cache)

Sorry too busy to code it for you :)

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Re: How to do popcount (aka Hamming weight) in Perl
by LanX (Archbishop) on Sep 24, 2017 at 12:58 UTC
> unpack('%32b*'

so Perl has already a specialized command for this! (cool)

but why do you need

> pack('Q'

then?

If it's some kind of normalization, why do you need to benchmark it too instead of keeping the data in that format?

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Re: How to do popcount (aka Hamming weight) in Perl
by danaj (Friar) on Sep 26, 2017 at 19:12 UTC

There are a fair number of modules that do this. For instance, RosettaCode shows:

```use ntheory qw/hammingweight/;
say hammingweight(1234567);

use Math::GMPz qw/Rmpz_popcount/;
say Rmpz_popcount(Math::GMPz->new(1234567));

use Math::BigInt;
say 0 + (Math::BigInt->new(1234567)->as_bin() =~ tr/1//);

use Bit::Vector;
say Bit::Vector->new_Dec(64,1234567)->Norm;

The speed of the underlying C code is important if you're calling the function from C where you can see the difference between a few cycles. Once you add Perl overhead it's somewhat less obvious. ntheory for example adds an input validation layer when called from Perl which means it croaks if you give it undef or "hello" or a strange object reference. It recognizes bigints and gives you the correct answer. Bit::Fast and BrowserUK's Inline::C code have no input validation -- it coerces the input into a C integer type and proceeds whether that makes any sense or not. There are many cases validation is just wasted time so it's a tradeoff. ntheory's C code is as fast or faster than BrowserUK's code on 64-bit machines (it uses the popcount asm if possible, otherwise identical C code), Bit::Fast's C code is either the same speed as ntheory, the same speed but with incorrect results, slightly slower, or infinitely slower due to a segfault or inability to compile, depending on the C compiler.

Once we start calling from Perl, the differences in overhead become more apparent. Note that BrowserUK's benchmark as posted greatly favors his Inline::C function, as all the others get an added Perl subroutine call added to them, and Perl sub calls are not at all cheap compared to this function. Correcting that, and adding some modules, I get:

```                Rate bigint  GMPz bitvector Three Four  Two ntheory Inline raw_nt bitfast
bigint    4.56e-02/s     --  -93%      -94%  -99% -99% -99%   -100%  -100%  -100%   -100%
GMPz         0.667/s  1361%    --      -18%  -85% -85% -86%    -93%   -95%   -96%    -96%
bitvector    0.811/s  1676%   22%        --  -81% -82% -83%    -92%   -93%   -95%    -95%
Three         4.32/s  9363%  548%      433%    --  -4% -10%    -57%   -65%   -71%    -71%
Four          4.50/s  9763%  575%      455%    4%   --  -6%    -56%   -64%   -70%    -70%
Two           4.81/s 10434%  621%      493%   11%   7%   --    -53%   -61%   -68%    -68%
ntheory       10.1/s 22087% 1419%     1149%  134% 125% 111%      --   -18%   -32%    -33%
Inline        12.4/s 27108% 1763%     1432%  188% 176% 158%     23%     --   -17%    -18%
raw_nt        15.0/s 32729% 2148%     1748%  247% 233% 212%     48%    21%     --     -1%
bitfast       15.1/s 33012% 2167%     1764%  250% 236% 214%     49%    22%     1%      --
```

Some of these functions have more features than others, e.g. Math::GMPz, bigint, and ntheory will handle bigints correctly. raw_nt is ntheory with a new trivial XS function added that just takes a UV and returns the result of popcnt -- no validation or input size checks, just like Bit::Fast or the Inline C function. Not surprisingly, it comes out the same speed on this machine (both use gcc with asm popcnt support). We are talking about an instruction with 1 cycle latency on newer processors.

Create A New User
Node Status?
node history
Node Type: perlquestion [id://1199987]
Approved by haukex
Front-paged by haukex
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (6)
As of 2019-04-25 20:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
I am most likely to install a new module from CPAN if:

Results (127 votes). Check out past polls.

Notices?