The stupid question is the question not asked PerlMonks

### Calculating Hamming distance (binary)

by rootcho (Pilgrim)
 on Aug 11, 2013 at 01:09 UTC Need Help??
rootcho has asked for the wisdom of the Perl Monks concerning the following question:

Any ideas for a better and faster function to calculate Hamming distance. It will be good if I can deduce the bits count.
```#Hamming distance
sub hd {
my (\$d1,\$d2,\$bits) = @_;
\$bits ||= 8;
my \$diff = \$d1 ^ \$d2;
my \$count = 0;
for my \$b ( 0 .. \$bits-1 ) {
my \$mask = 1 << \$b;
++ \$count if \$diff & \$mask
}
return \$count
}
thanks

Replies are listed 'Best First'.
Re: Calculating Hamming distance (binary)
by farang (Chaplain) on Aug 11, 2013 at 01:44 UTC
Re: Calculating Hamming distance (binary)
by kcott (Chancellor) on Aug 11, 2013 at 05:02 UTC

G'day rootcho,

There may be a better algorithm; however, just rearranging your code, without changing the basic logic, produced significant speed improvements: more than 30% for 8 bits and almost 50% for 64 bits. Here's my version of your subroutine:

```sub hd_Ken {
my (\$diff, \$bits, \$count, \$mask) = (\$_[0] ^ \$_[1], \$_[2] || 8, 0);

\$mask = 1 << \$_, \$diff & \$mask && ++\$count for 0 .. \$bits - 1;

\$count;
}

As you can see, I've eliminated a number of variables completely (\$d1, \$d2 and \$b) and declared the reminder only once; there's less assignments and the return has been removed. Here's my test and benchmark code:

```#!/usr/bin/env perl

use strict;
use warnings;

use Benchmark qw{cmpthese};

my \$x = 1 << 0;
my \$y = 1 << 1 ^ 1 << 4 ^ 1 << 7 ^ 1 << 15 ^ 1 << 31 ^ 1 << 63;

my @test_results = (
hd_OP(\$x, \$y),
hd_Ken(\$x, \$y),
hd_OP(\$x, \$y, 16),
hd_Ken(\$x, \$y, 16),
hd_OP(\$x, \$y, 32),
hd_Ken(\$x, \$y, 32),
hd_OP(\$x, \$y, 64),
hd_Ken(\$x, \$y, 64),
);
print "Test results: @test_results\n";

cmpthese(-1, {
hd_OP_8 => sub { hd_OP(\$x, \$y) },
hd_Ken_8 => sub { hd_Ken(\$x, \$y) },
});

cmpthese(-1, {
hd_OP_16 => sub { hd_OP(\$x, \$y, 16) },
hd_Ken_16 => sub { hd_Ken(\$x, \$y, 16) },
});

cmpthese(-1, {
hd_OP_32 => sub { hd_OP(\$x, \$y, 32) },
hd_Ken_32 => sub { hd_Ken(\$x, \$y, 32) },
});

cmpthese(-1, {
hd_OP_64 => sub { hd_OP(\$x, \$y, 64) },
hd_Ken_64 => sub { hd_Ken(\$x, \$y, 64) },
});

sub hd_Ken {
my (\$diff, \$bits, \$count, \$mask) = (\$_[0] ^ \$_[1], \$_[2] || 8, 0);

\$mask = 1 << \$_, \$diff & \$mask && ++\$count for 0 .. \$bits - 1;

\$count;
}

sub hd_OP {
my (\$d1,\$d2,\$bits) = @_;
\$bits ||= 8;
my \$diff = \$d1 ^ \$d2;
my \$count = 0;
for my \$b ( 0 .. \$bits-1 ) {
my \$mask = 1 << \$b;
++ \$count if \$diff & \$mask
}
return \$count
}

Obviously, you'll want to run that on your system. I usually run benchmarks several times to identify outliers; here's a fairly representative result:

```\$ pm_hamming.pl
Test results: 4 4 5 5 6 6 7 7
Rate  hd_OP_8 hd_Ken_8
hd_OP_8  344926/s       --     -24%
hd_Ken_8 454209/s      32%       --
Rate  hd_OP_16 hd_Ken_16
hd_OP_16  231849/s        --      -26%
hd_Ken_16 314139/s       35%        --
Rate  hd_OP_32 hd_Ken_32
hd_OP_32  139183/s        --      -30%
hd_Ken_32 198422/s       43%        --
Rate  hd_OP_64 hd_Ken_64
hd_OP_64   77282/s        --      -32%
hd_Ken_64 113551/s       47%        --

-- Ken

Re: Calculating Hamming distance (binary)
by jwkrahn (Monsignor) on Aug 11, 2013 at 07:01 UTC
```#Hamming distance
sub hd {
my (\$d1,\$d2,\$bits) = @_;
\$bits ||= 8;
my \$diff = \$d1 ^ \$d2;
my \$count = 0;
for my \$b ( 0 .. \$bits-1 ) {
my \$mask = 1 << \$b;
++ \$count if \$diff & \$mask
}
return \$count
}

When I run your subroutine I get the warning message:

Argument "^C" isn't numeric in bitwise and (&) at -e line XX.

That message means that \$diff is a string and \$mask is a number.    You can use the bit-wise operators like & on two strings or two numbers but not on a number and a string.

perl provides a way to count bits in a string using the unpack function.

Create A New User
Node Status?
node history
Node Type: perlquestion [id://1048965]
Approved by davido
Front-paged by davido
help
Chatterbox?
 [stevieb]: "Happy Monk Day, you've been here 8 troublesome years". Tack on another 8 before I signed up :)

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (6)
As of 2017-08-19 15:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (311 votes). Check out past polls.

Notices?