Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Calculating Hamming distance (binary)

by rootcho (Pilgrim)
on Aug 11, 2013 at 01:09 UTC ( #1048965=perlquestion: print w/ replies, xml ) Need Help??
rootcho has asked for the wisdom of the Perl Monks concerning the following question:

Any ideas for a better and faster function to calculate Hamming distance. It will be good if I can deduce the bits count.
#Hamming distance sub hd { my ($d1,$d2,$bits) = @_; $bits ||= 8; my $diff = $d1 ^ $d2; my $count = 0; for my $b ( 0 .. $bits-1 ) { my $mask = 1 << $b; ++ $count if $diff & $mask } return $count }
thanks

Comment on Calculating Hamming distance (binary)
Download Code
Replies are listed 'Best First'.
Re: Calculating Hamming distance (binary)
by farang (Chaplain) on Aug 11, 2013 at 01:44 UTC
Re: Calculating Hamming distance (binary)
by kcott (Abbot) on Aug 11, 2013 at 05:02 UTC

    G'day rootcho,

    There may be a better algorithm; however, just rearranging your code, without changing the basic logic, produced significant speed improvements: more than 30% for 8 bits and almost 50% for 64 bits. Here's my version of your subroutine:

    sub hd_Ken { my ($diff, $bits, $count, $mask) = ($_[0] ^ $_[1], $_[2] || 8, 0); $mask = 1 << $_, $diff & $mask && ++$count for 0 .. $bits - 1; $count; }

    As you can see, I've eliminated a number of variables completely ($d1, $d2 and $b) and declared the reminder only once; there's less assignments and the return has been removed. Here's my test and benchmark code:

    #!/usr/bin/env perl use strict; use warnings; use Benchmark qw{cmpthese}; my $x = 1 << 0; my $y = 1 << 1 ^ 1 << 4 ^ 1 << 7 ^ 1 << 15 ^ 1 << 31 ^ 1 << 63; my @test_results = ( hd_OP($x, $y), hd_Ken($x, $y), hd_OP($x, $y, 16), hd_Ken($x, $y, 16), hd_OP($x, $y, 32), hd_Ken($x, $y, 32), hd_OP($x, $y, 64), hd_Ken($x, $y, 64), ); print "Test results: @test_results\n"; cmpthese(-1, { hd_OP_8 => sub { hd_OP($x, $y) }, hd_Ken_8 => sub { hd_Ken($x, $y) }, }); cmpthese(-1, { hd_OP_16 => sub { hd_OP($x, $y, 16) }, hd_Ken_16 => sub { hd_Ken($x, $y, 16) }, }); cmpthese(-1, { hd_OP_32 => sub { hd_OP($x, $y, 32) }, hd_Ken_32 => sub { hd_Ken($x, $y, 32) }, }); cmpthese(-1, { hd_OP_64 => sub { hd_OP($x, $y, 64) }, hd_Ken_64 => sub { hd_Ken($x, $y, 64) }, }); sub hd_Ken { my ($diff, $bits, $count, $mask) = ($_[0] ^ $_[1], $_[2] || 8, 0); $mask = 1 << $_, $diff & $mask && ++$count for 0 .. $bits - 1; $count; } sub hd_OP { my ($d1,$d2,$bits) = @_; $bits ||= 8; my $diff = $d1 ^ $d2; my $count = 0; for my $b ( 0 .. $bits-1 ) { my $mask = 1 << $b; ++ $count if $diff & $mask } return $count }

    Obviously, you'll want to run that on your system. I usually run benchmarks several times to identify outliers; here's a fairly representative result:

    $ pm_hamming.pl Test results: 4 4 5 5 6 6 7 7 Rate hd_OP_8 hd_Ken_8 hd_OP_8 344926/s -- -24% hd_Ken_8 454209/s 32% -- Rate hd_OP_16 hd_Ken_16 hd_OP_16 231849/s -- -26% hd_Ken_16 314139/s 35% -- Rate hd_OP_32 hd_Ken_32 hd_OP_32 139183/s -- -30% hd_Ken_32 198422/s 43% -- Rate hd_OP_64 hd_Ken_64 hd_OP_64 77282/s -- -32% hd_Ken_64 113551/s 47% --

    -- Ken

Re: Calculating Hamming distance (binary)
by jwkrahn (Monsignor) on Aug 11, 2013 at 07:01 UTC
    #Hamming distance sub hd { my ($d1,$d2,$bits) = @_; $bits ||= 8; my $diff = $d1 ^ $d2; my $count = 0; for my $b ( 0 .. $bits-1 ) { my $mask = 1 << $b; ++ $count if $diff & $mask } return $count }

    When I run your subroutine I get the warning message:

    Argument "^C" isn't numeric in bitwise and (&) at -e line XX.

    That message means that $diff is a string and $mask is a number.    You can use the bit-wise operators like & on two strings or two numbers but not on a number and a string.

    perl provides a way to count bits in a string using the unpack function.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1048965]
Approved by davido
Front-paged by davido
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (17)
As of 2015-07-30 18:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (273 votes), past polls