Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re^2: Hamming Distance Between 2 Strings - Fast(est) Way?

by alpapan (Initiate)
on Apr 22, 2011 at 23:47 UTC ( #900877=note: print w/replies, xml ) Need Help??


in reply to Re: Hamming Distance Between 2 Strings - Fast(est) Way?
in thread Hamming Distance Between 2 Strings - Fast(est) Way?

Hello I'm confused with the benchmark by Edward. The XOR code seems to be 3 times faster than the 'mine' using the following benchmark (shouldn't benchmark code always be published?)
#!/usr/bin/perl -w use strict; my $s1 = 'AAAAA'; my $s2 = 'ATCAA'; for (my $i=0;$i<600000;$i++){ #choose one of the two methods #hd($s1,$s2); # real 0m1.401s hd2($s1,$s2); # real 0m0.405s } sub hd{ my ($k,$l) = @_; my $len = length ($k); my $num_mismatch = 0; for (my $i=0; $i<$len; $i++) { ++$num_mismatch if substr($k, $i, 1) ne substr($l, $i, 1); } return $num_mismatch; } sub hd2 { return ($_[0] ^ $_[1]) =~ tr/\001-\255//; }

Replies are listed 'Best First'.
Re^3: Hamming Distance Between 2 Strings - Fast(est) Way?
by alpapan (Initiate) on Apr 22, 2011 at 23:50 UTC
    Duh! I misread Edwards' benchmark. missed that the benchmark was /s (instead of s) - shouldn't I be reading more carefully? :-)
      Here is something along the lines of what monkfan had probably written
      #!/usr/bin/perl -- use strict; use warnings; use Benchmark qw(cmpthese); our $data; print "\n$]\n"; { my $s1 = 'AAAAA'; my $s2 = 'ATCAA'; print join ' ', RoyJohnson500332($s1,$s2)," "; print join ' ', monkfan500235($s1,$s2)," "; print join ' ', BrowserUk500244($s1,$s2)," "; print join ' ', inman500994($s1,$s2)," "; print "\n"; } for my $range ( 5, 1_000 , 2_000 , 10_000 , 100_000 ){ $data = join '',map { ( qw' T A C G ' )[ $_ % 4 ] } 0 .. $range; my $s1 = $data.'AAAAA'; my $s2 = $data.'ATCAA'; print "## Length $range ", "##" x 11, "\n"; cmpthese (-3, { Mine => sub { monkfan500235($s1,$s2); return }, BUk => sub { BrowserUk500244($s1,$s2); return }, inman => sub { inman500994($s1,$s2); return }, RJ => sub { RoyJohnson500332($s1,$s2); return }, }); print "\n"; } sub monkfan500235 { my ($k,$l) = @_; my $len = length ($k); my $num_mismatch = 0; for (my $i=0; $i<$len; $i++) { ++$num_mismatch if substr($k, $i, 1) ne substr($l, $i, 1); } return $num_mismatch; } sub BrowserUk500244 { length( $_[ 0 ] ) - ( ( $_[ 0 ] ^ $_[ 1 ] ) =~ t +r[\0][\0] ) } sub RoyJohnson500332 { my ($k, $l) = @_; my $diff = $k ^ $l; my $num_mismatch = $diff =~ tr/\0//c; } sub inman500994 { return ($_[0] ^ $_[1]) =~ tr/\001-\255//; } __END__ 5.012002 2 2 2 2 ## Length 5 ###################### Rate Mine RJ BUk inman Mine 127809/s -- -78% -85% -87% RJ 580454/s 354% -- -33% -43% BUk 872298/s 582% 50% -- -14% inman 1012310/s 692% 74% 16% -- ## Length 1000 ###################### Rate Mine RJ BUk inman Mine 1862/s -- -99% -99% -99% RJ 176683/s 9389% -- -18% -21% BUk 216727/s 11540% 23% -- -4% inman 224899/s 11979% 27% 4% -- ## Length 2000 ###################### Rate Mine RJ BUk inman Mine 934/s -- -99% -99% -99% RJ 100661/s 10677% -- -17% -19% BUk 120656/s 12818% 20% -- -2% inman 123544/s 13127% 23% 2% -- ## Length 10000 ###################### Rate Mine RJ inman BUk Mine 187/s -- -99% -99% -99% RJ 22959/s 12181% -- -16% -18% inman 27435/s 14575% 19% -- -2% BUk 27976/s 14865% 22% 2% -- ## Length 100000 ###################### Rate Mine RJ inman BUk Mine 18.7/s -- -99% -99% -99% RJ 2085/s 11031% -- -21% -22% inman 2651/s 14054% 27% -- -1% BUk 2667/s 14136% 28% 1% --
Re^3: Hamming Distance Between 2 Strings - Fast(est) Way?
by ZWcarp (Beadle) on Jul 08, 2011 at 16:38 UTC
    Sorry this is probably a really dumb question...but.. what is the ^ character doing in this subfunction? I know in regex it means match the first... and its used as a way to negate character class when between brackets...but I cant find its use like shown above.

      ^ in this case is bitwise-OR.

      When you bitwise-OR two strings, where the bytes in the two strings are the same, the byte in the resultant string will be chr(0); when they are different, the results will be non-chr(0). This allows you to count the number of differences in two strings very quickly:

      $seq1 = 'ACGTACGTACGTACGT';; $seq2 = 'TCGATCGATCGATCGA';; print ( $seq1 ^ $seq2 );; print length( $seq1) - ( $seq1 ^ $seq2 ) =~ tr[\0][\0];; 8

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://900877]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2020-11-30 20:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?