Is it possible to find the number of matching and non-matching positions in strings using perl code?

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

I am a beginner in Perl programming. I have three sequences, $a=AAATGCCTT, $b=AAAAGCGTC and $c=AAAGGCGTC, which differ at positions 4, 7 and 9 but are alike everywhere else. Is it possible to use Perl code to find the total number of positions where they differ and where they are alike? In this case, the answer would be 3 (dissimilar) and 6 (similar). Can any monk suggest Perl code that will compare these sequences?

#!/usr/bin/perl
use strict;
use warnings;

my $a = 'AAATGCCTT';
my $b = 'AAAAGCGTC';
my $c = 'AAAGGCGTC';

my ($match, $nonmatch) = (0, 0);
# ??? Perl code to compare $a, $b and $c position by position goes here ???

print "\nNo. of matched positions=$match.\n";
print "No. of non-matched positions=$nonmatch.\n";
exit;

The answer should look like:

No. of matched positions=6.
No. of non-matched positions=3.


Re: Is it possible to find the number of matching and non-matching positions in strings using perl code? by sauoq (Abbot) on May 10, 2012 at 17:04 UTC
"Is it possible to use perl code to find the total number of positions where they differ and where they are alike?" Of course. Here is one straightforward approach:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my ($a, $b, $c) = qw(AAATGCCTT AAAAGCGTC AAAGGCGTC);
    my ($similar, $dissimilar) = (0, 0);

    for (0 .. length($a) - 1) {
        if (substr($a, $_, 1) eq substr($b, $_, 1)
            and substr($b, $_, 1) eq substr($c, $_, 1)) {
            print "MATCH";
            $similar++;
        } else {
            print "NO MATCH";
            $dissimilar++;
        }
        print " at position $_\n";    # 0-based position
    }

    print "There were $similar similar and $dissimilar dissimilar.\n";

`-sauoq "My two cents aren't worth a dime.";`
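sauoq's loop prints 0-based positions, whereas the question numbers them from 1 (positions 4, 7 and 9). A minimal sketch in the same substr style, assuming equal-length strings, that collects the 1-based differing positions for any number of sequences; the sub name `differing_positions` is hypothetical, not from the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 1-based positions at which the given equal-length strings
# do not all agree (hypothetical helper, same substr idea as the loop above).
sub differing_positions {
    my @seqs = @_;
    my $len  = length $seqs[0];
    my @diff;
    for my $i (0 .. $len - 1) {
        my $ch = substr($seqs[0], $i, 1);
        # The position differs if any other string has a different character there.
        push @diff, $i + 1
            if grep { substr($_, $i, 1) ne $ch } @seqs[1 .. $#seqs];
    }
    return @diff;
}

my @seqs = qw(AAATGCCTT AAAAGCGTC AAAGGCGTC);
my @diff = differing_positions(@seqs);
printf "Differ at positions: %s\n", join(', ', @diff);
printf "Similar: %d, dissimilar: %d\n", length($seqs[0]) - @diff, scalar @diff;
```

For the three sequences in the question this reports positions 4, 7 and 9, i.e. 6 similar and 3 dissimilar.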
Re: Is it possible to find the number of matching and non-matching positions in strings using perl code? by moritz (Cardinal) on May 11, 2012 at 06:46 UTC
Finding the positions where two strings differ can be done with bitwise string operations. If you XOR two strings, positions where both characters are the same come out as a null byte. When doing several comparisons, one can accumulate the differing positions using binary OR:

    use warnings;
    use strict;
    use 5.010;    # for say()

    my $a = 'AAATGCCTT';
    my $b = 'AAAAGCGTC';
    my $c = 'AAAGGCGTC';

    my $mask = chr(0) x length $a;
    for ($b, $c) {
        $mask |= $a ^ $_;
    }

    # just to illustrate what the mask looks like:
    use Data::Dumper;
    $Data::Dumper::Useqq = 1;
    print Dumper $mask;

    # count number of 0-bytes
    my $matches =()= $mask =~ /\0/g;
    say "Matches: ", $matches;
    say "Non-matches: ", length($a) - $matches;

This approach should scale well for longer strings, since the binary operations are faster than looping over all characters.

Perl 6 - the future is here, just unevenly distributed
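moritz's `for` loop is what lets the XOR/OR trick handle more than three strings. A minimal sketch that wraps it in a reusable sub, assuming all sequences have the same length; `count_matches` is a hypothetical name, and the null bytes are counted with `tr/\0//` rather than the list-assignment regex idiom:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count positions where all given equal-length strings agree (hypothetical helper).
#   $first ^ $other  has a "\0" byte wherever the two strings agree;
#   OR-ing those results leaves a non-"\0" byte wherever ANY string differs.
sub count_matches {
    my ($first, @rest) = @_;
    for (@rest) {
        die "All strings must have the same length\n"
            unless length($_) == length($first);
    }
    my $diff = "\0" x length $first;    # start with "no differences"
    $diff |= $first ^ $_ for @rest;     # string (not numeric) bitwise ops
    my $matches = ($diff =~ tr/\0//);   # tr/// returns the number of "\0" bytes
    return ($matches, length($first) - $matches);
}

my ($match, $nonmatch) = count_matches(qw(AAATGCCTT AAAAGCGTC AAAGGCGTC));
print "Matches: $match\nNon-matches: $nonmatch\n";
```

Comparing every string against the first one is enough: a position is a full match exactly when every other string agrees with the first string there.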
Re^2: Is it possible to find the number of matching and non-matching positions in strings using perl code? by BrowserUk (Patriarch) on May 11, 2012 at 07:29 UTC
Nice explanation++. One minor change: s/binary operations are faster than looping over all characters/looping over the characters in C is faster than looping over the characters in Perl/

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?
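BrowserUk's point is that both approaches loop over every character; the bitwise version just runs that loop inside perl's C internals. A minimal sketch for measuring the difference on your own machine with the core Benchmark module, assuming hypothetical random 10,000-base sequences (the sub names are illustrative, and no timings are claimed here):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: three random 10,000-base sequences.
my @bases = qw(A C G T);
my @seqs  = map { join '', map { $bases[rand 4] } 1 .. 10_000 } 1 .. 3;

# Loop in Perl: compare one character at a time with substr.
sub by_substr {
    my ($x, $y, $z) = @_;
    my $similar = 0;
    for my $i (0 .. length($x) - 1) {
        my $ch = substr($x, $i, 1);
        $similar++ if $ch eq substr($y, $i, 1) && $ch eq substr($z, $i, 1);
    }
    return $similar;
}

# Loop in C: string XOR/OR, then count the "\0" bytes with tr///.
sub by_xor {
    my ($x, $y, $z) = @_;
    my $bits = ($x ^ $y) | ($x ^ $z);
    return $bits =~ tr/\0//;
}

# Sanity check: both methods must agree before timing them.
die "Results differ\n" unless by_substr(@seqs) == by_xor(@seqs);

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    substr => sub { by_substr(@seqs) },
    xor    => sub { by_xor(@seqs) },
});
```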
Re^3: Is it possible to find the number of matching and non-matching positions in strings using perl code? by Jenda (Abbot) on May 11, 2012 at 08:38 UTC
The question is whether, and how much, C-level looping there actually is. Some CISC processors include string instructions, so the XOR might actually be a single instruction.

Jenda
Enoch was right! Enjoy the last years of Rome.
Re^4: Is it possible to find the number of matching and non-matching positions in strings using perl code? by BrowserUk (Patriarch) on May 11, 2012 at 09:12 UTC
Re^5: Is it possible to find the number of matching and non-matching positions in strings using perl code? by Jenda (Abbot) on May 11, 2012 at 17:13 UTC
Re^4: Is it possible to find the number of matching and non-matching positions in strings using perl code? by moritz (Cardinal) on May 11, 2012 at 09:15 UTC
Re^2: Is it possible to find the number of matching and non-matching positions in strings using perl code? by sauoq (Abbot) on May 11, 2012 at 11:26 UTC
Thanks for explaining what I was doing here: Re^2: Is it possible to find the number of matching and non-matching positions in strings using perl code?

But the `for` loop just obfuscates things without adding anything at all. Also, calling your variable `$mask` is questionable, as you don't really intend to use it as a mask.
Re^3: Is it possible to find the number of matching and non-matching positions in strings using perl code? by moritz (Cardinal) on May 11, 2012 at 12:25 UTC
"But the for loop just obfuscates things without adding anything at all."

It adds generality beyond three strings to compare.

"Also, calling your variable $mask is questionable as you don't really intend to use it as a mask."

So what do you suggest instead? Your use of `$bits` isn't any better, because you care about bytes, not bits. But `$bytes` wouldn't explain the purpose of the variable either.
Re^4: Is it possible to find the number of matching and non-matching positions in strings using perl code? by sauoq (Abbot) on May 11, 2012 at 12:53 UTC
Re: Is it possible to find the number of matching and non-matching positions in strings using perl code? by BillKSmith (Monsignor) on May 10, 2012 at 22:49 UTC
I would split the strings into character arrays and then walk them in parallel with the each_array function from the List::MoreUtils module, counting the positions where all elements are equal. The number of non-matches is the length of the string minus the number of matches. If efficiency is an issue, the substr method already posted is probably better. Profile to be sure.
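A minimal sketch of the array-based idea, assuming three equal-length sequences. To avoid depending on the non-core List::MoreUtils module it walks the arrays with a plain index loop; with List::MoreUtils installed, `my $it = each_array(@x, @y, @z)` followed by `while (my ($p, $q, $r) = $it->()) { ... }` would do the same parallel walk:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split each sequence into an array of single characters.
my @chars = map { [ split //, $_ ] } qw(AAATGCCTT AAAAGCGTC AAAGGCGTC);
my ($x, $y, $z) = @chars;

# Walk the three arrays in parallel by index and count the positions
# where all three characters are equal.
my $matches = 0;
for my $i (0 .. $#$x) {
    $matches++ if $x->[$i] eq $y->[$i] && $y->[$i] eq $z->[$i];
}
my $nomatches = @$x - $matches;    # length of the string minus the matches

print "Matches: $matches\nNon-matches: $nomatches\n";
```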
Re^2: Is it possible to find the number of matching and non-matching positions in strings using perl code? by sauoq (Abbot) on May 11, 2012 at 01:16 UTC
"I would split the strings into character arrays and then count the matches returned by the each_array function of the module List::MoreUtils."

Eeek! No... don't do that. And, for that matter, don't use the approach I gave above either. I just wanted to show that, yes, it could easily be done by automating the way you might do it by hand. If you want efficiency, resort to bit twiddling! Like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    my ($a, $b, $c) = qw(AAATGCCTT AAAAGCGTC AAAGGCGTC);
    my $bits = ($a ^ $b) | ($b ^ $c);    # "\0" wherever all three agree
    my $similar = $bits =~ tr/\0/\0/;    # count the "\0" bytes
    print "Similar: $similar\n";
    print "Dissimilar: ", length($bits) - $similar, "\n";

:-)
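A small sketch, using the same three sequences, of what the null-byte string produced by the bit twiddling contains: `tr/\0//` counts the null bytes without changing the string, and a complemented `tr///c` turns it into a readable match map. The `.`/`*` map format is just an illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y, $z) = qw(AAATGCCTT AAAAGCGTC AAAGGCGTC);
my $bits = ($x ^ $y) | ($y ^ $z);    # "\0" wherever all three agree

# tr/// with an empty replacement list leaves the string unchanged
# and returns how many characters matched the search list.
my $similar    = ($bits =~ tr/\0//);
my $dissimilar = length($bits) - $similar;

# Build a readable map: '*' marks a difference, '.' marks a match.
# /c complements the search list, so the first tr/// rewrites every
# byte that is NOT "\0"; the second turns the remaining "\0" into '.'.
(my $map = $bits) =~ tr/\0/*/c;
$map =~ tr/\0/./;

print "$x\n$y\n$z\n$map\n";
print "Similar: $similar, dissimilar: $dissimilar\n";
```

For these sequences the map reads `...*..*.*`, marking positions 4, 7 and 9.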
