How can I count the number of substitutions of one letter in a string by another letter in the other string?

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

I am beginner in perl programming. Suppose, I have two strings i.e. $seq1="AAAGATCGTG"; and $seq2="AGTACTTTCC"; I am interested in counting the number of substitutions of A in $seq1 by G in $seq2 i.e. $A2G and similarly, $G2A, $C2T, $T2C, $A2T, $A2C, $G2T and $G2C. Some sites like 0 and 5 are same in $seq1 and $seq2. Which perl code should I use to find out the number of such substitutions? I don't know the code that I should write in the places marked ?????? in the following program to find the correct results. Can any perlmonk help me to count the substitutions?

#!usr/bin/perl-w
use strict;
my $seq1="AAAGATCGTG";
my $seq2="AGTACTTTCC";
my ($A2G,$G2A,$C2T,$T2C,$A2T,$A2C,$G2T,$G2C)
for (0..length($seq1)-1){
   if (substr($seq1, ???) eq substr($seq2, ??))
   {$A2G++;$G2A++;$C2T++;$T2C++;$A2T++;$A2C++;$G2T++;$G2C++;} 
   else { ?????}
   }
print"\n No. of Substitutions:\n
  A2G=$A2G; G2A=$G2A; C2T=$C2T; T2C=$T2C; A2T=$A2T; A2C=$A2C; G2T=$G2T
+; G2C=$G2C.\n";
exit;
[download]

The results should look like:

A2G=1; G2A=1; C2T=1; T2C=1; A2T=1; A2C=1; G2T=1; G2C=1.

Comment on How can I count the number of substitutions of one letter in a string by another letter in the other string? Select or Download Code

Replies are listed 'Best First'.
Re: How can I count the number of substitutions of one letter in a string by another letter in the other string? by BrowserUk (Patriarch) on May 11, 2012 at 18:58 UTC
Using individual named vars makes this awkward. Using a hash, makes it simple: `#! perl =slw use strict; use Data::Dump qw[ pp ]; my $seq1="AAAGATCGTG"; my $seq2="AGTACTTTCC"; my %subs; for my $p ( 0 .. length( $seq1 ) - 1 ) { my $c1 = substr $seq1, $p, 1; my $c2 = substr $seq2, $p, 1; next if $c1 eq $c2; ++$subs{ $c1 . '2' . $c2 }; } pp \%subs; __END__ C:\test>junk.pl { A2C => 1, A2G => 1, A2T => 1, C2T => 1, G2A => 1, G2C => 1, G2T => 1 +, T2C => 1 }` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l]
Re^2: How can I count the number of substitutions of one letter in a string by another letter in the other string? by spazm (Monk) on May 11, 2012 at 20:40 UTC
ooh, very pretty. Thanks, I was hoping to see something nice today.	[reply]
Re: How can I count the number of substitutions of one letter in a string by another letter in the other string? by Eliya (Vicar) on May 11, 2012 at 21:10 UTC
In case performance matters and your sequences are longer, you could also use XOR bit logic in combination with tr///: `#!/usr/bin/perl -w use strict; my @cnt = qw(A2G G2A C2T T2C A2T A2C G2T G2C); my %cnt; for my $change (@cnt) { my ($c1, $c2) = split /2/, $change; $c2 =~ tr/ATCG/HRDZ/; my $x = $c1 ^ $c2; $cnt{$change} = eval "sub { \$_[0] =~ tr/$x/$x/ }"; } my $seq1 = "AAAGATCGTG"; my $seq2 = "AGTACTTTCC"; (my $tmp = $seq2) =~ tr/ATCG/HRDZ/; my $diff = $seq1 ^ $tmp; print join("; ", map { $_."=".$cnt{$_}->($diff) } @cnt ), "."; __END__ A2G=1; G2A=1; C2T=1; T2C=1; A2T=1; A2C=1; G2T=1; G2C=1.` [download] The idea here is that XORing two strings produces a different value `X` for each combination of characters, which you can then count with a simple `tr/X/X/`. The additional prior transformation `tr/ATCG/HRDZ/` on one of the strings is required because the XOR operation would otherwise produce symmetric/undirected results, i.e. A2G would be indistinguishable from G2A, etc. (Just in case it isn't clear: the `%cnt` hash doesn't hold the count results, but rather the subs doing the respective `tr/X/X/` counting.)	[reply] [d/l] [select]
Re^2: How can I count the number of substitutions of one letter in a string by another letter in the other string? by BrowserUk (Patriarch) on May 11, 2012 at 22:19 UTC
Whilst orders of magnitude faster, as posted, your solution is missing some possibilities. Given the input of: `$seq1 = 'AAAACCCCGGGGTTTT'; $seq2 = 'ACGTACGTACGTACGT';` [download] It should produce 12 counts: `{ A2C => 1, A2G => 1, A2T => 1, C2A => 1, C2G => 1, C2T => 1, G2A => 1, G2C => 1, G2T => 1, T2A => 1, T2C => 1, T2G => 1, }` [download] But only produces the following eight: `A2G=1; G2A=1; C2T=1; T2C=1; A2T=1; A2C=1; G2T=1; G2C=1.` [download] I'm sure that this is eminently fixable. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l] [select]
Re^3: How can I count the number of substitutions of one letter in a string by another letter in the other string? by Eliya (Vicar) on May 11, 2012 at 23:19 UTC
your solution is missing some possibilities. Well, I implemented (on purpose) exactly the list the OP asked for. It can of course be extended, if so desired: `my @cnt = qw(A2G G2A C2T T2C A2T A2C G2T G2C ...);` [download] Alternatively, simply compute the full matrix of count routines: #!/usr/bin/perl -w use strict; my %cnt; my @c = qw(A T C G); for my $c1 (@c) { for my $c2 (@c) { next if $c1 eq $c2; (my $t = $c2) =~ tr/ATCG/HRDZ/; my $x = $c1 ^ $t; $cnt{$c1."2".$c2} = eval "sub { \$_[0] =~ tr/$x/$x/ }"; } } my $seq1 = 'AAAACCCCGGGGTTTT'; my $seq2 = 'ACGTACGTACGTACGT'; (my $tmp = $seq2) =~ tr/ATCG/HRDZ/; my $diff = $seq1 ^ $tmp; print join("; ", map { $_."=".$cnt{$_}->($diff) } sort keys %cnt ), ". +"; __END__ A2C=1; A2G=1; A2T=1; C2A=1; C2G=1; C2T=1; G2A=1; G2C=1; G2T=1; T2A=1; +T2C=1; T2G=1. [download]	[reply] [d/l] [select]

Back to Seekers of Perl Wisdom