Your skill will accomplishwhat the force of many cannot PerlMonks

### Re: How can I count the number of substitutions of one letter in a string by another letter in the other string?

by Eliya (Vicar)
 on May 11, 2012 at 21:10 UTC ( #970102=note: print w/replies, xml ) Need Help??

In case performance matters and your sequences are longer, you could also use XOR bit logic in combination with tr///:

```#!/usr/bin/perl -w
use strict;

my @cnt = qw(A2G G2A C2T T2C A2T A2C G2T G2C);
my %cnt;

for my \$change (@cnt) {
my (\$c1, \$c2) = split /2/, \$change;
\$c2 =~ tr/ATCG/HRDZ/;
my \$x = \$c1 ^ \$c2;
\$cnt{\$change} = eval "sub { \\$_[0] =~ tr/\$x/\$x/ }";
}

my \$seq1 = "AAAGATCGTG";
my \$seq2 = "AGTACTTTCC";

(my \$tmp = \$seq2) =~ tr/ATCG/HRDZ/;
my \$diff = \$seq1 ^ \$tmp;

print join("; ", map { \$_."=".\$cnt{\$_}->(\$diff) } @cnt ), ".";

__END__
A2G=1; G2A=1; C2T=1; T2C=1; A2T=1; A2C=1; G2T=1; G2C=1.

The idea here is that XORing two strings produces a different value X for each combination of characters, which you can then count with a simple tr/X/X/.  The additional prior transformation tr/ATCG/HRDZ/ on one of the strings is required because the XOR operation would otherwise produce symmetric/undirected results, i.e. A2G would be indistinguishable from G2A, etc.

(Just in case it isn't clear:  the %cnt hash doesn't hold the count results, but rather the subs doing the respective tr/X/X/ counting.)

• Comment on Re: How can I count the number of substitutions of one letter in a string by another letter in the other string?

Replies are listed 'Best First'.
Re^2: How can I count the number of substitutions of one letter in a string by another letter in the other string?
by BrowserUk (Pope) on May 11, 2012 at 22:19 UTC

Whilst orders of magnitude faster, as posted, your solution is missing some possibilities.

Given the input of:

```\$seq1 = 'AAAACCCCGGGGTTTT';
\$seq2 = 'ACGTACGTACGTACGT';

It should produce 12 counts:

```{
A2C => 1,  A2G => 1,  A2T => 1,  C2A => 1,  C2G => 1,  C2T => 1,
G2A => 1,  G2C => 1,  G2T => 1,  T2A => 1,  T2C => 1,  T2G => 1,
}

But only produces the following eight:

```A2G=1; G2A=1; C2T=1; T2C=1; A2T=1; A2C=1; G2T=1; G2C=1.

I'm sure that this is eminently fixable.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

your solution is missing some possibilities.

Well, I implemented (on purpose) exactly the list the OP asked for.  It can of course be extended, if so desired:

```my @cnt = qw(A2G G2A C2T T2C A2T A2C G2T G2C ...);

Alternatively, simply compute the full matrix of count routines:

```#!/usr/bin/perl -w
use strict;

my %cnt;
my @c = qw(A T C G);
for my \$c1 (@c) {
for my \$c2 (@c) {
next if \$c1 eq \$c2;
(my \$t = \$c2) =~ tr/ATCG/HRDZ/;
my \$x = \$c1 ^ \$t;
\$cnt{\$c1."2".\$c2} = eval "sub { \\$_[0] =~ tr/\$x/\$x/ }";
}
}

my \$seq1 = 'AAAACCCCGGGGTTTT';
my \$seq2 = 'ACGTACGTACGTACGT';

(my \$tmp = \$seq2) =~ tr/ATCG/HRDZ/;
my \$diff = \$seq1 ^ \$tmp;

print join("; ", map { \$_."=".\$cnt{\$_}->(\$diff) } sort keys %cnt ), ".
+";

__END__
A2C=1; A2G=1; A2T=1; C2A=1; C2G=1; C2T=1; G2A=1; G2C=1; G2T=1; T2A=1;
+T2C=1; T2G=1.

Create A New User
Node Status?
node history
Node Type: note [id://970102]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (4)
As of 2017-08-19 13:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (311 votes). Check out past polls.

Notices?