Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Re: How can I count the number of substitutions of one letter in a string by another letter in the other string?

by Eliya (Vicar)
on May 11, 2012 at 21:10 UTC ( #970102=note: print w/ replies, xml ) Need Help??


in reply to How can I count the number of substitutions of one letter in a string by another letter in the other string?

In case performance matters and your sequences are longer, you could also use XOR bit logic in combination with tr///:

#!/usr/bin/perl -w use strict; my @cnt = qw(A2G G2A C2T T2C A2T A2C G2T G2C); my %cnt; for my $change (@cnt) { my ($c1, $c2) = split /2/, $change; $c2 =~ tr/ATCG/HRDZ/; my $x = $c1 ^ $c2; $cnt{$change} = eval "sub { \$_[0] =~ tr/$x/$x/ }"; } my $seq1 = "AAAGATCGTG"; my $seq2 = "AGTACTTTCC"; (my $tmp = $seq2) =~ tr/ATCG/HRDZ/; my $diff = $seq1 ^ $tmp; print join("; ", map { $_."=".$cnt{$_}->($diff) } @cnt ), "."; __END__ A2G=1; G2A=1; C2T=1; T2C=1; A2T=1; A2C=1; G2T=1; G2C=1.

The idea here is that XORing two strings produces a different value X for each combination of characters, which you can then count with a simple tr/X/X/.  The additional prior transformation tr/ATCG/HRDZ/ on one of the strings is required because the XOR operation would otherwise produce symmetric/undirected results, i.e. A2G would be indistinguishable from G2A, etc.

(Just in case it isn't clear:  the %cnt hash doesn't hold the count results, but rather the subs doing the respective tr/X/X/ counting.)


Comment on Re: How can I count the number of substitutions of one letter in a string by another letter in the other string?
Select or Download Code
Re^2: How can I count the number of substitutions of one letter in a string by another letter in the other string?
by BrowserUk (Pope) on May 11, 2012 at 22:19 UTC

    Whilst orders of magnitude faster, as posted, your solution is missing some possibilities.

    Given the input of:

    $seq1 = 'AAAACCCCGGGGTTTT'; $seq2 = 'ACGTACGTACGTACGT';

    It should produce 12 counts:

    { A2C => 1, A2G => 1, A2T => 1, C2A => 1, C2G => 1, C2T => 1, G2A => 1, G2C => 1, G2T => 1, T2A => 1, T2C => 1, T2G => 1, }

    But only produces the following eight:

    A2G=1; G2A=1; C2T=1; T2C=1; A2T=1; A2C=1; G2T=1; G2C=1.

    I'm sure that this is eminently fixable.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      your solution is missing some possibilities.

      Well, I implemented (on purpose) exactly the list the OP asked for.  It can of course be extended, if so desired:

      my @cnt = qw(A2G G2A C2T T2C A2T A2C G2T G2C ...);

      Alternatively, simply compute the full matrix of count routines:

      #!/usr/bin/perl -w use strict; my %cnt; my @c = qw(A T C G); for my $c1 (@c) { for my $c2 (@c) { next if $c1 eq $c2; (my $t = $c2) =~ tr/ATCG/HRDZ/; my $x = $c1 ^ $t; $cnt{$c1."2".$c2} = eval "sub { \$_[0] =~ tr/$x/$x/ }"; } } my $seq1 = 'AAAACCCCGGGGTTTT'; my $seq2 = 'ACGTACGTACGTACGT'; (my $tmp = $seq2) =~ tr/ATCG/HRDZ/; my $diff = $seq1 ^ $tmp; print join("; ", map { $_."=".$cnt{$_}->($diff) } sort keys %cnt ), ". +"; __END__ A2C=1; A2G=1; A2T=1; C2A=1; C2G=1; C2T=1; G2A=1; G2C=1; G2T=1; T2A=1; +T2C=1; T2G=1.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://970102]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (7)
As of 2014-07-28 09:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (193 votes), past polls