OK, so if I understood you correctly, you have two problems here:
1. You need to know which codons are homologous and which are not. Is that correct?
2. Once you have figured that out, you want to know where the changes are within the proper reading frame?
Are my assumptions correct? Or are the sequences already aligned and you just want to count the differences?
In both cases, once the alignment is done, counting the differences is simple: iterate through both arrays in parallel, keep track of the position within each triplet (1, 2 or 3), and for every position keep a subhash that records the type and the count of each specific change.
Example:
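use strict;
use warnings;
use Data::Dumper;

# Two aligned sequences of equal length ('-' marks a gap)
my $r = 'AAATGTGATGTGAACGT';
my $t = 'AATGTGTCGT-TG-ATG';
my @a = split('', $r);
my @v = split('', $t);
my %hash = ();
# Length of the longer sequence
my $tt = @a > @v ? @a : @v;
for (my $i = 0; $i < $tt; $i++) {
    # Position within the codon (1, 2 or 3)
    my $z = 1 + $i % 3; # Update - suggested by Perlbotics, better !
    unless ($a[$i] eq $v[$i]) {
        # Key is "<from>2<to>", e.g. 'G2T'
        $hash{$z}->{"$a[$i]2$v[$i]"}++;
    }
}
print Dumper(\%hash);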
Result:
$VAR1 = {
          '1' => {
                   'G2T' => 3,
                   'T2G' => 1,
                   'A2G' => 1
                 },
          '3' => {
                   'G2T' => 1,
                   'C2A' => 1,
                   'T2G' => 2,
                   'A2T' => 1
                 },
          '2' => {
                   'G2T' => 1,
                   'T2G' => 1,
                   'A2-' => 1,
                   'A2C' => 1,
                   'T2-' => 1
                 }
        };
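The example above counts changes per position within the codon. If what you want is which whole codon changed into which, here is a rough sketch of that variant. It assumes the sequences are already aligned and start in frame; the helper name count_codon_changes is made up for this illustration, and a trailing partial codon is simply skipped.

```perl
use strict;
use warnings;
use Data::Dumper;

# Same example sequences as above; assumed to be aligned and in frame.
my $r = 'AAATGTGATGTGAACGT';
my $t = 'AATGTGTCGT-TG-ATG';

# Hypothetical helper: count codon-level changes, keyed as "<from>2<to>".
sub count_codon_changes {
    my ($ref, $test) = @_;
    my %changes;
    # Only compare the region both sequences cover.
    my $len = length($ref) < length($test) ? length($ref) : length($test);
    # Step through complete triplets only; a trailing partial codon is skipped.
    for (my $i = 0; $i + 3 <= $len; $i += 3) {
        my $from = substr($ref,  $i, 3);
        my $to   = substr($test, $i, 3);
        next if $from eq $to;
        $changes{"${from}2${to}"}++;
    }
    return \%changes;
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(count_codon_changes($r, $t));
```

Codons that contain a gap ('-') are counted like any other change here, so you may want to filter those out or handle them separately, depending on what the numbers are for.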