Hi PerlMonks,
I have a 10-letter string, $string="ATATGCGCAT", made up of the four letters
A, T, G and C. I want to generate all possible combinations for this string
without changing the positions of the letters, allowing 2 levels (or some other
number of levels) for each of A, T, G and C. My script try.pl uses a sliding
window of size 4, and I want to keep the window size as a setting in the script.
The reason is that when the string is longer than 40 letters and the letters have
varying numbers of levels, the number of possible combinations becomes very
large and the script gives no results at the command prompt (cmd). So I first
want to use the window size to divide the string into fragments. Each fragment
is then used to produce its own set of combinations. The first combination of
the first fragment is concatenated with the first combination of the second
fragment, and the result is concatenated with the first combination of the third
fragment, and so on until the full length of the original string is covered.
The other combinations are produced in the same way.
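To make the fragment step concrete, here is a minimal sketch of splitting the string by window size and producing the combinations of each fragment. It is not what try.pl does: the helper names fragments and labelled are hypothetical, and it assumes that every position takes a level from 1 to $levels independently of the other positions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a string into consecutive windows of $size
# letters. The last window may be shorter (e.g. "AT" for a 10-letter string).
sub fragments {
    my ($string, $size) = @_;
    return $string =~ /(.{1,$size})/g;
}

# Hypothetical helper: every way to attach a level (1 .. $levels) to each
# letter of a fragment while keeping the letters in place.
# Assumption: the positions vary independently of one another.
sub labelled {
    my ($fragment, $levels) = @_;
    my @results = ('');
    for my $letter (split //, $fragment) {
        my @next;
        for my $prefix (@results) {
            push @next, "$prefix$letter$_" for 1 .. $levels;
        }
        @results = @next;
    }
    return @results;
}

my @frags = fragments('ATATGCGCAT', 4);
print "@frags\n";    # ATAT GCGC AT

for my $frag (@frags) {
    my @combos = labelled($frag, 2);
    printf "%s: %d combinations, first is ~%s\n",
        $frag, scalar @combos, $combos[0];
}
```

Under this assumption the two 4-letter fragments give 16 combinations each, while the trailing 2-letter fragment gives only 4, so joining "the first with the first, the second with the second" still needs a rule for fragments with different numbers of combinations.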
The script try.pl that I have written produces combinations of varying lengths
(from 1 to 8 letters only). In the output file I need only the combinations
that have the full length of the original string (i.e. 10 in this case), with
each combination starting with the symbol "~" and ending with "~". I am at my
wit's end trying to solve this problem.
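The output shape and the size concern described above can be illustrated with a rough sketch. It assumes, which try.pl itself does not, that every position takes a level from 1 to $levels independently, so a string of length n yields $levels ** n lines. make_iterator is a hypothetical helper that steps through the levels like an odometer and hands back one full-length combination at a time, so nothing has to be held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that yields one full-length
# combination per call and undef once all of them have been produced.
# Assumption: each position independently takes a level from 1 .. $levels.
sub make_iterator {
    my ($string, $levels) = @_;
    my @letters = split //, $string;
    my @digits  = (1) x @letters;    # current level at each position
    my $done    = !@letters;         # nothing to do for an empty string
    return sub {
        return if $done;
        my $line = join '', map { $letters[$_] . $digits[$_] } 0 .. $#letters;
        # Advance like an odometer: reset trailing maxed-out positions,
        # then bump the next position to the left.
        my $i = $#digits;
        $digits[$i--] = 1 while $i >= 0 && $digits[$i] == $levels;
        if ($i < 0) { $done = 1 } else { $digits[$i]++ }
        return $line;
    };
}

my $next  = make_iterator('ATAT', 2);    # short example string
my $count = 0;
while (defined(my $line = $next->())) {
    print "~$line\n";                    # each combination starts with "~"
    $count++;
}
print "~\n";                             # closing "~" after the last one
print "$count lines\n";                  # 2 ** 4 = 16
```

With 2 levels the full 10-letter string would give 2 ** 10 = 1024 lines under this assumption, and a 40-letter string would give 2 ** 40, roughly 1.1e12 lines, which is why writing them all out stops being practical.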
Here is the script try.pl:
#!/usr/bin/perl
use warnings;
$string="ATATGCGCAT";
###########################################
# Output to a TEXT File:
###########################################
$output="Results .txt";
open (my $fh,">",$output) or die "Cannot open file '$output'.\n";
#####################################
# To break into 4-letter fragments:
#####################################
while ($string=~ /(.{4}?)/ig) {
    $four=$&;
    @sw=$four=~ /[ATGC]{1}/igs;
    foreach my $single (@sw) {
        ####################################################
        # To extract single letter & append perd to single:
        ####################################################
        $perd="%d";
        $mod_four=$single.$perd; # concatenation
        push @new_four,$mod_four;
        $new_four = join ('',@new_four);
        # To produce all possible combinations without changing positions:
        for $a (1 .. 2) { # a has 2 levels
            for $t (1 .. 2) { # t has 2 levels
                for $g (1 .. 2) { # g has 2 levels
                    for $c (1 .. 2) { # c has 2 levels
                        $combi=sprintf($new_four,$a,$t,$g,$c,3-$a,3-$t,3-$g,3-$c);
                        print "~$combi\n";
                        print $fh "~$combi\n";
                    }
                }
            }
        }
    } # foreach closes
} # while closes
print"~";
print"\n";
print $fh "~";
print $fh "\n";
close $fh;
exit;
I got the following results in the output text file Results .txt, which is not what I want:
~A1
~A1
~A1
~A1
~A1
~A1
~A1
~A1
~A2
~A2
~A2
~A2
~A2
~A2
~A2
~A2
~A1T1
~A1T1
~A1T1
~A1T1
~A1T2
~A1T2
~A1T2
~A1T2
~A2T1
~A2T1
~A2T1
~A2T1
~A2T2
~A2T2
~A2T2
~A2T2
~A1T1A1
~A1T1A1
~A1T1A2
~A1T1A2
~A1T2A1
~A1T2A1
~A1T2A2
~A1T2A2
~A2T1A1
~A2T1A1
~A2T1A2
~A2T1A2
~A2T2A1
~A2T2A1
~A2T2A2
~A2T2A2
~A1T1A1T1
~A1T1A1T2
~A1T1A2T1
~A1T1A2T2
~A1T2A1T1
~A1T2A1T2
~A1T2A2T1
~A1T2A2T2
~A2T1A1T1
~A2T1A1T2
~A2T1A2T1
~A2T1A2T2
~A2T2A1T1
~A2T2A1T2
~A2T2A2T1
~A2T2A2T2
~A1T1A1T1G2
~A1T1A1T2G2
~A1T1A2T1G2
~A1T1A2T2G2
~A1T2A1T1G2
~A1T2A1T2G2
~A1T2A2T1G2
~A1T2A2T2G2
~A2T1A1T1G1
~A2T1A1T2G1
~A2T1A2T1G1
~A2T1A2T2G1
~A2T2A1T1G1
~A2T2A1T2G1
~A2T2A2T1G1
~A2T2A2T2G1
~A1T1A1T1G2C2
~A1T1A1T2G2C2
~A1T1A2T1G2C2
~A1T1A2T2G2C2
~A1T2A1T1G2C1
~A1T2A1T2G2C1
~A1T2A2T1G2C1
~A1T2A2T2G2C1
~A2T1A1T1G1C2
~A2T1A1T2G1C2
~A2T1A2T1G1C2
~A2T1A2T2G1C2
~A2T2A1T1G1C1
~A2T2A1T2G1C1
~A2T2A2T1G1C1
~A2T2A2T2G1C1
~A1T1A1T1G2C2G2
~A1T1A1T2G2C2G2
~A1T1A2T1G2C2G1
~A1T1A2T2G2C2G1
~A1T2A1T1G2C1G2
~A1T2A1T2G2C1G2
~A1T2A2T1G2C1G1
~A1T2A2T2G2C1G1
~A2T1A1T1G1C2G2
~A2T1A1T2G1C2G2
~A2T1A2T1G1C2G1
~A2T1A2T2G1C2G1
~A2T2A1T1G1C1G2
~A2T2A1T2G1C1G2
~A2T2A2T1G1C1G1
~A2T2A2T2G1C1G1
~A1T1A1T1G2C2G2C2
~A1T1A1T2G2C2G2C1
~A1T1A2T1G2C2G1C2
~A1T1A2T2G2C2G1C1
~A1T2A1T1G2C1G2C2
~A1T2A1T2G2C1G2C1
~A1T2A2T1G2C1G1C2
~A1T2A2T2G2C1G1C1
~A2T1A1T1G1C2G2C2
~A2T1A1T2G1C2G2C1
~A2T1A2T1G1C2G1C2
~A2T1A2T2G1C2G1C1
~A2T2A1T1G1C1G2C2
~A2T2A1T2G1C1G2C1
~A2T2A2T1G1C1G1C2
~A2T2A2T2G1C1G1C1
~
The correct results in the output file Results .txt should look like:
~A1T1A1T1G2C2G2C2A?T?
~A1T1A1T2G2C2G2C1A?T?
.....................
.....................
.....................
~
In the 9th and 10th positions of the desired results I have used a ? sign
to indicate an unknown number.