PerlMonks  

How can one get all possible combinations of a string without changing positions & using window size?

by supriyoch_2008 (Monk)
on Apr 20, 2013 at 19:28 UTC (id://1029679)

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks,

I have a 10-letter string, $string="ATATGCGCAT", made up of the four letters A, T, G and C. I want to get all possible 10-letter combinations without changing the positions of the letters in the string, considering 2 (or a varying number of) levels for each of A, T, G and C. I have also used a sliding window of size 4 in the script try.pl, and I want to keep the window size as an option in the script, because when the string is longer than 40 letters and the letters have varying levels, the number of possible combinations becomes very large and cmd gives no results. Using the window size, I first want to divide the string into fragments. Each smaller fragment will be used to produce a set of combinations. Then the first combination of the first fragment will be concatenated with the first combination of the second fragment, and that result with the first combination of the third fragment, and so on until the entire length of the original string is covered. The other combinations will be produced in the same way.

I have written a script, try.pl, which produces combinations of varying lengths (from 1 to 8 letters only). I need only the combinations with the actual length of the original string (10 in this case) in the output file, with each combination starting and ending with the symbol "~". I am at my wit's end trying to solve this problem.
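A minimal sketch of the fragment-then-concatenate idea described above, assuming 2 levels for every letter and a window of 4 (the last fragment may be shorter). `fragment_combos` and `windowed_combos` are hypothetical helper names, not part of try.pl:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumptions: 2 levels for every letter, window size 4.
my $string = "ATATGCGCAT";
my %levels = ( A => 2, T => 2, G => 2, C => 2 );
my $window = 4;

# All level assignments for one fragment, e.g. "AT" -> A1T1 A1T2 A2T1 A2T2.
sub fragment_combos {
    my ($frag, $lev) = @_;
    my @combos = ('');
    for my $letter (split //, $frag) {
        @combos = map {
            my $prefix = $_;
            map { $prefix . $letter . $_ } 1 .. $lev->{$letter};
        } @combos;
    }
    return @combos;
}

# Cut the string into fragments of at most $size letters, then join every
# combination built so far with every combination of the next fragment.
sub windowed_combos {
    my ($str, $lev, $size) = @_;
    my @result = ('');
    for my $frag ($str =~ /(.{1,$size})/g) {
        my @part = fragment_combos($frag, $lev);
        @result = map {
            my $left = $_;
            map { $left . $_ } @part;
        } @result;
    }
    return @result;
}

# Full-length combinations only, each preceded by "~", with a closing "~".
my @all = windowed_combos($string, \%levels, $window);
print "~$_\n" for @all;
print "~\n";
```

Because the fragments are joined as a full cartesian product, the windowed list has the same 2**10 = 1024 entries as a direct enumeration; the window changes how the list is built, not how long it is.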

Here goes the script try.pl:

#!/usr/bin/perl
use warnings;
$string="ATATGCGCAT";
###########################################
# Output to a TEXT File:
###########################################
$output="Results .txt";
open (my $fh,">",$output) or die"Cannot open file '$output'.\n";
#####################################
# To break into 4-letter fragments:
#####################################
while ($string=~ /(.{4}?)/ig) {
    $four=$&;
    @sw=$four=~ /[ATGC]{1}/igs;
    foreach my $single (@sw) {
        ####################################################
        # To extract single letter & append perd to single:
        ####################################################
        $perd="%d";
        $mod_four=$single.$perd; # concatenation
        push @new_four,$mod_four;
        $new_four = join ('',@new_four);
        # To produce all possible combinations without changing positions:
        for $a (1 .. 2) {                 # a has 2 levels:
            for $t (1 .. 2) {             # t has 2 levels:
                for $g (1 .. 2) {         # g has 2 levels:
                    for $c (1 .. 2) {     # c has 2 levels:
                        $combi=sprintf($new_four,$a,$t,$g,$c,3-$a,3-$t,3-$g,3-$c);
                        print"~$combi\n";
                        print $fh "~$combi\n";
                    }
                }
            }
        }
    } # 2nd foreach closes:
} # 1st while closes:
print"~";
print"\n";
print $fh "~";
print $fh "\n";
close $output;
exit;

I got the following results in the output file "Results .txt", which is not what I want:

~A1 ~A1 ~A1 ~A1 ~A1 ~A1 ~A1 ~A1 ~A2 ~A2 ~A2 ~A2 ~A2 ~A2 ~A2 ~A2
~A1T1 ~A1T1 ~A1T1 ~A1T1 ~A1T2 ~A1T2 ~A1T2 ~A1T2 ~A2T1 ~A2T1 ~A2T1 ~A2T1 ~A2T2 ~A2T2 ~A2T2 ~A2T2
~A1T1A1 ~A1T1A1 ~A1T1A2 ~A1T1A2 ~A1T2A1 ~A1T2A1 ~A1T2A2 ~A1T2A2 ~A2T1A1 ~A2T1A1 ~A2T1A2 ~A2T1A2 ~A2T2A1 ~A2T2A1 ~A2T2A2 ~A2T2A2
~A1T1A1T1 ~A1T1A1T2 ~A1T1A2T1 ~A1T1A2T2 ~A1T2A1T1 ~A1T2A1T2 ~A1T2A2T1 ~A1T2A2T2 ~A2T1A1T1 ~A2T1A1T2 ~A2T1A2T1 ~A2T1A2T2 ~A2T2A1T1 ~A2T2A1T2 ~A2T2A2T1 ~A2T2A2T2
~A1T1A1T1G2 ~A1T1A1T2G2 ~A1T1A2T1G2 ~A1T1A2T2G2 ~A1T2A1T1G2 ~A1T2A1T2G2 ~A1T2A2T1G2 ~A1T2A2T2G2 ~A2T1A1T1G1 ~A2T1A1T2G1 ~A2T1A2T1G1 ~A2T1A2T2G1 ~A2T2A1T1G1 ~A2T2A1T2G1 ~A2T2A2T1G1 ~A2T2A2T2G1
~A1T1A1T1G2C2 ~A1T1A1T2G2C2 ~A1T1A2T1G2C2 ~A1T1A2T2G2C2 ~A1T2A1T1G2C1 ~A1T2A1T2G2C1 ~A1T2A2T1G2C1 ~A1T2A2T2G2C1 ~A2T1A1T1G1C2 ~A2T1A1T2G1C2 ~A2T1A2T1G1C2 ~A2T1A2T2G1C2 ~A2T2A1T1G1C1 ~A2T2A1T2G1C1 ~A2T2A2T1G1C1 ~A2T2A2T2G1C1
~A1T1A1T1G2C2G2 ~A1T1A1T2G2C2G2 ~A1T1A2T1G2C2G1 ~A1T1A2T2G2C2G1 ~A1T2A1T1G2C1G2 ~A1T2A1T2G2C1G2 ~A1T2A2T1G2C1G1 ~A1T2A2T2G2C1G1 ~A2T1A1T1G1C2G2 ~A2T1A1T2G1C2G2 ~A2T1A2T1G1C2G1 ~A2T1A2T2G1C2G1 ~A2T2A1T1G1C1G2 ~A2T2A1T2G1C1G2 ~A2T2A2T1G1C1G1 ~A2T2A2T2G1C1G1
~A1T1A1T1G2C2G2C2 ~A1T1A1T2G2C2G2C1 ~A1T1A2T1G2C2G1C2 ~A1T1A2T2G2C2G1C1 ~A1T2A1T1G2C1G2C2 ~A1T2A1T2G2C1G2C1 ~A1T2A2T1G2C1G1C2 ~A1T2A2T2G2C1G1C1 ~A2T1A1T1G1C2G2C2 ~A2T1A1T2G1C2G2C1 ~A2T1A2T1G1C2G1C2 ~A2T1A2T2G1C2G1C1 ~A2T2A1T1G1C1G2C2 ~A2T2A1T2G1C1G2C1 ~A2T2A2T1G1C1G1C2 ~A2T2A2T2G1C1G1C1
~

The correct results in the output file Results .txt should look like this:

~A1T1A1T1G2C2G2C2A?T?
~A1T1A1T2G2C2G2C1A?T?
.....................
.....................
.....................
~

In the desired results I have used a "?" at the 9th and 10th positions to indicate an unknown number.

Re: How can one get all possible combinations of a string without changing positions & using window size?
by hdb (Monsignor) on Apr 20, 2013 at 20:09 UTC

    I am not sure I understand the question. Can it be reformulated as: get all 2^10 combinations of 1s and 2s and insert them after each letter in the given string? For this question the answer would be:

    use strict;
    use warnings;

    my $string = "ATATGCGCAT";
    my @letters = split '', $string;
    for( my $i=0; $i<2**@letters; ++$i ) {
        my $b = sprintf("%010b",$i);
        $b=~tr/01/12/;
        for( my $j=0; $j<@letters; ++$j ) {
            print $letters[$j],substr $b, $j, 1;
        }
        print "\n";
        last if $i==20; # remove if you want all...
    }

      Hi hdb,

      My problem is similar to the following exercise from a primary-school English grammar book: construct all possible sentences from the given table using all 4 blocks, and count the total number of such sentences.

      Here is the table with 4 blocks:

      -----------------------------------------
      I   | play        |        | at home.   |
      You | do not play | soccer | at school. |
      We  |             |        |            |
      -----------------------------------------

      Result: No. of possible sentences: 3 x 2 x 1 x 2 = 12

      Sentences are:
      I play soccer at home.
      I play soccer at school.
      I do not play soccer at home.
      I do not play soccer at school.
      You play soccer at home.
      You play soccer at school.
      You do not play soccer at home.
      You do not play soccer at school.
      We play soccer at home.
      We play soccer at school.
      We do not play soccer at home.
      We do not play soccer at school.

      I want all possible 10-letter combinations from the string "ATATGCGCAT" without changing the positions of the letters in the actual string, where, say, A has 3 levels, T has 2 levels, G has 2 levels and C has 1 level. By 3 levels of A I mean that A has A1, A2 and A3; likewise T1 and T2 for T, G1 and G2 for G, and C1 for C. I want to use a window size in the script to break a long string into smaller fragments, obtain all possible combinations for each fragment, and then concatenate them. I hope this helps the monks understand my problem better. I am sorry that I could not present the problem clearly yesterday in the thread entitled "How can one get all possible combinations of a string without changing positions?".
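The sentence table is a cartesian product of its blocks: each sentence picks one entry from every block, in order. A minimal sketch that enumerates it with plain Perl arrays instead of a glob pattern; `all_sentences` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Each block of the table is one array; a sentence takes one entry per block.
my @blocks = (
    [ 'I', 'You', 'We' ],
    [ 'play', 'do not play' ],
    [ 'soccer' ],
    [ 'at home.', 'at school.' ],
);

# Cartesian product: extend every partial sentence with every entry
# of the next block, so the first block varies slowest.
sub all_sentences {
    my @cols = @_;
    my @partial = ( [] );
    for my $col (@cols) {
        @partial = map {
            my $p = $_;
            map { [ @$p, $_ ] } @$col;
        } @partial;
    }
    return map { join ' ', @$_ } @partial;
}

my @sentences = all_sentences(@blocks);
printf "No. of possible sentences: %d\n", scalar @sentences;
print "$_\n" for @sentences;
```

The same routine covers the letter problem if each position of the string becomes one block, e.g. `['A1','A2','A3']` for every A, with the pieces joined by `''` instead of a space.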

        Perhaps the following will be helpful as a pattern for generating combinations:

        use warnings;
        use strict;

        my $i = 1;
        print $i++ . ". $_\n"
            for glob "{'I','You','We'}{' play',' do not play'}{' soccer'}{' at home.',' at school.'}";

        Output:

        1. I play soccer at home.
        2. I play soccer at school.
        3. I do not play soccer at home.
        4. I do not play soccer at school.
        5. You play soccer at home.
        6. You play soccer at school.
        7. You do not play soccer at home.
        8. You do not play soccer at school.
        9. We play soccer at home.
        10. We play soccer at school.
        11. We do not play soccer at home.
        12. We do not play soccer at school.

        I am not certain about your objectives, but maybe something like the following would do what you want. It doesn't use a window but it does add numbers, as I understand your description.

        use strict;
        use warnings;

        my $string = "ATATGCGCAT";
        my %levels = ( A => 3, T => 2, G => 2, C => 1, );
        my @letters = split(//,$string);
        my $max = $#letters;
        my @numbers = map { 1 } @letters;

        while(1) {
            # Print the current combination of letters and numbers
            print "~";
            for my $n (0..$max) {
                print $letters[$n] . $numbers[$n];
            }
            print "~\n";

            # Calculate the next set of numbers
            my $n = 0;
            $numbers[$n]++;
            while($numbers[$n] > $levels{$letters[$n]}) {
                $numbers[$n] = 1;
                $n++;
                last if($n > $max);
                $numbers[$n]++;
            }
            last if($n > $max);
        }

      Hi hdb

      Thanks for your quick reply and the code. I am not sure whether it can be reformulated as 2^10 combinations of 1's and 2's. But your code has partly solved my problem because it has produced some combinations without changing positions of letters in the string. I shall try to understand your code. If I find any difficulty, I shall get back to you.

      With regards,

Re: How can one get all possible combinations of a string without changing positions & using window size?
by LanX (Saint) on Apr 20, 2013 at 20:44 UTC

      Hi LanX,

      Thanks for your suggestion. I shall go through the post mentioned by you.

      With kind regards,

Re: How can one get all possible combinations of a string without changing positions & using window size?
by bioinformatics (Friar) on Apr 21, 2013 at 01:21 UTC
    The reason you get combinations of every length from 1 to 8 letters is this portion of the code:
    foreach my $single (@sw) {

    You don't have a closing bracket before:
    for $a (1 .. 2) {             # a has 2 levels:
        for $t (1 .. 2) {         # t has 2 levels:
            for $g (1 .. 2) {     # g has 2 levels:
                for $c (1 .. 2) { # c has 2 levels:
                    $combi=sprintf($new_four,$a,$t,$g,$c,3-$a,3-$t,3-$g,3-$c);
                    print"~$combi\n";
                    print $fh "~$combi\n";
    It's the way the loops are arranged, that's all. You can see this more easily by using the perl debugger or print statements to see the values of $new_four at each iteration, etc.
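A minimal sketch of one way to rearrange the loops so that every printed line has the full length: build the complete format string once, then run the level enumeration after it instead of inside the per-letter loop. It assumes every position independently takes level 1 or 2 (it drops try.pl's `3-$a` pairing), and `level_tuples` is a hypothetical helper name:

```perl
use strict;
use warnings;

my $string = "ATATGCGCAT";

# Build the complete format once, outside any printing loop:
# one "%d" after every letter of the whole string.
my $format = join '', map { $_ . '%d' } split //, $string;

# Every tuple of $n levels, each level running from 1 to $max
# (assumption: all positions have the same number of levels).
sub level_tuples {
    my ($n, $max) = @_;
    my @tuples = ( [] );
    for (1 .. $n) {
        @tuples = map {
            my $t = $_;
            map { [ @$t, $_ ] } 1 .. $max;
        } @tuples;
    }
    return @tuples;
}

# One sprintf per tuple, so every line has all ten numbers filled in.
my @lines = map { sprintf $format, @$_ } level_tuples(length $string, 2);
print "~$_\n" for @lines;
print "~\n";
```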

    Bioinformatics
Re: How can one get all possible combinations of a string without changing positions & using window size?
by davido (Cardinal) on Apr 21, 2013 at 07:45 UTC

    I don't really understand why a number would be "unknown", and consequently can't address where to place "?" characters. But that aside, this looks to me like a problem of enumerating all possible bit patterns for a ten-bit register. And since ten bits is within the realm of simple Perl integers, you can just iterate over every value from 0 through 2**10-1 and inflate its bit pattern into your original ATATGCGCAT string. This ensures that all possible combinations are enumerated. Here's one way to do that:

    use strict;
    use warnings;

    my $string = 'ATATGCGCAT';
    for my $num (0 .. 2**10 - 1) {
        print "$num: ",
            join('',
                map {
                    substr($string, $_, 1) . ($num & (2**(9 - $_)) ? '2' : '1');
                } 0 .. 9
            ),
            "\n";
    }

    Here we're running through two loops. The outer loop simply iterates over every integer from 0 through 2**10 - 1. That's how we generate our bit patterns. Then an inner map copies each letter of the original string and appends 1 or 2, depending on whether the corresponding bit is clear or set. Finally, each pattern is printed.

    The last lines of the output look like this:

    1015: A2T2A2T2G2C2G1C2A2T2
    1016: A2T2A2T2G2C2G2C1A1T1
    1017: A2T2A2T2G2C2G2C1A1T2
    1018: A2T2A2T2G2C2G2C1A2T1
    1019: A2T2A2T2G2C2G2C1A2T2
    1020: A2T2A2T2G2C2G2C2A1T1
    1021: A2T2A2T2G2C2G2C2A1T2
    1022: A2T2A2T2G2C2G2C2A2T1
    1023: A2T2A2T2G2C2G2C2A2T2

    I hope I understood the problem. ;)


    Dave
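The bit-pattern counter assumes two levels per letter. A minimal sketch of the same counting idea for the mixed levels from the follow-up post (A=3, T=2, G=2, C=1), treating each integer as a mixed-radix number with one digit per position; `decode` and `combo_at` are hypothetical helper names:

```perl
use strict;
use warnings;

# Levels taken from the follow-up post: A=3, T=2, G=2, C=1.
my $string  = 'ATATGCGCAT';
my %levels  = ( A => 3, T => 2, G => 2, C => 1 );
my @letters = split //, $string;
my @radix   = map { $levels{$_} } @letters;   # number base at each position

# Total number of combinations = product of the bases.
my $total = 1;
$total *= $_ for @radix;

# Split an integer into one digit per position, last position varying
# fastest; with every base equal to 2 this is just reading off the bits.
sub decode {
    my ($num, @bases) = @_;
    my @digits;
    for my $base (reverse @bases) {
        unshift @digits, $num % $base + 1;   # levels are numbered from 1
        $num = int($num / $base);
    }
    return @digits;
}

# Combination number $num as a string such as "A1T1A1T1G1C1G1C1A1T2".
sub combo_at {
    my ($num) = @_;
    my @d = decode($num, @radix);
    return join '', map { $letters[$_] . $d[$_] } 0 .. $#letters;
}

print "~", combo_at($_), "\n" for 0 .. $total - 1;
print "~\n";
```

Because each combination is computed from its number alone, any sub-range of `0 .. $total - 1` can be produced without building the whole list in memory.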

Re: How can one get all possible combinations of a string without changing positions & using window size?
by hdb (Monsignor) on Apr 21, 2013 at 07:48 UTC

    I surely have learnt a lot in this thread. I still do not understand what the window size is good for. If you do all combinations within a window and then combine across all windows you get the same as if you were doing all combinations straight away. Or does a specific letter always have the same level within a window?
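The arithmetic behind this remark can be checked directly: with the levels A=3, T=2, G=2, C=1 from the follow-up post and a window of 4, the per-window counts multiply to the same total as counting the whole string at once, so windowing changes how the combinations are produced, not how many there are. A minimal sketch; `count_combos` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Levels from the follow-up post; window of 4 letters.
my %level  = ( A => 3, T => 2, G => 2, C => 1 );
my $string = "ATATGCGCAT";

# Number of combinations of a (sub)string = product of its letters' levels.
sub count_combos {
    my ($str) = @_;
    my $n = 1;
    $n *= $level{$_} for split //, $str;
    return $n;
}

my @frags      = $string =~ /(.{1,4})/g;      # ATAT, GCGC, AT
my @per_window = map { count_combos($_) } @frags;

# Concatenating across windows multiplies the per-window counts.
my $windowed = 1;
$windowed *= $_ for @per_window;

print "per window: @per_window\n";            # per window: 36 4 6
print "windowed total: $windowed, direct total: ", count_combos($string), "\n";
# windowed total: 864, direct total: 864
```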

    Anyways, putting all of the above together (and ignoring the window issue), you get the following code creating 864 combinations:

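    use strict;
    use warnings;

    my %level = ( A=>3, T=>2, G=>2, C=>1 );
    my $string = "ATATGCGCAT";

    # Number of combinations = product of the levels of all letters
    my $n = 1;
    $n *= $level{$_} for( split '', $string );
    print "Number of combinations: $n\n";

    # Turn each letter into a brace group, e.g. A -> A{1,2,3}, T -> T{1,2}
    $string =~ s/(.)/$1."{".join(",",1..$level{$1})."}"/ge;

    # < $string > (note the spaces) is a glob, not a readline; it expands the braces
    print "~$_\n" while(< $string >);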
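To see what the glob actually receives, a small sketch that applies the same substitution to a copy of the string and prints the resulting pattern (same `%level` values as in the code above):

```perl
use strict;
use warnings;

my %level = ( A => 3, T => 2, G => 2, C => 1 );

# Same substitution as in hdb's code, applied to a copy of the string.
(my $pattern = "ATATGCGCAT") =~ s/(.)/$1."{".join(",",1..$level{$1})."}"/ge;

# Each brace group lists the allowed levels for the letter in front of it.
print "$pattern\n";
# A{1,2,3}T{1,2}A{1,2,3}T{1,2}G{1,2}C{1}G{1,2}C{1}A{1,2,3}T{1,2}
```

The product of the group sizes, 3*2*3*2*2*1*2*1*3*2, is the 864 combinations reported above.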

Node Type: perlquestion [id://1029679]
Approved by Corion