Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

How can one generate all possible combinations with reference to string positions?

by supriyoch_2008 (Scribe)
on Feb 20, 2013 at 17:40 UTC ( #1019812=perlquestion: print w/ replies, xml ) Need Help??
supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,

I have a string $a= “L1 H1 L2 H2”; with two Ls at positions 1, 3 and two Hs at positions 2, 4, respectively. The Ls can interchange between them i.e. say for position 1 , L1 may change to L2 but not to H. Similarly, the two H can interchange between them at positions 2 & 4. Finally these four elements will produce a set of 4 sequences like L1H1L2H2; L2H1L1H2; L1H2L2H1; L2H2L1H1; taking the fixed positions for L (1,3) and H(2,4) into consideration. I am at my wit’s end to write a script which can find all these four combinations. The given string just consists of 4 positions and two groups i.e. L & H each with 2 elements and so it can be worked out easily. If the string contains, say 500 positions with 10 main groups and each group with varying number of elements, it will be much easier to find all the possible combinations using a perl script. I have gone through the posts relating to permutations, combinations and AoA but I couldn’t find a text which can solve this problem. I am looking forward to the suggestions of Perl Monks regarding this problem. The results might look like:

Number of Combinations: 4 Combinations are: L1H1L2H2 L2H1L1H2 L1H2L2H1 L2H2L1H1

Comment on How can one generate all possible combinations with reference to string positions?
Download Code
Re: How can one generate all possible combinations with reference to string positions?
by kennethk (Monsignor) on Feb 20, 2013 at 17:59 UTC
    I observe that your L's and H's don't move, and that the last two terms are dictated by the first two. This means your pattern can be generated fairly simply with
    #!/usr/bin/perl -w use strict; for my $h (1 .. 2) { for my $l (1 .. 2) { printf "L%dH%dL%dH%d\n", $l, $h, 3-$l, 3-$h; } }

    Obviously, scaling this up to 500 positions and 10 groups is non-trivial, but should be doable similarly. If you need arbitrary complexity, you'll probably want to generate a work queue rather than having fixed nested for loops.


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Hi kenneth,

      Thanks for the script. It works nicely. I shall be glad if you kindly explain how to use and change the code 3-$l, 3-$h. Does it have any relationship with the number of L and H in the string following printf? Do the $h (1..2) and $l (1..2) refer to 2 levels each for H and L? Actually I tried to learn making some alterations in the script like 4-$l, 5-$h, $h(1..3), $l(1..4) etc.in appropriate places but I could not understand. This is because I donot know why the terms are used and how to change them. I hope you won't mind explain the terms. Please suggest me some reference texts for further study.

Re: How can one generate all possible combinations with reference to string positions?
by BrowserUk (Pope) on Feb 20, 2013 at 18:14 UTC
    If the string contains, say 500 positions with 10 main groups and each group with varying number of elements,

    What form does the information specifying the groups and positions come in? (Ie. What do you start with?)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hi BrowserUk,

      Thanks for your query. In fact I have come across a good research paper on genetics where the authors have changed one synonymous codon with another at a particular position i.e. they want to keep the same amino acid sequence in the polypeptide chain but have generated the Maximal Entropy Distribution from a set of sequences obtained by reshuffling the primary coding sequence. These sequences have been used to find the motifs in coding sequence of genes. To generate those set of sequences, I need to take help of a perl script. Actually there will be 20 major groups (i.e. 20 amino acids) and each group will have synonymous number of codons (except methionine & tryptophan). The authors have indicated that this kind of genetic study might be useful in improving the HIV vaccine. That is why I am interested to generate the set of sequences and learn it for the benefit of the students of our department. This is just out of my curiosity.

      Ref Paper:

      Harlan Robins,Michael Krasnitz and Arnold J Levine. 2008. The Computational Detection of Functional Nucleotide Sequence Motifs in the Coding Regions of Organisms. Experimental Biology & Medicine 233:665-673. (Open Access)

        First. You haven't answered my question: What do you start with in terms of the groups and positions. But forget that for now.

        Actually there will be 20 major groups (i.e. 20 amino acids) and each group will have synonymous number of codons (except methionine & tryptophan).

        That also doesn't tell us how many groups and how many positions for each group you are hoping to deal with, so for sake of somewhere to start I'm going to use your description from the OP.

        If the string contains, say 500 positions with 10 main groups and each group with varying number of elements,

        The way to tackle the problem would be to permute the values in each group independently, then permute each permutation from each group ,against all the permutations from the second group, against all the permutations from the third group; ... and so one. And then substitute the ordered permutations back into the original positions in the composite string.

        But here is the problem

        For the sake of something concrete, let's take that to mean you have 10 groups with 50 positions per group in your 500 position string.

        Each of those 10 groups has 50! permutations: 1*2*3*...*50 = 30414093201714379654605756361199494046490000000000000000000000000 permutations.

        And then you have all the combinations of each of those permutations with each of the (same huge number of) permutations in each of the other groups.

        So your final number of strings is going to be something like 50!**10 =

        6772495493499517441359753377384791035003696881462265580065594970076169 0210201881472400371909805448588932150215198817383799947594646432036237 1232749289013396400770111224550158125609245464425919104497462521652368 8063199046571891351816309749124049098611846130196482679268880617008205 7348631165353328233550287957173141094502004333604168254654323630280443 0686038713119389885652930107338333163364660010000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000 000000000000000

        That number is so big that if you could print 1 per nanosecond; and had as many printers as there are atoms in the universe running day & night, you wouldn't finish before the universe ceased to exist due to heat death.

        And that's only 10 amino acids and you have 20; and only 500 characters, and genomes are millions or billions in length.

        Bottom line. You cannot tackle this using brute force.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: How can one generate all possible combinations with reference to string positions?
by frozenwithjoy (Curate) on Feb 20, 2013 at 18:54 UTC
    Here's my first pass... so many loops :/ (one major problem with this approach, however, is that it becomes quite memory intensive after you have about 12 elements... so, while this works, you probably want to try to build on kennethk's approach.)
    #!/usr/bin/env perl use strict; use warnings; use feature 'say'; use Math::Combinatorics; use List::MoreUtils qw(indexes); use Data::Printer; my $line = "L1 H1 L2 H2"; my @input = split /\s/, $line; # segregate classes my @h_vals; my @l_vals; for (@input) { push @h_vals, $_ =~ /^H\d+/g; push @l_vals, $_ =~ /^L\d+/g; } # index input based on classes my @h_indexes = indexes { $_ =~ '^H\d+' } @input; my @l_indexes = indexes { $_ =~ '^L\d+' } @input; # permute values within classes my @h_perms = permute @h_vals; my @l_perms = permute @l_vals; # get all permutations of recombined classes my @perms; for my $h_cur (@h_perms) { for my $l_cur (@l_perms) { my @current; my @h = @{$h_cur}; my @l = @{$l_cur}; for (@h_indexes) { $current[$_] = shift @h; } for (@l_indexes) { $current[$_] = shift @l; } push @perms, \@current; } } p @perms; __END__ [ [0] [ [0] "L1", [1] "H2", [2] "L2", [3] "H1" ], [1] [ [0] "L2", [1] "H2", [2] "L1", [3] "H1" ], [2] [ [0] "L1", [1] "H1", [2] "L2", [3] "H2" ], [3] [ [0] "L2", [1] "H1", [2] "L1", [3] "H2" ] ]
Re: How can one generate all possible combinations with reference to string positions?
by kennethk (Monsignor) on Feb 20, 2013 at 22:20 UTC
    So, I decided to take a formal cut at the general problem. Given your problem statement, I thought the most reasonable specification would be a string of 'main groups' in order. Also, rather than reinventing the wheel, I used List::Permutor as recommended in perlfaq4's How do I permute N elements of a list?.
    #!/usr/bin/perl -w use strict; use List::Permutor; my $groups = 'LHLH'; my %count; $count{$_}++ for split //, $groups; my @results = $groups; for my $key (keys %count) { my $perm = new List::Permutor 1 .. $count{$key}; my @indices; while (my @set = $perm->next) { push @indices, \@set; } for my $result (@results) { $result =~ s/(?<=$key)/%d/g; $result = [map sprintf($result, @$_), @indices]; } @results = map @$_, @results; } print join "\n", @results;
    I've used sprintf for index templating. There's a potential problem with memory usage getting a bit large since it holds all variants in memory. The solution to that, I expect, would involve recursive functions. Of course, this kind of problem hits anytime you invoke factorial scaling.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      That's nice, fast, and relatively simple!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1019812]
Approved by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (5)
As of 2014-08-02 00:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Who would be the most fun to work for?















    Results (53 votes), past polls