http://www.perlmonks.org?node_id=1019890


in reply to Re: How can one generate all possible combinations with reference to string positions?
in thread How can one generate all possible combinations with reference to string positions?

Hi BrowserUk,

Thanks for your query. In fact I have come across a good research paper on genetics where the authors have changed one synonymous codon with another at a particular position i.e. they want to keep the same amino acid sequence in the polypeptide chain but have generated the Maximal Entropy Distribution from a set of sequences obtained by reshuffling the primary coding sequence. These sequences have been used to find the motifs in coding sequence of genes. To generate those set of sequences, I need to take help of a perl script. Actually there will be 20 major groups (i.e. 20 amino acids) and each group will have synonymous number of codons (except methionine & tryptophan). The authors have indicated that this kind of genetic study might be useful in improving the HIV vaccine. That is why I am interested to generate the set of sequences and learn it for the benefit of the students of our department. This is just out of my curiosity.

Ref Paper:

Harlan Robins,Michael Krasnitz and Arnold J Levine. 2008. The Computational Detection of Functional Nucleotide Sequence Motifs in the Coding Regions of Organisms. Experimental Biology & Medicine 233:665-673. (Open Access)

Replies are listed 'Best First'.
Re^3: How can one generate all possible combinations with reference to string positions?
by BrowserUk (Patriarch) on Feb 21, 2013 at 05:39 UTC

    First. You haven't answered my question: What do you start with in terms of the groups and positions. But forget that for now.

    Actually there will be 20 major groups (i.e. 20 amino acids) and each group will have synonymous number of codons (except methionine & tryptophan).

    That also doesn't tell us how many groups and how many positions for each group you are hoping to deal with, so for sake of somewhere to start I'm going to use your description from the OP.

    If the string contains, say 500 positions with 10 main groups and each group with varying number of elements,

    The way to tackle the problem would be to permute the values in each group independently, then permute each permutation from each group ,against all the permutations from the second group, against all the permutations from the third group; ... and so one. And then substitute the ordered permutations back into the original positions in the composite string.

    But here is the problem

    For the sake of something concrete, let's take that to mean you have 10 groups with 50 positions per group in your 500 position string.

    Each of those 10 groups has 50! permutations: 1*2*3*...*50 = 30414093201714379654605756361199494046490000000000000000000000000 permutations.

    And then you have all the combinations of each of those permutations with each of the (same huge number of) permutations in each of the other groups.

    So your final number of strings is going to be something like 50!**10 =

    6772495493499517441359753377384791035003696881462265580065594970076169 0210201881472400371909805448588932150215198817383799947594646432036237 1232749289013396400770111224550158125609245464425919104497462521652368 8063199046571891351816309749124049098611846130196482679268880617008205 7348631165353328233550287957173141094502004333604168254654323630280443 0686038713119389885652930107338333163364660010000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000 0000000000000000000000000000000000000000000000000000000000000000000000 000000000000000

    That number is so big that if you could print 1 per nanosecond; and had as many printers as there are atoms in the universe running day & night, you wouldn't finish before the universe ceased to exist due to heat death.

    And that's only 10 amino acids and you have 20; and only 500 characters, and genomes are millions or billions in length.

    Bottom line. You cannot tackle this using brute force.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.