http://www.perlmonks.org?node_id=991276

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,

I am a beginner in Perl programming. I want to generate all possible 2-letter and 3-letter words from a set of four letters (A, T, G, C), where each position can hold any of the four letters, so letters may repeat. I searched CPAN and other online sources but found only scripts for permutations and combinations, not for this. When the word is short it is easy to do by hand, but for a word size of 10 or more it is far too time-consuming. Perl code for this would be very useful to biologists.

If n is the size of the word, then 4**n (i.e. 4 to the power n) words are possible, which is more than the number of permutations or combinations without repetition. For 2-letter words there are 16, as listed below (versus 12 permutations and 6 combinations), and for 3-letter words there are 64 (versus 24 permutations and 4 combinations). Could the Perl Monks suggest some reference reading material or code for this purpose?

#!/usr/bin/perl
use warnings;
use strict;

## Perl script to create all possible combinations of A,T,G & C
# of varying lengths:
print "\n This program will generate all possible combinations of A,T,G & C:";
print "\n Enter the length of words you want (say 2 or 3 etc.): ";
my $num = <STDIN>;
my $four = 4;
chomp $num;
my $no_combi = $four**$num;
print "\n Total Number of Combinations= $no_combi\n";
my $combi_2l = ". . .????";
my $combi_3l = ". . .????";
print "\n All possible combinations of 2-letter words: ???$combi_2l\n";
print "\n All possible combinations of 3-letter words: ???$combi_3l\n";
exit;

The expected results will look like:

All possible combinations of 2-letter words (16): AA AT AG AC TA TT TG TC GA GT GG GC CA CT CG CC

For 3-letter words, there will be 64 combinations like:

All possible combinations of 3-letter words: AAA ... ... ... CCC
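
The counts quoted in the question can be checked with a few lines of arithmetic. This is only a sketch; the helper names `product_range` and `count_words` are made up for illustration and do not come from any module.

```perl
use strict;
use warnings;

# Product of the integers from $from to $to (1 for an empty range).
sub product_range {
    my ($from, $to) = @_;
    my $p = 1;
    $p *= $_ for $from .. $to;
    return $p;
}

# For an alphabet of $n letters and words of length $k, return three counts:
# words with repetition, permutations, and combinations.
sub count_words {
    my ($n, $k) = @_;
    my $with_repetition = $n**$k;                               # order matters, letters may repeat
    my $permutations    = product_range($n - $k + 1, $n);       # order matters, no repeats
    my $combinations    = $permutations / product_range(1, $k); # order ignored, no repeats
    return ($with_repetition, $permutations, $combinations);
}

for my $k (2, 3) {
    my ($w, $p, $c) = count_words(4, $k);
    print "length $k: $w words, $p permutations, $c combinations\n";
}
```

For length 2 this reports 16 words, 12 permutations and 6 combinations, matching the figures in the question.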

Replies are listed 'Best First'.
Re: Is it possible to generate all possible combinations of 2-letter and 3-letter words in perl?
by BrowserUk (Patriarch) on Sep 02, 2012 at 14:31 UTC

    my $r = join ',', 'a'..'z';
    print for glob "{$r}{$r}";
    print for glob "{$r}{$r}{$r}";

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

Re: Is it possible to generate all possible combinations of 2-letter and 3-letter words in perl?
by CountZero (Bishop) on Sep 02, 2012 at 15:43 UTC
    Change
    my $r = join',','a'..'z';
    to
    my $r = 'A,T,G,C';
    in BrowserUk's example and it will solve your problem.
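
Putting the two replies together, a minimal sketch of the glob approach for the four bases might look like this. The sub name `words_by_glob` is hypothetical, and the sketch assumes the letters contain no glob metacharacters (such as `*`, `?` or braces).

```perl
use strict;
use warnings;

# Return every word of length $size over the given letters, using the
# brace expansion built into glob: "{A,T,G,C}{A,T,G,C}" expands to 16 words.
sub words_by_glob {
    my ($size, @letters) = @_;
    my $alternatives = '{' . join(',', @letters) . '}';
    return glob($alternatives x $size);
}

my @two = words_by_glob(2, qw(A T G C));
print scalar(@two), " words: @two\n";
```

Because no wildcards are involved, glob never looks at the file system here; it only expands the brace alternatives.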

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Is it possible to generate all possible combinations of 2-letter and 3-letter words in perl?
by Khen1950fx (Canon) on Sep 02, 2012 at 16:31 UTC
    Set::CrossProduct.
    #!/usr/bin/perl
    use autodie;
    use common::sense;
    use Set::CrossProduct;

    my $min = 2;
    my $max = 3;
    my $set = [ qw(A T G C) ];

    foreach my $length ( $min .. $max ) {
        say "Getting combinations of length $length";
        my $cross = Set::CrossProduct->new( [ ( $set ) x $length ] );
        while( my $tuple = $cross->get ) {
            say join '', @$tuple;
        }
    }
Re: Is it possible to generate all possible combinations of 2-letter and 3-letter words in perl?
by Athanasius (Archbishop) on Sep 02, 2012 at 17:31 UTC

    Although glob performs well for small word sizes, for larger sizes a hand-rolled loop is much faster:

    #! perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    cmpthese(1, {
        'loop' => sub { my @p = permute_loop(10); print 'loop: ', scalar @p, "\n"; },
        'glob' => sub { my @q = permute_glob(10); print 'glob: ', scalar @q, "\n"; },
    });

    sub permute_glob {
        my ($size) = @_;
        my $r = 'A,T,G,C';
        return glob "{$r}" x $size;
    }

    sub permute_loop {
        my ($size) = @_;
        my @a = qw(A T G C);
        while (--$size) {
            @a = map { $_ . 'A', $_ . 'T', $_ . 'G', $_ . 'C' } @a;
        }
        return @a;
    }

    Output:

    glob: 1048576
                (warning: too few iterations for a reliable count)
    loop: 1048576
                (warning: too few iterations for a reliable count)
         s/iter  glob  loop
    glob    281    --  -98%
    loop   4.60 6011%    --

    Hope that helps,

    Athanasius <°(((><contra mundum

Re: Is it possible to generate all possible combinations of 2-letter and 3-letter words in perl?
by AnomalousMonk (Archbishop) on Sep 02, 2012 at 20:45 UTC

    Further to Khen1950fx's reply: Dominus's excellent Higher-Order Perl (freely downloadable here) discusses general iterative approaches to solving combination/permutation problems, with a bio-application example in section 4.3.2 "Genomic Sequence Generator".
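
To give a flavour of the iterative approach, here is a minimal sketch of an odometer-style iterator. It is not the code from the book; the name `make_word_iterator` and the closure design are assumptions made for illustration. It yields one word per call, so the full list never has to sit in memory.

```perl
use strict;
use warnings;

# Return a closure that yields one word per call and undef when exhausted.
# @counter holds one index into @letters per position, like an odometer.
sub make_word_iterator {
    my ($size, @letters) = @_;
    my @counter = (0) x $size;
    my $done    = $size < 1;
    return sub {
        return if $done;
        my $word = join '', map { $letters[$_] } @counter;
        # Advance the odometer from the rightmost position, carrying left.
        my $pos = $#counter;
        while ($pos >= 0 && ++$counter[$pos] == @letters) {
            $counter[$pos] = 0;
            $pos--;
        }
        $done = 1 if $pos < 0;    # carried past the leftmost position
        return $word;
    };
}

my $next = make_word_iterator(2, qw(A T G C));
while (defined(my $word = $next->())) {
    print "$word\n";
}
```

For a word size of 10 this still visits all 4**10 words, but only one at a time.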

Re: Is it possible to generate all possible combinations of 2-letter and 3-letter words in perl?
by jandrew (Chaplain) on Sep 02, 2012 at 23:38 UTC

    I humbly submit Math::Fleximal. It may be that generating the array is only the first step in your desired outcome. If you treat the created array as a number it allows for direct calls to each position. Using this method the definition of the array is flexible and quick.
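
The idea of treating each word as a number can be sketched in plain Perl arithmetic, without Math::Fleximal (whose API is not shown here). The assumption is that A, T, G, C stand for the base-4 digits 0, 1, 2, 3; the names `word_at` and `index_of` are hypothetical.

```perl
use strict;
use warnings;

my @LETTERS = qw(A T G C);    # assumed digit order: A=0, T=1, G=2, C=3

# Word at 0-based position $index in the ordered list of all words of
# length $size, computed directly by base conversion.
sub word_at {
    my ($index, $size) = @_;
    my $word = '';
    for (1 .. $size) {
        $word  = $LETTERS[ $index % @LETTERS ] . $word;   # lowest digit is the rightmost letter
        $index = int($index / @LETTERS);
    }
    return $word;
}

# Inverse mapping: position of a word in that same ordered list.
sub index_of {
    my ($word) = @_;
    my %digit;
    @digit{@LETTERS} = 0 .. $#LETTERS;
    my $index = 0;
    $index = $index * @LETTERS + $digit{$_} for split //, $word;
    return $index;
}

print word_at(0, 2), ' ', word_at(6, 2), ' ', word_at(15, 2), "\n";
print index_of('GAT'), "\n";
```

With this mapping any single word can be produced on demand, even for word sizes where the full list would be too large to store.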

Re: Is it possible to generate all possible combinations of 2-letter and 3-letter words in perl?
by pvaldes (Chaplain) on Sep 03, 2012 at 09:49 UTC

    Now available with uracil flavour

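    use strict;
    use warnings;

    my @bas = ("A", "G", "C", "T", "U");
    my @all = ();

    # Build every 3-letter word over the five bases.
    for my $a1 (@bas) {
        for my $a2 (@bas) {
            for my $a3 (@bas) {
                push @all, "$a1$a2$a3\n";
            }
        }
    }

    # Drop words that mix T and U, then print once, after all loops have finished.
    my @codon = grep !/(U.*T|T.*U)/, @all;
    for my $valid (@codon) {
        print $valid;
    }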

    EDIT: improved; codons containing both T and U are now filtered out.
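
An alternative to filtering, assuming the goal is simply DNA codons plus RNA codons with no mixed ones, is to build the DNA words and derive the RNA words by replacing T with U. This is a sketch only; the sub name `words` is made up for illustration.

```perl
use strict;
use warnings;

# All words of length $size over @letters, built one position at a time.
sub words {
    my ($size, @letters) = @_;
    my @words = ('');
    for (1 .. $size) {
        @words = map { my $prefix = $_; map { $prefix . $_ } @letters } @words;
    }
    return @words;
}

my @dna = words(3, qw(A G C T));
my @rna = map { my $w = $_; $w =~ tr/T/U/; $w } @dna;   # copy each word, then T -> U

# Union of both lists; words without T or U appear in both, so drop duplicates.
my %seen;
my @codons = grep { !$seen{$_}++ } @dna, @rna;
print scalar(@codons), " codons\n";
```

No word in the result can contain both T and U, because each list uses only one of the two letters.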