http://www.perlmonks.org?node_id=108469

The other day I was trying to find a way to find all possible combination of 2-4 particular letters of the alphabet (A, C, G, N, T), with repetition. The objective was to try to build an associative array of letter combinations and the corresponding ordered string. (See Sorting characters within a string).

Generating all combinations would usually takes a cartesian cross-product to be efficient (see japhy's post about that).

I came up with a simple but (very) inefficient method of doing the same thing:
my @strings = (grep /[acgmt]{2}/, ('aa' .. 'tt'), grep /[acgmt]{3}/, ('aaa' .. 'ttt'), grep /[acgmt]{4}/, ('aaaa' .. 'tttt'));
This yields exactly what I want, but to the price of a high cost in memory and processing. The problem is with the .. operator. The number of elements generated before the grep by the .. operator is 361,173. After the grep you are left with only 774 "useful" strings. And if you wanted to get all strings up to 5 letters you would then have to deal with nearly 10,000,000 elements. And things only get worse.

I then began to wonder about the availability of a lazy form of evaluation for expression. For the uninitiated, lazy evaluation of an expression is delaying evaluation until the last moment, at which point you may realize that you only need to compute a part of the expression. The aim here would be to avoid storing the intermediary result of the .. operator in memory by applying the grep directly, as a filter function.

Well, programmers of Perl, rejoice. It seems Perl6 might have a lazy operator (see rfc123) that will allow to do exactly this. With the lazy operator, we could rewrite the previous piece of code like this:
my @strings = (grep /[acgmt]{2}/, lazy ('aa' .. 'tt'), grep /[acgmt]{3}/, lazy ('aaa' .. 'ttt'), grep /[acgmt]{4}/, lazy ('aaaa' .. 'tttt'));
This would causes a lazy list to be passed to the filter function grep, saving us from allocating the entire letter combinations array in memory. While this might not be the greatest example ever, I think it's simple enough to illustrate the possibilities.

Lazy evaluation is (to the best of my knowledge) mostly available in functionnal programming language. It is a powerful concept that can only reinforce the variety of Perl idioms you can use to easily solve complex problems.

So everything is good, Perl6 will allow us to be lazy and use lazy. I can't wait to see the impatience and hubris functions ;-)

Guillaume

Replies are listed 'Best First'.
Re: Let's get lazy
by japhy (Canon) on Aug 28, 2001 at 20:47 UTC
    I'd suggest creating a function (or an object) that returns the next term in your requested series. The secret to laziness is an iterator.
    { my $init = 'aa'; my $start = 'a'; my $len = length $init; sub next_term { my $ret = $init; my $p = $len - 1; while (1) { substr($init, $p, 1) =~ tr/acgnt/cgnta/; last if substr($init, $p--, 1) ne $start; $init = "$start$init", $len++, last if $p < 0; } return $ret; } }
    You could also mimic this with a tied scalar, and have the FETCH function do what I have done above.
    package Tie::ScalarIter; sub TIESCALAR { my ($class, $init, $start) = @_; bless [ $init, $start, length($init) ], $class; } sub FETCH { my $s = shift; my $ret = $s->[0]; my $p = $s->[2] - 1; while (1) { substr($s->[0], $p, 1) =~ tr/acgnt/cgnta/; last if substr($s->[0], $p--, 1) ne $s->[1]; $s->[0] = "$s->[1]$s->[0]", $s->[2]++, last if $p < 0; } return $ret; } sub STORE { my $s = shift; my $rep = shift; if (ref $rep) { @$s = (@$rep, length $rep->[0]) } else { (@$s[0,2] = ($rep, length $rep) } } 1;
    Both interfaces are simple to use:
    # functional my @strings; while (defined (my $next = next_term())) { last if length($next) > 4; push @strings, $next; } # tied scalar use Tie::ScalarIter; tie my($iter), 'Tie::ScalarIter', 'aa', 'a'; my @strings; while (defined (my $next = $iter)) { last if length($next) > 4; push @strings, $next; }
    And that's about it.

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Let's get lazy
by Masem (Monsignor) on Aug 28, 2001 at 20:37 UTC
    Why wait for Perl 6? :-)
    use strict; use Language::Functional ':all'; sub match { $_[0] =~ m/^[acgmt]*$/ } my $x = filter( \&match, ['aa'..'tt', 'aaa'..'ttt','aaaa'..'tttt'] ); print show $x;
    Update - not quite like it, however, perl will still generate that list, but I'm looking at L::F and there's enough there that you can generate the lazy list for processing via double Map's.

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    It's not what you know, but knowing how to find it if you don't know that's important

(MeowChow) Re: Let's get lazy
by MeowChow (Vicar) on Aug 28, 2001 at 22:10 UTC
    If memory use is your only concern, then here's a rehash (har har har) of one of my prior atrocities...
      
    $_ = 'acgnt' x 4; push @strings, "$1$2" while /.*(.).*(.)(?(?{$_{"$1$2"}++})^)/; push @strings, "$1$2$3" while /.*(.).*(.).*(.)(?(?{$_{"$1$2$3"}++})^ +)/; push @strings, "$1$2$3$4" while /.*(.).*(.).*(.).*(.)(?(?{$_{"$1$2$3$4 +"}++})^)/; print join $/, @strings;
       MeowChow                                   
                   s aamecha.s a..a\u$&owag.print
Re: Let's get lazy
by dga (Hermit) on Aug 28, 2001 at 23:05 UTC

    I posted a reply to the same post.

    I was thinking a really lazy way to do this would be to make up a locale with just the ACGNT letters and then have to magic autoincrement operator just do the Right Thing.

    use POSIX qw(locale_h); setlocale(LC_ALL, "DNA"); use locale; my @twos=(AA..TT); #with all above working @twos would be AA AC AG AN AT CA CC CG etc.

    So my question for the folks in the know is does the autoincrementer use locale?
    Has anyone defined a DNA locale?

Re: Let's get lazy
by runrig (Abbot) on Nov 02, 2001 at 04:31 UTC
    tilly once posted something that could be coerced to do this.

    Update: It would be kind of the inside-out solution to this problem, you could get a function callback for every combination of values, but I think to get an actual iterator function using this method, similar to the iterator produced by my $iter = do { my $i; sub { $i++ } };, you'd need a co-routine. Unless tilly can wrangle it out of his code :-)

    There does seem to be a Coroutine Module on CPAN, might be fun to look into :)

    Update: Taking the initiative, I wrangled tilly's code myself (just couldn't wait for the book :)

    use strict; use warnings; my $iterator = mk_iter( [1..2], ["a".."c"], [3..5] ); while (my @arr = $iterator->()) { print "@arr\n"; } sub mk_iter { my $range = shift; my $i = 0; my $end = @$range; my $iter = sub { return unless $i < $end; return $$range[$i++]; }; @_ ? ret_iter($iter, @_) : $iter; } sub ret_iter { my $iter = shift; my $range = shift; my $i = 0; my $end = @$range; my @arr; my $new_iter = sub { $i = 0 unless $i < $end; return unless $i or @arr = $iter->(); return @arr, $$range[$i++]; }; @_ ? ret_iter($new_iter, @_) : $new_iter; } ##################################### # Update # Here's a variation which is closer to what was # Originally asked for, i.e. all combinations from # 2-4 characters use strict; use warnings; use strict; use warnings; my $iterator = make_iter( 2,4,[qw(A C G N T)] ); while (my @arr = $iterator->()) { print "@arr\n"; } sub make_iter { my ($start, $end, $range) = @_; my $nxt_iter = sub { return }; my $iter = sub { my @data; unless (@data = $nxt_iter->()) { return unless $start <= $end; $nxt_iter = mk_iter( ($range) x $start++); return $nxt_iter->(); }; @data; } }
      Here is a more general solution to the problem. In this solution my iterators all indicate that they are done by returning an empty list, and when called next will restart. I didn't use i_grep, but I included it to show how you would do it.

      Note that despite the length, the code is more straightforward than the original recursive code. And if this was part of a longer program, you would be reusing the bulk of this code.

      use strict; my $iter = i_map( sub {print "@_\n"}, comb_iter( list_iter(1..2), list_iter('a'..'c'), list_iter(3..5) ) ); 1 while $iter->(); ################################################################### # The program proper ends here. These are utility functions that # # you could reuse # ################################################################### # Takes a list of iterators that are "restartable" # Returns a restartable iterator that iterates over all combinations # of outputs of the input iterators, creating a flat list of combinati +ons # of the inputs. (The output only makes sense in array context.) sub comb_iter { if (0 == @_) { return sub {()}; # Stupid case needed for generality. } elsif (1 == @_) { return shift; } else { my $outer_iter = shift; my $inner_iter = comb_iter(@_); my @last_outer; return sub { if (@last_outer) { my @ret = $inner_iter->(); if (@ret) { return (@last_outer, @ret); } else { @last_outer = $outer_iter->(); if (@last_outer) { return (@last_outer, $inner_iter->()); } else { return (); } } } else { @last_outer = $outer_iter->(); return (@last_outer, $inner_iter->()); } }; } } # Takes a function and an iterator, returns an iterator that uses that # function to filter the output. sub i_grep { my ($filter, $iter) = @_; my @last_ret = qw(just an initialization value); sub { while (@last_ret) { @last_ret = $iter->(); if ($filter->(@last_ret)) { return wantarray ? @last_ret : $last_ret[0]; } } return (); }; } # Takes a function and an iterator, returns an iterator that applies t +hat # function to the returns of the iterator. sub i_map { my ($filter, $iter) = @_; sub { my @ret = $iter->(); return @ret ? $filter->(@ret) : (); }; } # Takes a list and turns it into an iterator over that list sub list_iter { my @vals = @_; my $i = 0; sub { if ($i < @vals) { return $vals[$i++]; } else { $i = 0; return (); } }; }
      Note that the specific problem in the original question can now be solved as the author wanted using i_grep, or you can produce more efficient iterator as follows:
      my $genome_iter = i_map( sub {join '', @_}, join_iter( map { comb_iter( map { list_iter(qw(a c g n t)); } 1..$_ ) } 2..3 ) ); while (my $string = $genome_iter->()) { print "$string\n"; } # Takes a list of iterators, and returns an iterator that iterates # over each in turn sub join_iter { my @iter = @_; my $i = 0; return sub { while ($i < @iter) { my @ret = $iter[$i]->(); if (@ret) { return wantarray ? @ret : $ret[0]; } else { $i++; } } $i = 0; return (); }; }
      Alternately if you want to turn the output into a list you can just create an easy method:
      # Takes an iterator and returns a list of results sub iter2list { my $iter = shift; my @out; while (my @ret = $iter->()) { push @out, @ret; } return @out; }
      Note that most of the length here is because I am having to build my iterator interface from scratch. That is a lot of work! And some of the code looks more complex because what we are used to seeing in a few nested loops our minds balk at when you see it as a similar number of nested calls.
      A much better person to ask is Dominus, he is in the process of writing a book on exactly this subject.

      Chapter 4 in particular offers direct iterative solutions of the above problem. Once you know the techniques, they are straightforward to apply in any language with proper support for closures. And the techniques are essentially to create utility functions that take one iterator and create new ones out of it. For instance if you like doing stuff procedurally with map, grep, an easy way to produce a range etc, then you can create iterative versions of the same. (For instance a range would return all of the things in that range. An iterative map would take a function and an iterator and give you an iterator that is the result of applying that function to the output of the first iterator.) And then it becomes a mechanical process to write iterative versions of what you can dream up in a list-oriented manner.

      Alternately if you want the flavour of a co-routine solution to the problem, and don't want to wait for Perl 6, Ruby offers them now. So, I believe, does the very latest version of Python. (Ruby goes further and people there use them more often.) They don't offer the rest of Perl 6, but they give you the flavour of a Perlish scripting language with yield and (with Ruby or stackless Python) full continuations.

Re: Let's get lazy
by beretboy (Chaplain) on Nov 02, 2001 at 03:36 UTC
    May not be exactly what you are looking for but I wrote this peice of code as a way to find (through brute force) all possible combinations of something (in this case locker combinations, don't ask why). anyway here it is:
    #All possible combination generator use strict; use Quantum::Superpositions; open (DUMP, ">combos.txt"); my @list; my $i = 0; while (1) { my $num = int(rand(31)) . "-" . int(rand(31)) . "-" . int(rand(31)); if ($num eq any(@list)) { next; } else { @list[$i] = $num; print "$num "; print DUMP "$num\n"; $i++; } }


    "Sanity is the playground of the unimaginative" -Unknown
      Since I don't see the glob trick mentioned in this thread, and I'm not too keen on your pick-random-combos-until-we-think-weve-got-them-all solution, here are two ways to solve the locker combo problem.
      #!/usr/bin/perl -wT use strict; # golfed version using glob my $nums = join(',',1..31); my @list1 = glob("{$nums}-{$nums}-{$nums}"); print "L1: $_\n" for @list1; # more sane way of doing that..... my @list2; for my $i (1..31) { for my $j (1..31) { for my $k (1..31) { push(@list2,"$i-$j-$k"); } } } print "L2: $_\n" for @list2;
      And here is the glob trick used to solve the original problem...
      #!/usr/bin/perl -wT use strict; # one ordering.... my $letters = ',A,C,G,N,T'; my @list1 = grep {/./} glob("{$letters}"x4); print "L1: $_\n" for @list1; # different ordering my $letter2 = 'A,C,G,N,T'; my @list2 = (glob("{$letter2}"x2),glob("{$letter2}"x3),glob("{$letter2 +}"x4)); print "L2: $_\n" for @list2;

      -Blake