Let's get lazy

The other day I was trying to find a way to find all possible combination of 2-4 particular letters of the alphabet (A, C, G, N, T), with repetition. The objective was to try to build an associative array of letter combinations and the corresponding ordered string. (See Sorting characters within a string).

Generating all combinations would usually takes a cartesian cross-product to be efficient (see japhy's post about that).

I came up with a simple but (very) inefficient method of doing the same thing:

my @strings = (grep /[acgmt]{2}/, ('aa' .. 'tt'), 
               grep /[acgmt]{3}/, ('aaa' .. 'ttt'), 
               grep /[acgmt]{4}/, ('aaaa' .. 'tttt'));
[download]

This yields exactly what I want, but to the price of a high cost in memory and processing. The problem is with the .. operator. The number of elements generated before the grep by the .. operator is 361,173. After the grep you are left with only 774 "useful" strings. And if you wanted to get all strings up to 5 letters you would then have to deal with nearly 10,000,000 elements. And things only get worse.

I then began to wonder about the availability of a lazy form of evaluation for expression. For the uninitiated, lazy evaluation of an expression is delaying evaluation until the last moment, at which point you may realize that you only need to compute a part of the expression. The aim here would be to avoid storing the intermediary result of the .. operator in memory by applying the grep directly, as a filter function.

Well, programmers of Perl, rejoice. It seems Perl6 might have a lazy operator (see rfc123) that will allow to do exactly this. With the lazy operator, we could rewrite the previous piece of code like this:

my @strings = (grep /[acgmt]{2}/, lazy ('aa' .. 'tt'), 
               grep /[acgmt]{3}/, lazy ('aaa' .. 'ttt'), 
               grep /[acgmt]{4}/, lazy ('aaaa' .. 'tttt'));
[download]

This would causes a lazy list to be passed to the filter function grep, saving us from allocating the entire letter combinations array in memory. While this might not be the greatest example ever, I think it's simple enough to illustrate the possibilities.

Lazy evaluation is (to the best of my knowledge) mostly available in functionnal programming language. It is a powerful concept that can only reinforce the variety of Perl idioms you can use to easily solve complex problems.

So everything is good, Perl6 will allow us to be lazy and use lazy. I can't wait to see the impatience and hubris functions ;-)

Guillaume

Comment on Let's get lazy Select or Download Code

Replies are listed 'Best First'.

Re: Let's get lazy
by japhy (Canon) on Aug 28, 2001 at 20:47 UTC

{
  my $init = 'aa';
  my $start = 'a';
  my $len = length $init;

  sub next_term {
    my $ret = $init;
    my $p = $len - 1;

    while (1) {
      substr($init, $p, 1) =~ tr/acgnt/cgnta/;
      last if substr($init, $p--, 1) ne $start;
      $init = "$start$init", $len++, last if $p < 0;
    }

    return $ret;
  }
}
[download]

FETCH

package Tie::ScalarIter;

sub TIESCALAR {
  my ($class, $init, $start) = @_;
  bless [ $init, $start, length($init) ], $class;
}

sub FETCH {
  my $s = shift;

  my $ret = $s->[0];
  my $p = $s->[2] - 1;

  while (1) {
    substr($s->[0], $p, 1) =~ tr/acgnt/cgnta/;
    last if substr($s->[0], $p--, 1) ne $s->[1];
    $s->[0] = "$s->[1]$s->[0]", $s->[2]++, last if $p < 0;
  }

  return $ret;
}

sub STORE {
  my $s = shift;
  my $rep = shift;

  if (ref $rep) { @$s = (@$rep, length $rep->[0]) }
  else { (@$s[0,2] = ($rep, length $rep) }
}

1;
[download]

# functional

my @strings;
while (defined (my $next = next_term())) {
  last if length($next) > 4;
  push @strings, $next;
}


# tied scalar
use Tie::ScalarIter;

tie my($iter), 'Tie::ScalarIter', 'aa', 'a';
my @strings;

while (defined (my $next = $iter)) {
  last if length($next) > 4;
  push @strings, $next;
}
[download]

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

[reply]
[d/l]
[select]

Re: Let's get lazy
by Masem (Monsignor) on Aug 28, 2001 at 20:37 UTC

use strict;
use Language::Functional ':all';

sub match { $_[0] =~ m/^[acgmt]*$/ }

my $x = filter( \&match, ['aa'..'tt', 'aaa'..'ttt','aaaa'..'tttt'] );
print show $x;
[download]

Update

-----------------------------------------------------
Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
It's not what you know, but knowing how to find it if you don't know that's important

[reply]
[d/l]

(MeowChow) Re: Let's get lazy
by MeowChow (Vicar) on Aug 28, 2001 at 22:10 UTC

$_ = 'acgnt' x 4;
push @strings, "$1$2"     while /.*(.).*(.)(?(?{$_{"$1$2"}++})^)/;
push @strings, "$1$2$3"   while /.*(.).*(.).*(.)(?(?{$_{"$1$2$3"}++})^
+)/;
push @strings, "$1$2$3$4" while /.*(.).*(.).*(.).*(.)(?(?{$_{"$1$2$3$4
+"}++})^)/;

print join $/, @strings;
[download]

   MeowChow                                   
               s aamecha.s a..a\u$&owag.print

[reply]
[d/l]

Re: Let's get lazy
by dga (Hermit) on Aug 28, 2001 at 23:05 UTC

I posted a reply to the same post.

I was thinking a really lazy way to do this would be to make up a locale with just the ACGNT letters and then have to magic autoincrement operator just do the Right Thing.

use POSIX qw(locale_h);
setlocale(LC_ALL, "DNA");
use locale;

my @twos=(AA..TT);
#with all above working @twos would be AA AC AG AN AT CA CC CG etc.
[download]

So my question for the folks in the know is does the autoincrementer use locale?
Has anyone defined a DNA locale?

[reply]
[d/l]

Re: Let's get lazy
by runrig (Abbot) on Nov 02, 2001 at 04:31 UTC

tilly

something

Update: It would be kind of the inside-out solution to this problem, you could get a function callback for every combination of values, but I think to get an actual iterator function using this method, similar to the iterator produced by my $iter = do { my $i; sub { $i++ } };, you'd need a co-routine. Unless tilly can wrangle it out of his code :-)

There does seem to be a Coroutine Module on CPAN, might be fun to look into :)

Update: Taking the initiative, I wrangled tilly's code myself (just couldn't wait for the book :)


use strict;
use warnings;

my $iterator = mk_iter(
  [1..2], ["a".."c"], [3..5]
);

while (my @arr = $iterator->()) {
 print "@arr\n";
}

sub mk_iter {
  my $range = shift;
  my $i = 0;
  my $end = @$range;
  my $iter = sub {
    return unless $i < $end;
    return $$range[$i++];
  };
  @_ ? ret_iter($iter, @_) : $iter;
}

sub ret_iter {
  my $iter = shift;
  my $range = shift;
  my $i = 0;
  my $end = @$range;
  my @arr;
  my $new_iter = sub {
    $i = 0 unless $i < $end;
    return unless $i or @arr = $iter->();
    return @arr, $$range[$i++];
  };
  @_ ? ret_iter($new_iter, @_) : $new_iter;
}

#####################################
# Update
# Here's a variation which is closer to what was
# Originally asked for, i.e. all combinations from
# 2-4 characters
use strict;
use warnings;

use strict;
use warnings;

my $iterator = make_iter(
 2,4,[qw(A C G N T)]
);

while (my @arr = $iterator->()) {
 print "@arr\n";
}

sub make_iter {
  my ($start, $end, $range) = @_;
  my $nxt_iter = sub { return };
  my $iter = sub {
    my @data;
    unless (@data = $nxt_iter->()) {
        return unless $start <= $end;
        $nxt_iter = mk_iter( ($range) x $start++);
        return $nxt_iter->();
    };
    @data;
  }
}
[download]

[reply]
[d/l]
[select]

Laziness in a more consistent way

by tilly (Archbishop) on Nov 03, 2001 at 02:50 UTC

Note that despite the length, the code is more straightforward than the original recursive code. And if this was part of a longer program, you would be reusing the bulk of this code.

use strict;

my $iter = i_map(
  sub {print "@_\n"},
  comb_iter(
    list_iter(1..2), list_iter('a'..'c'), list_iter(3..5)
  )
);

1 while $iter->();

###################################################################
# The program proper ends here.  These are utility functions that #
# you could reuse                                                 #
###################################################################

# Takes a list of iterators that are "restartable"
# Returns a restartable iterator that iterates over all combinations
# of outputs of the input iterators, creating a flat list of combinati
+ons
# of the inputs.  (The output only makes sense in array context.)
sub comb_iter {
  if (0 == @_) {
    return sub {()}; # Stupid case needed for generality.
  }
  elsif (1 == @_) {
    return shift;
  }
  else {
    my $outer_iter = shift;
    my $inner_iter = comb_iter(@_);
    my @last_outer;
    return sub {
      if (@last_outer) {
        my @ret = $inner_iter->();
        if (@ret) {
          return (@last_outer, @ret);
        }
        else {
          @last_outer = $outer_iter->();
          if (@last_outer) {
            return (@last_outer, $inner_iter->());
          }
          else {
            return ();
          }
        }
      }
      else {
        @last_outer = $outer_iter->();
        return (@last_outer, $inner_iter->());
      }
    };
  }
}


# Takes a function and an iterator, returns an iterator that uses that
# function to filter the output.
sub i_grep {
  my ($filter, $iter) = @_;
  my @last_ret = qw(just an initialization value);
  sub {
    while (@last_ret) {
      @last_ret = $iter->();
      if ($filter->(@last_ret)) {
        return wantarray ? @last_ret : $last_ret[0];
      }
    }
    return ();
  };
}


# Takes a function and an iterator, returns an iterator that applies t
+hat
# function to the returns of the iterator.
sub i_map {
  my ($filter, $iter) = @_;
  sub {
    my @ret = $iter->();
    return @ret ? $filter->(@ret) : ();
  };
}

# Takes a list and turns it into an iterator over that list
sub list_iter {
  my @vals = @_;
  my $i = 0;
  sub {
    if ($i < @vals) {
      return $vals[$i++];
    }
    else {
      $i = 0;
      return ();
    }
  };
}
[download]

my $genome_iter = i_map(
  sub {join '', @_},
  join_iter(
    map {
      comb_iter(
        map {
          list_iter(qw(a c g n t));
        } 1..$_
      )
    } 2..3
  )
);

while (my $string = $genome_iter->()) {
  print "$string\n";
}



# Takes a list of iterators, and returns an iterator that iterates
# over each in turn
sub join_iter {
  my @iter = @_;
  my $i = 0;
  return sub {
    while ($i < @iter) {
      my @ret = $iter[$i]->();
      if (@ret) {
        return wantarray ? @ret : $ret[0];
      }
      else {
        $i++;
      }
    }
    $i = 0;
    return ();
  };
}
[download]

# Takes an iterator and returns a list of results
sub iter2list {
  my $iter = shift;
  my @out;
  while (my @ret = $iter->()) {
    push @out, @ret;
  }
  return @out;
}
[download]

[reply]
[d/l]
[select]

Re (tilly) 2: Let's get lazy

by tilly (Archbishop) on Nov 02, 2001 at 17:07 UTC

Dominus

book

Chapter 4 in particular offers direct iterative solutions of the above problem. Once you know the techniques, they are straightforward to apply in any language with proper support for closures. And the techniques are essentially to create utility functions that take one iterator and create new ones out of it. For instance if you like doing stuff procedurally with map, grep, an easy way to produce a range etc, then you can create iterative versions of the same. (For instance a range would return all of the things in that range. An iterative map would take a function and an iterator and give you an iterator that is the result of applying that function to the output of the first iterator.) And then it becomes a mechanical process to write iterative versions of what you can dream up in a list-oriented manner.

Alternately if you want the flavour of a co-routine solution to the problem, and don't want to wait for Perl 6, Ruby offers them now. So, I believe, does the very latest version of Python. (Ruby goes further and people there use them more often.) They don't offer the rest of Perl 6, but they give you the flavour of a Perlish scripting language with yield and (with Ruby or stackless Python) full continuations.

[reply]

Re: Let's get lazy
by beretboy (Chaplain) on Nov 02, 2001 at 03:36 UTC


#All possible combination generator

use strict;
use Quantum::Superpositions;
open (DUMP, ">combos.txt");
my @list;
my $i = 0;
while (1) {
my $num = int(rand(31)) . "-" . int(rand(31)) . "-" . int(rand(31));

if ($num eq any(@list)) {
next;
} else {
@list[$i] = $num;
print "$num ";
print DUMP "$num\n";
$i++;
}

}
[download]

[reply]
[d/l]

Re: Re: Let's get lazy

by blakem (Monsignor) on Nov 02, 2001 at 04:26 UTC

#!/usr/bin/perl -wT
use strict;

# golfed version using glob
my $nums = join(',',1..31);
my @list1 = glob("{$nums}-{$nums}-{$nums}");
print "L1: $_\n" for @list1;

# more sane way of doing that.....
my @list2;
for my $i (1..31) {
  for my $j (1..31) {
    for my $k (1..31) {
      push(@list2,"$i-$j-$k");
    }
  }
}
print "L2: $_\n" for @list2;
[download]

#!/usr/bin/perl -wT
use strict;

# one ordering....
my $letters = ',A,C,G,N,T';
my @list1 = grep {/./} glob("{$letters}"x4);
print "L1: $_\n" for @list1;

# different ordering
my $letter2 = 'A,C,G,N,T';
my @list2 = (glob("{$letter2}"x2),glob("{$letter2}"x3),glob("{$letter2
+}"x4));
print "L2: $_\n" for @list2;
[download]

-Blake

[reply]
[d/l]
[select]

Back to Meditations