The other day I was trying to find a way to find all possible combination of
2-4 particular letters of the alphabet (A, C, G, N, T), with repetition. The objective was
to try to build an associative array of letter combinations and the corresponding
ordered string. (See Sorting characters within a string).
Generating all combinations would usually takes a cartesian cross-product to be
efficient (see japhy's post
about that).
I came up with a simple but (very) inefficient method of doing the same thing:
my @strings = (grep /[acgmt]{2}/, ('aa' .. 'tt'),
grep /[acgmt]{3}/, ('aaa' .. 'ttt'),
grep /[acgmt]{4}/, ('aaaa' .. 'tttt'));
This yields exactly what I want, but to the price of a high cost in memory and processing.
The problem is with the .. operator. The number of elements generated before the
grep by the .. operator is 361,173. After the grep you are left
with only 774 "useful" strings. And if you wanted to get all strings up to 5 letters you
would then have to deal with nearly 10,000,000 elements. And things only get worse.
I then began to wonder about the availability of a lazy form of evaluation for expression.
For the uninitiated, lazy evaluation of an expression is delaying evaluation until the
last moment, at which point you may realize that you only need to compute a part of the expression.
The aim here would be to avoid storing the intermediary result of the .. operator in memory
by applying the grep directly, as a filter function.
Well, programmers of Perl, rejoice. It seems Perl6 might have a lazy operator (see
rfc123) that will allow to do exactly this.
With the lazy operator, we could rewrite the previous piece of code like this: my @strings = (grep /[acgmt]{2}/, lazy ('aa' .. 'tt'),
grep /[acgmt]{3}/, lazy ('aaa' .. 'ttt'),
grep /[acgmt]{4}/, lazy ('aaaa' .. 'tttt'));
This would causes a lazy list to be passed to the filter function grep, saving
us from allocating the entire letter combinations array in memory. While this might not be the greatest
example ever, I think it's simple enough to illustrate the possibilities.
Lazy evaluation is (to the best of my knowledge) mostly available in functionnal
programming language. It is a powerful concept that can only reinforce the variety
of Perl idioms you can use to easily solve complex problems.
So everything is good, Perl6 will allow us to be lazy and use lazy. I can't wait to
see the impatience and hubris functions ;-)
Guillaume
Re: Let's get lazy
by japhy (Canon) on Aug 28, 2001 at 20:47 UTC
|
I'd suggest creating a function (or an object) that returns the next term in your requested series. The secret to laziness is an iterator.
{
my $init = 'aa';
my $start = 'a';
my $len = length $init;
sub next_term {
my $ret = $init;
my $p = $len - 1;
while (1) {
substr($init, $p, 1) =~ tr/acgnt/cgnta/;
last if substr($init, $p--, 1) ne $start;
$init = "$start$init", $len++, last if $p < 0;
}
return $ret;
}
}
You could also mimic this with a tied scalar, and have the FETCH function do what I have done above.
package Tie::ScalarIter;
sub TIESCALAR {
my ($class, $init, $start) = @_;
bless [ $init, $start, length($init) ], $class;
}
sub FETCH {
my $s = shift;
my $ret = $s->[0];
my $p = $s->[2] - 1;
while (1) {
substr($s->[0], $p, 1) =~ tr/acgnt/cgnta/;
last if substr($s->[0], $p--, 1) ne $s->[1];
$s->[0] = "$s->[1]$s->[0]", $s->[2]++, last if $p < 0;
}
return $ret;
}
sub STORE {
my $s = shift;
my $rep = shift;
if (ref $rep) { @$s = (@$rep, length $rep->[0]) }
else { (@$s[0,2] = ($rep, length $rep) }
}
1;
Both interfaces are simple to use:
# functional
my @strings;
while (defined (my $next = next_term())) {
last if length($next) > 4;
push @strings, $next;
}
# tied scalar
use Tie::ScalarIter;
tie my($iter), 'Tie::ScalarIter', 'aa', 'a';
my @strings;
while (defined (my $next = $iter)) {
last if length($next) > 4;
push @strings, $next;
}
And that's about it.
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??; | [reply] [d/l] [select] |
Re: Let's get lazy
by Masem (Monsignor) on Aug 28, 2001 at 20:37 UTC
|
use strict;
use Language::Functional ':all';
sub match { $_[0] =~ m/^[acgmt]*$/ }
my $x = filter( \&match, ['aa'..'tt', 'aaa'..'ttt','aaaa'..'tttt'] );
print show $x;
Update - not quite like it, however, perl will still generate that list, but I'm looking at L::F and there's enough there that you can generate the lazy list for processing via double Map's.
-----------------------------------------------------
Dr. Michael K. Neylon - mneylon-pm@masemware.com
||
"You've left the lens cap of your mind on again, Pinky" - The Brain
It's not what you know, but knowing how to find it if you don't know that's important
| [reply] [d/l] |
(MeowChow) Re: Let's get lazy
by MeowChow (Vicar) on Aug 28, 2001 at 22:10 UTC
|
If memory use is your only concern, then here's a rehash (har har har) of one of my prior atrocities...
| | $_ = 'acgnt' x 4;
push @strings, "$1$2" while /.*(.).*(.)(?(?{$_{"$1$2"}++})^)/;
push @strings, "$1$2$3" while /.*(.).*(.).*(.)(?(?{$_{"$1$2$3"}++})^
+)/;
push @strings, "$1$2$3$4" while /.*(.).*(.).*(.).*(.)(?(?{$_{"$1$2$3$4
+"}++})^)/;
print join $/, @strings;
|
MeowChow
s aamecha.s a..a\u$&owag.print | [reply] [d/l] |
Re: Let's get lazy
by dga (Hermit) on Aug 28, 2001 at 23:05 UTC
|
I posted a reply to the same post.
I was thinking a really lazy way to do this would be to make up a locale with just the ACGNT letters and then have to magic autoincrement operator just do the Right Thing.
use POSIX qw(locale_h);
setlocale(LC_ALL, "DNA");
use locale;
my @twos=(AA..TT);
#with all above working @twos would be AA AC AG AN AT CA CC CG etc.
So my question for the folks in the know is does the autoincrementer use locale?
Has anyone defined a DNA locale?
| [reply] [d/l] |
Re: Let's get lazy
by runrig (Abbot) on Nov 02, 2001 at 04:31 UTC
|
tilly once posted something that could be coerced to do this.
Update: It would be kind of the inside-out solution to this problem, you could get a function callback for every combination of values, but I think to get an actual iterator function using this method, similar to the iterator produced by my $iter = do { my $i; sub { $i++ } };, you'd need a co-routine. Unless tilly can wrangle it out of his code :-)
There does seem to be a Coroutine Module on CPAN, might be fun to look into :)
Update: Taking the initiative, I wrangled tilly's code myself (just couldn't wait for the book :)
use strict;
use warnings;
my $iterator = mk_iter(
[1..2], ["a".."c"], [3..5]
);
while (my @arr = $iterator->()) {
print "@arr\n";
}
sub mk_iter {
my $range = shift;
my $i = 0;
my $end = @$range;
my $iter = sub {
return unless $i < $end;
return $$range[$i++];
};
@_ ? ret_iter($iter, @_) : $iter;
}
sub ret_iter {
my $iter = shift;
my $range = shift;
my $i = 0;
my $end = @$range;
my @arr;
my $new_iter = sub {
$i = 0 unless $i < $end;
return unless $i or @arr = $iter->();
return @arr, $$range[$i++];
};
@_ ? ret_iter($new_iter, @_) : $new_iter;
}
#####################################
# Update
# Here's a variation which is closer to what was
# Originally asked for, i.e. all combinations from
# 2-4 characters
use strict;
use warnings;
use strict;
use warnings;
my $iterator = make_iter(
2,4,[qw(A C G N T)]
);
while (my @arr = $iterator->()) {
print "@arr\n";
}
sub make_iter {
my ($start, $end, $range) = @_;
my $nxt_iter = sub { return };
my $iter = sub {
my @data;
unless (@data = $nxt_iter->()) {
return unless $start <= $end;
$nxt_iter = mk_iter( ($range) x $start++);
return $nxt_iter->();
};
@data;
}
}
| [reply] [d/l] [select] |
|
use strict;
my $iter = i_map(
sub {print "@_\n"},
comb_iter(
list_iter(1..2), list_iter('a'..'c'), list_iter(3..5)
)
);
1 while $iter->();
###################################################################
# The program proper ends here. These are utility functions that #
# you could reuse #
###################################################################
# Takes a list of iterators that are "restartable"
# Returns a restartable iterator that iterates over all combinations
# of outputs of the input iterators, creating a flat list of combinati
+ons
# of the inputs. (The output only makes sense in array context.)
sub comb_iter {
if (0 == @_) {
return sub {()}; # Stupid case needed for generality.
}
elsif (1 == @_) {
return shift;
}
else {
my $outer_iter = shift;
my $inner_iter = comb_iter(@_);
my @last_outer;
return sub {
if (@last_outer) {
my @ret = $inner_iter->();
if (@ret) {
return (@last_outer, @ret);
}
else {
@last_outer = $outer_iter->();
if (@last_outer) {
return (@last_outer, $inner_iter->());
}
else {
return ();
}
}
}
else {
@last_outer = $outer_iter->();
return (@last_outer, $inner_iter->());
}
};
}
}
# Takes a function and an iterator, returns an iterator that uses that
# function to filter the output.
sub i_grep {
my ($filter, $iter) = @_;
my @last_ret = qw(just an initialization value);
sub {
while (@last_ret) {
@last_ret = $iter->();
if ($filter->(@last_ret)) {
return wantarray ? @last_ret : $last_ret[0];
}
}
return ();
};
}
# Takes a function and an iterator, returns an iterator that applies t
+hat
# function to the returns of the iterator.
sub i_map {
my ($filter, $iter) = @_;
sub {
my @ret = $iter->();
return @ret ? $filter->(@ret) : ();
};
}
# Takes a list and turns it into an iterator over that list
sub list_iter {
my @vals = @_;
my $i = 0;
sub {
if ($i < @vals) {
return $vals[$i++];
}
else {
$i = 0;
return ();
}
};
}
Note that the specific problem in the original question can now be solved as the author wanted using i_grep, or you can produce more efficient iterator as follows:
my $genome_iter = i_map(
sub {join '', @_},
join_iter(
map {
comb_iter(
map {
list_iter(qw(a c g n t));
} 1..$_
)
} 2..3
)
);
while (my $string = $genome_iter->()) {
print "$string\n";
}
# Takes a list of iterators, and returns an iterator that iterates
# over each in turn
sub join_iter {
my @iter = @_;
my $i = 0;
return sub {
while ($i < @iter) {
my @ret = $iter[$i]->();
if (@ret) {
return wantarray ? @ret : $ret[0];
}
else {
$i++;
}
}
$i = 0;
return ();
};
}
Alternately if you want to turn the output into a list you can just create an easy method:
# Takes an iterator and returns a list of results
sub iter2list {
my $iter = shift;
my @out;
while (my @ret = $iter->()) {
push @out, @ret;
}
return @out;
}
Note that most of the length here is because I am having to build my iterator interface from scratch. That is a lot of work! And some of the code looks more complex because what we are used to seeing in a few nested loops our minds balk at when you see it as a similar number of nested calls. | [reply] [d/l] [select] |
|
A much better person to ask is Dominus, he is in the
process of writing a book
on exactly this subject.
Chapter 4 in particular offers direct iterative solutions
of the above problem. Once you know the techniques, they
are straightforward to apply in any language with proper
support for closures. And the techniques are essentially
to create utility functions that take one iterator and
create new ones out of it. For instance if you like
doing stuff procedurally with map, grep, an easy way to
produce a range etc, then you can create iterative
versions of the same. (For instance a range would return
all of the things in that range. An iterative map would
take a function and an iterator and give you an iterator
that is the result of applying that function to the output
of the first iterator.) And then it becomes a mechanical
process to write iterative versions of what you can dream
up in a list-oriented manner.
Alternately if you want the flavour of a co-routine solution
to the problem, and don't want to wait for Perl 6, Ruby
offers them now. So, I believe, does the very latest
version of Python. (Ruby goes further and people there
use them more often.) They don't offer the rest of Perl 6,
but they give you the flavour of a Perlish scripting
language with yield and (with Ruby or stackless Python)
full continuations.
| [reply] |
Re: Let's get lazy
by beretboy (Chaplain) on Nov 02, 2001 at 03:36 UTC
|
May not be exactly what you are looking for but I wrote this peice of code as a way to find (through brute force) all possible combinations of something (in this case locker combinations, don't ask why). anyway here it is:
#All possible combination generator
use strict;
use Quantum::Superpositions;
open (DUMP, ">combos.txt");
my @list;
my $i = 0;
while (1) {
my $num = int(rand(31)) . "-" . int(rand(31)) . "-" . int(rand(31));
if ($num eq any(@list)) {
next;
} else {
@list[$i] = $num;
print "$num ";
print DUMP "$num\n";
$i++;
}
}
"Sanity is the playground of the unimaginative"
-Unknown | [reply] [d/l] |
|
Since I don't see the glob trick mentioned in this thread, and I'm not too keen on your pick-random-combos-until-we-think-weve-got-them-all solution, here are two ways to solve the locker combo problem.
#!/usr/bin/perl -wT
use strict;
# golfed version using glob
my $nums = join(',',1..31);
my @list1 = glob("{$nums}-{$nums}-{$nums}");
print "L1: $_\n" for @list1;
# more sane way of doing that.....
my @list2;
for my $i (1..31) {
for my $j (1..31) {
for my $k (1..31) {
push(@list2,"$i-$j-$k");
}
}
}
print "L2: $_\n" for @list2;
And here is the glob trick used to solve the original problem...
#!/usr/bin/perl -wT
use strict;
# one ordering....
my $letters = ',A,C,G,N,T';
my @list1 = grep {/./} glob("{$letters}"x4);
print "L1: $_\n" for @list1;
# different ordering
my $letter2 = 'A,C,G,N,T';
my @list2 = (glob("{$letter2}"x2),glob("{$letter2}"x3),glob("{$letter2
+}"x4));
print "L2: $_\n" for @list2;
-Blake
| [reply] [d/l] [select] |
|
|