perlmeditation
tmoertel
Working with collections of things is one of the most common chores in
programming. In this meditation, we'll create a functional
mini-language for sequences – more than iterators and yet not
quite streams – and use it as a simple, abstract tool for
working with collections. To gain familiarity with our tool, we'll
answer some recent questions that were asked by Seekers of Perl
Wisdom.
<readmore>
<p>Since we haven't defined what a sequence is yet, let's do that now:
<blockquote>
A sequence is a function that represents a finite series of values.
When called, the function returns the next value in the series. The
first call to the function after the series has been exhausted returns
an empty array and resets the sequence back to its beginning.
</blockquote>
<p>A couple of points. First, sequences are cyclical. After you
exhaust them, they start over again. Second, the values extracted
from a sequence are arrays, not scalars. This allows sequences to
return multiple values at a time. As we'll see later, these
properties allow us to combine sequences in interesting ways.
<h4>Fundamental sequences</h4>
<p>With our definition in mind, let's begin our implementation. Some
preliminaries:
<code>
#!/usr/bin/perl
use warnings;
use strict;
</code>
Although most of the code we'll write is functional in nature, Perl's
object-oriented syntax is often a convenient way to call functions.
To make this syntax available, let's set up a package and write a
couple of helper functions to promote functions into Sequences:
<code>
package Sequences;
sub new {
my ($proto, $seq) = @_;
bless $seq, $proto;
}
sub seqsub(&) {
Sequences->new(@_);
}
</code>
The package declaration and the <em>new</em> subroutine are standard
Perl OO fare. The <em>seqsub</em> function, however, is somewhat odd.
We'll use it as an alternative syntax to Perl's normal
<em>sub</em> when we want to to create functions (anonymous
subroutines) that represent sequences.
<p>To see how the set-up works, let's create our first sequence-making
function, <em>seq</em>. It creates a sequence out of the
series formed by its arguments:
<code>
sub seq {
my ($i, $elems) = (0, \@_);
seqsub {
$i < @$elems
? ( $elems->[ $i++ ] )
: do { $i = 0; () };
}
}
</code>
<p>To see how it works, let's create simple, 3-element sequence
and extract its values:
<code>
my $abcees = seq("a", "b", "c");
$abcees->(); # ("a")
$abcees->(); # ("b")
$abcees->(); # ("c")
$abcees->(); # ( )
# ... the cycle repeats ...
</code>
Because we will often want to see what's "in" a sequence, let's
create a function to enumerate a sequence's values:
<code>
use Data::Dumper;
sub enumerate {
local $Data::Dumper::Terse = 1;
local $Data::Dumper::Indent = 0;
my ($i, $seq) = (0, $_[0]);
while (my @val = $seq->()) {
@val = map { ref ($_) ? Dumper($_) : $_ } @val;
printf "%2d => %s\n", $i++, "@val";
}
$seq;
}
</code>
The <em>while</em> loop within the function shows us the idiom for
iterating over a sequence's values: We extract values until we get an
empty array. We print each value we get. When we're done, we return
the sequence itself to facilitate function chaining.
<p>As an example, let's enumerate our earlier sequence:
<code>
enumerate( $abcees );
# 0 => a
# 1 => b
# 2 => c
</code>
Let's try the alternative OO syntax:
<code>
$abcees->enumerate;
# 0 => a
# 1 => b
# 2 => c
</code>
Not bad. Now let's move on to something a little more weighty.
<h4>Combinations of sequences</h4>
One of the things that makes sequences interesting is that we can
combine them in ways that let us eliminate a lot of boilerplate code.
For example, we often want to do things for each element of one set
and for each element of another. We typically use nested
<em>foreach</em> loops for this cause, but sequences give us another
option.
<p>The following function takes two separate sequences <em>s</em> and
<em>t</em> and combines them into a sequence whose values are the
Cartesian product of the values drawn from <em>s</em> and <em>t</em>,
in effect nesting <em>t</em> inside of <em>s</em>:
<code>
sub seq_prod2 {
my ($s, $t) = @_;
my @sval;
seqsub {
my @tval;
while ( !@sval || !(@tval = $t->()) ) {
return () unless @sval = $s->();
}
( @sval, @tval );
}
};
</code>
An example:
<code>
my $one_two_threes = seq( 1 .. 3 );
enumerate( seq_prod2( $abcees, $one_two_threes ) );
# 0 => a 1
# 1 => a 2
# 2 => a 3
# 3 => b 1
# 4 => b 2
# 5 => b 3
# 6 => c 1
# 7 => c 2
# 8 => c 3
</code>
One of the idioms that appears frequently in functional programming is
to take a binary function and generalize it into an <em>n</em>-ary
function by "folding" it over a list of arguments. We can use
[cpan://List::Util]'s <em>reduce</em> function for this purpose.
Let's use it generalize <em>seq_prod2</em> into the <em>n</em>-ary
<em>seq_prod</em>:
<code>
use List::Util qw( reduce );
sub seq_prod {
reduce { seq_prod2($a,$b) } @_ ;
}
</code>
Now we can compute the products of any number of sequences:
<code>
my $you_and_mees = seq( "you", "me" );
seq_prod( $abcees, $one_two_threes, $you_and_mees )
->enumerate;
# 0 => a 1 you
# 1 => a 1 me
# 2 => a 2 you
# 3 => a 2 me
# 4 => a 3 you
# 5 => a 3 me
# 6 => b 1 you
# 7 => b 1 me
# 8 => b 2 you
# 9 => b 2 me
# 10 => b 3 you
# 11 => b 3 me
# 12 => c 1 you
# 13 => c 1 me
# 14 => c 2 you
# 15 => c 2 me
# 16 => c 3 you
# 17 => c 3 me
</code>
The power of our abstraction is starting to become apparent. Normally
we would need to nest three <em>foreach</em> loops (or use a module
such as [cpan://Algorithm::Loops]) to do what we just did in one line
of code. In the next section, we'll take this idea further to
see how much boilerplate code we can factor out.
<h4>Abstractions that replace nested <em>foreach</em> loops</h4>
It's fairly common in day-to-day programming that we must process
all of the combinations of elements that can be created by drawing one
element from each of several sets (arrays). In the 2-array case, for
example, we might need to test every element in one array with respect
to every element in another. In optimization or search problems, we
might need to examine each of the possibilities within an
<em>n</em>-dimensional search space to determine which have the
least cost or satisfy the given criteria.
<p>One common way to approach such a problem is with nested loops.
Consider the case involving three arrays:
<code>
my (@alist, @blist, @clist);
# ... initialize arrays with values ...
foreach my $a (@alist) {
foreach my $b (@blist) {
foreach my $c (@clist) {
# do something with ($a, $b, $c) ...
}
}
}
</code>
Let's try to do the same thing with sequences.
<p>First, we'll need to convert each array into a sequence. That's
easy; we can just use <em>seq</em>. Then, we must combine the
sequences; for this we'll use <em>seq_prod</em>, like before.
Finally, we'll extract the values from the combined sequence and
process them in turn. Using this recipe, we get the following:
<code>
my $combined_sequence =
seq_prod( seq(@alist), seq(@blist), seq(@clist) );
while ( my ($a, $b, $c) = $combined_sequence->() ) {
# do something with ($a, $b, $c)
}
</code>
That works, but it's clunky.
<p>Let's see if we can refine the approach. As the first refinement,
let's create helper functions to perform the first two steps, in
effect creating a combined sequence from a specification of its
subsequences:
<code>
sub seqs {
map seq(@$_), @_;
}
sub seq_from_spec {
seq_prod( seqs(@_) );
}
</code>
Both helpers are tiny but handy and have many uses beyond the one
we're aiming for now. For example, we can use <em>seq_from_spec</em>
to extract the digits of <em>n</em>-ary numbers:
<code>
sub nary_digits {
my ($base, $digits) = @_;
seq_from_spec( ([0..$base-1]) x $digits );
}
enumerate( nary_digits(2, 3) ); # 3-digit binary numbers
# 0 => 0 0 0
# 1 => 0 0 1
# 2 => 0 1 0
# 3 => 0 1 1
# 4 => 1 0 0
# 5 => 1 0 1
# 6 => 1 1 0
# 7 => 1 1 1
</code>
<p>On a naming note, I called the function <em>seq_from_spec</em>
instead of something like <em>seq_prod_from_spec</em> because I
consider the Cartesian product to be the most fundamental and useful
way of combining sets. So the idea of a "sequence specification"
that describes a product of sequences naturally follows:
<code>
# seq_from_spec([1..3]) === seq(1..3)
# seq_from_spec([1..3],[4,5]) === seq(1..3) x seq(4,5)
# seq_from_spec(\(@a,@b,...)) === seq(@a) x seq(@b) x ...
</code>
<p>Back to our task, another helper factors out the looping boilerplate:
<code>
sub seq_foreach {
my ($seq, $fn) = @_;
while (my @val = $seq->()) {
$fn->(@val);
}
$seq;
}
</code>
We can use it like so:
<code>
$abcees->seq_foreach( sub { print "@_\n"; } );
# a
# b
# c
</code>
Finally, a higher-level helper ties the solution together:
<code>
sub seq_foreach_from_spec {
my ($spec, $fn) = @_;
seq_foreach( seq_from_spec( @$spec ), $fn );
}
</code>
Now we can replace our original, 3-<em>foreach</em> loop with the
following:
<code>
seq_foreach_from_spec( [\(@alist, @blist, @clist)], sub {
my ($a, $b, $c) = @_;
# do something with $a, $b, $c, ...
});
</code>
In reflection, that may seem like a long way to have gone just to
replace a 3-<em>foreach</em> loop, and it was. But our travels
covered more ground than might at first be obvious:
<ol>
<li>The exact same helper can be used to factor out nested loops
of <em>any</em> fixed depth, not just 3.
<li>The same helper can also handle cases when the depth is varying
and unknown in advance. In the same way that our
<em>nary_digits</em> function handles arbitrary digit lengths at
run time, so does our helper function handle arbitrary
numbers of input arrays.
<li>We have captured useful idioms. Our helper functions add richness
to our our programming vocabularies and let us think in
higher-level terms such as Cartesian products instead of
lower-level terms such as loops.
<li>We ain't done yet. In the same way that we combined tiny
functions to get us this far, we can combine <em>these</em>
functions with others to yield even more powerful tools.
</ol>
<h4>Transforming sequences: filters and maps</h4>
Sometimes the best approach to solving a problem is to solve a simpler
problem and then transform the simple solution into one that works for
the original problem. Let's say that we want to create a series
of odd integers. We might approach it like so:
<ol>
<li>Generate a series of all integers. (Solve a simpler problem.)
<li>Filter the series so that only odd integers remain. (Transform
the solution.)
</ol>
To implement this strategy, let's first create a helper function that
applies a filtering transformation to a sequence to yield a new,
filtered sequence:
<code>
sub seq_filter {
my ($seq, $filter_fn) = @_;
seqsub {
my @val;
1 while @val = $seq->() and !$filter_fn->(@val);
return @val;
}
}
</code>
The function is <em>grep</em> for sequences. It takes a sequence and
a filtering function and returns a new sequence that passes through
the values of the original sequence for which the filtering function
returns true; all other values are filtered out. Using it, we can
construct our odd-integers solution:
<code>
sub odds_up_to {
my $maximum = shift;
seq( 1 .. $maximum )
->seq_filter( sub { $_[0] % 2 } )
}
enumerate( odds_up_to(10) );
# 0 => 1
# 1 => 3
# 2 => 5
# 3 => 7
# 4 => 9
</code>
Now let's say that we want to generate a similar sequence but for
even integers. Again, we can use a transformational strategy.
This time, however, we'll transform the odd-integer series into
an even-integer series by subtracting one from each value. In
effect, we're mapping one series to another. Our helper
is named accordingly:
<code>
sub seq_map {
my ($seq, $fn) = @_;
seqsub {
my @val = $seq->();
@val ? $fn->(@val) : ();
}
}
</code>
And our even-integers solution:
<code>
sub evens_up_to {
odds_up_to( $_[0] + 1 )
->seq_map( sub { $_[0] - 1 } );
}
enumerate( evens_up_to(10) );
# 0 => 0
# 1 => 2
# 2 => 4
# 3 => 6
# 4 => 8
# 5 => 10
</code>
With these simple extensions to our vocabulary, we have greatly
expanded the usefulness of our mini-language for sequences.
Now let's try it out on a real-world problem.
<h4>SoPW Example: "Combinations of an array of arrays...?"</h4>
Take a moment to look at the problem posed in [id://397671] and the
solutions offered by our good fellow monks. To summarize, the seeker
writes the following:
<blockquote>
What I have is 7 arrays with arbitrary number of elements in each. I
would like to create a new array that contains combinations of the
other seven in this manner: (1) Only one element from each of the 7
arrays. (2) Minimum of 4 elements per element in final array.
</blockquote>
With our existing sequence language, the solution for part (1) of
the seeker's request is straightforward: Just pass the 7 arrays
to <em>seq_from_spec</em> and the resulting sequence will yield
all of the combinations.
<p>Part (2) adds a wrinkle, however. It requires that the output
combinations each have at least 4 elements. This suggests that part
(1) really means, <em>zero or one</em> element from each array.
(Otherwise, the at-least-4 constraint is meaningless because all of
the combinations will have exactly 7 elements.)
<p>To effect the zero-or-one behavior, we can transform each input
array like <code>[1,2,3]</code> into <code>[[],[1],[2],[3]]</code>.
Each element in the transformed array is a zero- or one-element
array. (Note the insertion of an an "empty" element at the head of the
array.) Next, we can pass these transformed arrays to
<em>seq_from_spec</em>, as usual. On the backside, we can map the
combinations back into the desired form by merging the zero- or
one-element arrays. This we can do with a map.
<p>Finally, we'll filter the combinations so that only those
of the desired minimum length are kept.
<p>Putting it all together in the form of a generalized solution:
<code>
sub min_length_combinations {
my ($min_length, @inputs) = @_;
my @input_spec = map [ [], (map [$_], @$_) ], @inputs;
seq_from_spec( @input_spec )
->seq_map( sub { [ map @$_, @_ ] } )
->seq_filter( sub { @{$_[0]} >= $min_length } )
}
</code>
(BTW, this code is the Perl equivalent of the Haskell solution that I
posted in the thread.)
<p>Now, we can solve the example problem from the seeker's original post:
<code>
min_length_combinations(
4, map [split//], qw( abc de fgh i jk l m )
)->enumerate;
# 0 => ['i','j','l','m']
# 1 => ['i','k','l','m']
# 2 => ['f','j','l','m']
# 3 => ['f','k','l','m']
#
# ...
#
# 862 => ['c','e','h','i','k']
# 863 => ['c','e','h','i','k','m']
# 864 => ['c','e','h','i','k','l']
# 865 => ['c','e','h','i','k','l','m']
</code>
<h4>More ways to combine sequences: series and parallel</h4>
In addition to combining sequences by way of Cartesian products,
we have other options. We can join them in series or merge them
in parallel. The following function performs the former:
<code>
sub seq_series {
my $seqs = seq( @_ ); # seq of seqs (!)
my $seq;
seqsub {
my @val;
do {
($seq) = $seqs->() unless $seq;
@val = $seq->() if $seq;
} while !@val && ($seq = $seqs->());
@val;
}
}
seq_series( $abcees, $one_two_threes )->enumerate;
# 0 => a
# 1 => b
# 2 => c
# 3 => 1
# 4 => 2
# 5 => 3
</code>
Merging sequences in parallel is like joining the teeth of a coat's
zipper. As input we have two (or more) separate sequences and as
output we have one zipped-together sequence. In keeping with Haskell
tradition, our zipper function will let us merge sequences of unequal
lengths. In that case, the zipped sequence will be as long as the
shortest input sequence. Here's the code:
<code>
sub seq_reset {
my $seq = shift;
if ($seq) {
1 while $seq->();
}
$seq;
}
sub seq_zip {
my $seqs = seq( @_ ); # seq of seqs (!)
my $seq_count = @_;
seqsub {
my @outvals;
while (my $seq = $seqs->()) {
if (my @val = $seq->()) {
push @outvals, @val;
}
else {
seq_reset( $seqs->() ) for 1 .. $seq_count;
seq_reset( $seqs );
return ();
}
}
return @outvals;
}
}
</code>
The need to handle unequal lengths adds complexity to our code. In
particular, the <em>else</em> clause handles this case and resets any
partially-read sequences before returning. This ensures that our
sequences' next customers don't get short changed by inheriting
half-read sequences.
<p>Some examples:
<code>
seq_zip( $abcees, $one_two_threes )->enumerate;
# 0 => a 1
# 1 => b 2
# 2 => c 3
seq_zip( $abcees, $one_two_threes, $you_and_mees )
->enumerate;
# 0 => a 1 you
# 1 => b 2 me
</code>
To generalize our zipping options, we can zip sequences with
a given "zipper" function:
<code>
sub seq_zip_with {
my $zipper_fn = shift;
seq_map( seq_zip(@_), $zipper_fn );
}
</code>
Some examples:
<code>
# some math helpers
sub sum { reduce { $a + $b } @_ }
sub product { reduce { $a * $b } @_ }
seq_zip_with( \&sum, seq(1..3), seq(0..10) )->enumerate;
# 0 => 1
# 1 => 3
# 2 => 5
seq_zip_with( \&product, seq(1..5), seq(0..10), seq(2..8) )
->enumerate;
# 0 => 0
# 1 => 6
# 2 => 24
# 3 => 60
# 4 => 120
</code>
With these new extensions to our mini-language for sequences,
let's tackle another real Perl Monks problem.
<h4>SoPW Example: "Comparing two arrays and counting pairwise comparisons"</h4>
Examine the thread [id://400340].
<p>The seeker has two arrays of strings and wants to operate on each
combination of elements from the two. The operation to be performed
is a pairwise "comparison" which amounts to counting the character
pairs that are formed when each element is zipped with the other.
<p>Since combinations and zips are part of our sequence mini-language,
our implementation is straightforward:
<code>
my @site1 = qw( AATKKM aatkkm );
my @site2 = qw( GGGGGG gggggg );
my %counts;
seq_foreach_from_spec( [ \(@site1, @site2) ],
sub {
seq_foreach(
seq_zip( ( map seq(split//), @_ ) ),
sub { $counts{"@_"}++ }
)
}
);
print Dumper(\%counts), "\n";
# { 'K G' => 2, 'A G' => 2, 'm g' => 1, 'a g' => 2,
# 'A g' => 2, 'M G' => 1, 'k g' => 2, 'k G' => 2,
# 'T G' => 1, 'a G' => 2, 'm G' => 1, 't G' => 1,
# 'K g' => 2, 'M g' => 1, 't g' => 1, 'T g' => 1 }
</code>
<h4>Extracting values from sequences</h4>
Sometimes sequences are merely a means to an end. Sometimes we don't
care about the sequences themselves but only the values they produce.
So it makes sense to extend our language with functions that extract
values from sequences.
<p>We'll create two value extractors. The first is a general-purpose
extractor that will capture each array value in a sequence as an
arrayref. The second is for when we expect the sequence's values to
be scalars. In this case, we don't need to wrap each value within
an array and can just return it straight.
<code>
sub seq_values {
my $seq = shift;
seq_values_scalar( seq_map( $seq, sub { [@_] } ) );
}
sub seq_values_scalar {
my $seq = shift;
my @values;
seq_foreach( $seq, sub { push @values, @_ } );
return @values;
}
</code>
Some examples:
<code>
print Dumper( [ seq(1..3)->seq_values ] ), "\n";
# [[1],[2],[3]]
print Dumper( [ seq(1..3)->seq_values_scalar ] ), "\n";
# [1,2,3]
</code>
As a more realistic example, let's write a function to <a
href="http://mathworld.wolfram.com/Transpose.html">transpose</a> a
matrix, that is, exchange its rows and columns:
<code>
sub matrix_transpose {
my $rows = shift;
[ seq_values( seq_zip( seqs(@$rows) ) ) ];
}
my $matrix = [ [ 0, 1 ]
, [ 2, 3 ]
, [ 4, 5 ] ];
print Dumper( matrix_transpose( $matrix ) ), "\n";
# [ [0,2,4]
# , [1,3,5] ]
</code>
<p>Another way to extract values is to collapse a sequence into a
single value by "folding" an accumulating function across the
sequence's values. This is similar to what [cpan://List::Util]'s
<em>reduce</em> function does for arrays. Here's our implementation
for sequences:
<code>
sub seq_fold {
my ($seq, $fn) = @_;
my @accum = $seq->();
while (@accum && (my @val = $seq->())) {
@accum = $fn->(@accum, @val);
}
wantarray ? @accum : $accum[0];
}
</code>
Continuing with our matrix theme, let's write a function that computes
the <a href="http://mathworld.wolfram.com/DotProduct.html">dot
product</a> of two vectors. We multiply the vectors' elements
pairwise, and then add the resulting products:
<code>
sub dot_product {
seq_zip_with( \&product, seqs(@_) )
->seq_fold( \&sum );
}
print dot_product( [1,1,1], [1,2,3] ), "\n";
# 6
</code>
<h4>So ends Part 1</h4>
In this first meditation on sequences, we began with a simple,
foundational definition and built upon it a small library of
functions. These tiny functions form the core of a mini-language that
moves us closer to our collections, where we can work more naturally.
Instead of thinking in terms of loops and iteration variables, we can
think in terms of products and zips and folds. Our language provides
direct support for combining, transforming, iterating over, and
extracting values from collections, which we represent as sequences.
<p>So far, the ways in which we have combined sequences have been
static. In the next part, we'll look at how we can place tiny
functions <em>in between</em> the parts of our sequences to introduce
dynamism. This simple extension brings many complex manipulations
within reach. We'll look at some interesting applications and solve a
few more Perl Monks problems. We might even referee a
game of hyperdimensional tic-tac-toe.
<p>Thanks for taking the time to read this meditation. If you
have any criticisms or can think of any way to make my writing
better, please let me know.
<p>Cheers,<br>Tom
<!-- Node text goes above. Div tags should contain sig only -->
<div class="pmsig"><div class="pmsig-109406">
<p><small><strong>Tom Moertel</strong> : <a href="http://community.moertel.com/">Blog</a> / <a href="http://community.moertel.com/ss/space/Talks">Talks</a> / <a href="http://search.cpan.org/~tmoertel/">CPAN</a> / <a href="http://community.moertel.com/LectroTest">LectroTest</a> / <a href="http://community.moertel.com/ss/space/PXSL">PXSL</a> / <a href="http://community.moertel.com/ss/space/A+Coder%27s+Guide+To+Coffee">Coffee</a> / <a href="http://community.moertel.com/ss/space/IMDB+Movie-Rating+Decoder+Ring">Movie Rating Decoder</a></small>
</div></div>
</readmore>