Working with collections of things is one of the most common chores in
programming. In this meditation, we'll create a functional
mini-language for sequences – more than iterators and yet not
quite streams – and use it as a simple, abstract tool for
working with collections. To gain familiarity with our tool, we'll
answer some recent questions that were asked by Seekers of Perl
Wisdom.
#### Fundamental sequences

*new* subroutine are standard
Perl OO fare. The *seqsub* function, however, is somewhat odd.
We'll use it as an alternative syntax to Perl's normal
*sub* when we want to to create functions (anonymous
subroutines) that represent sequences.
*while* loop within the function shows us the idiom for
iterating over a sequence's values: We extract values until we get an
empty array. We print each value we get. When we're done, we return
the sequence itself to facilitate function chaining.
#### Combinations of sequences

One of the things that makes sequences interesting is that we can
combine them in ways that let us eliminate a lot of boilerplate code.
For example, we often want to do things for each element of one set
and for each element of another. We typically use nested
*foreach* loops for this cause, but sequences give us another
option.
*n*-ary
function by "folding" it over a list of arguments. We can use
List::Util's *reduce* function for this purpose.
Let's use it generalize *seq_prod2* into the *n*-ary
*seq_prod*:
*foreach* loops (or use a module
such as Algorithm::Loops) to do what we just did in one line
of code. In the next section, we'll take this idea further to
see how much boilerplate code we can factor out.
#### Abstractions that replace nested *foreach* loops

It's fairly common in day-to-day programming that we must process
all of the combinations of elements that can be created by drawing one
element from each of several sets (arrays). In the 2-array case, for
example, we might need to test every element in one array with respect
to every element in another. In optimization or search problems, we
might need to examine each of the possibilities within an
*n*-dimensional search space to determine which have the
least cost or satisfy the given criteria.
*seq_from_spec*
to extract the digits of *n*-ary numbers:
*foreach* loop with the
following:
*foreach* loop, and it was. But our travels
covered more ground than might at first be obvious:
#### Transforming sequences: filters and maps

Sometimes the best approach to solving a problem is to solve a simpler
problem and then transform the simple solution into one that works for
the original problem. Let's say that we want to create a series
of odd integers. We might approach it like so:
*grep* for sequences. It takes a sequence and
a filtering function and returns a new sequence that passes through
the values of the original sequence for which the filtering function
returns true; all other values are filtered out. Using it, we can
construct our odd-integers solution:
#### SoPW Example: "Combinations of an array of arrays...?"

Take a moment to look at the problem posed in Combinations of an array of arrays...? and the
solutions offered by our good fellow monks. To summarize, the seeker
writes the following:
*seq_from_spec* and the resulting sequence will yield
all of the combinations.
#### More ways to combine sequences: series and parallel

In addition to combining sequences by way of Cartesian products,
we have other options. We can join them in series or merge them
in parallel. The following function performs the former:
*else* clause handles this case and resets any
partially-read sequences before returning. This ensures that our
sequences' next customers don't get short changed by inheriting
half-read sequences.
#### SoPW Example: "Comparing two arrays and counting pairwise comparisons"

Examine the thread Comparing two arrays and counting pairwise comparisons.
#### Extracting values from sequences

Sometimes sequences are merely a means to an end. Sometimes we don't
care about the sequences themselves but only the values they produce.
So it makes sense to extend our language with functions that extract
values from sequences.
#### So ends Part 1

In this first meditation on sequences, we began with a simple,
foundational definition and built upon it a small library of
functions. These tiny functions form the core of a mini-language that
moves us closer to our collections, where we can work more naturally.
Instead of thinking in terms of loops and iteration variables, we can
think in terms of products and zips and folds. Our language provides
direct support for combining, transforming, iterating over, and
extracting values from collections, which we represent as sequences.

Since we haven't defined what a sequence is yet, let's do that now:

A sequence is a function that represents a finite series of values. When called, the function returns the next value in the series. The first call to the function after the series has been exhausted returns an empty array and resets the sequence back to its beginning.

A couple of points. First, sequences are cyclical. After you exhaust them, they start over again. Second, the values extracted from a sequence are arrays, not scalars. This allows sequences to return multiple values at a time. As we'll see later, these properties allow us to combine sequences in interesting ways.

With our definition in mind, let's begin our implementation. Some preliminaries:

Although most of the code we'll write is functional in nature, Perl's object-oriented syntax is often a convenient way to call functions. To make this syntax available, let's set up a package and write a couple of helper functions to promote functions into Sequences:#!/usr/bin/perl use warnings; use strict;

The package declaration and thepackage Sequences; sub new { my ($proto, $seq) = @_; bless $seq, $proto; } sub seqsub(&) { Sequences->new(@_); }

To see how the set-up works, let's create our first sequence-making
function, *seq*. It creates a sequence out of the
series formed by its arguments:

sub seq { my ($i, $elems) = (0, \@_); seqsub { $i < @$elems ? ( $elems->[ $i++ ] ) : do { $i = 0; () }; } }

To see how it works, let's create simple, 3-element sequence and extract its values:

Because we will often want to see what's "in" a sequence, let's create a function to enumerate a sequence's values:my $abcees = seq("a", "b", "c"); $abcees->(); # ("a") $abcees->(); # ("b") $abcees->(); # ("c") $abcees->(); # ( ) # ... the cycle repeats ...

Theuse Data::Dumper; sub enumerate { local $Data::Dumper::Terse = 1; local $Data::Dumper::Indent = 0; my ($i, $seq) = (0, $_[0]); while (my @val = $seq->()) { @val = map { ref ($_) ? Dumper($_) : $_ } @val; printf "%2d => %s\n", $i++, "@val"; } $seq; }

As an example, let's enumerate our earlier sequence:

Let's try the alternative OO syntax:enumerate( $abcees ); # 0 => a # 1 => b # 2 => c

Not bad. Now let's move on to something a little more weighty.$abcees->enumerate; # 0 => a # 1 => b # 2 => c

The following function takes two separate sequences *s* and
*t* and combines them into a sequence whose values are the
Cartesian product of the values drawn from *s* and *t*,
in effect nesting *t* inside of *s*:

An example:sub seq_prod2 { my ($s, $t) = @_; my @sval; seqsub { my @tval; while ( !@sval || !(@tval = $t->()) ) { return () unless @sval = $s->(); } ( @sval, @tval ); } };

One of the idioms that appears frequently in functional programming is to take a binary function and generalize it into anmy $one_two_threes = seq( 1 .. 3 ); enumerate( seq_prod2( $abcees, $one_two_threes ) ); # 0 => a 1 # 1 => a 2 # 2 => a 3 # 3 => b 1 # 4 => b 2 # 5 => b 3 # 6 => c 1 # 7 => c 2 # 8 => c 3

Now we can compute the products of any number of sequences:use List::Util qw( reduce ); sub seq_prod { reduce { seq_prod2($a,$b) } @_ ; }

The power of our abstraction is starting to become apparent. Normally we would need to nest threemy $you_and_mees = seq( "you", "me" ); seq_prod( $abcees, $one_two_threes, $you_and_mees ) ->enumerate; # 0 => a 1 you # 1 => a 1 me # 2 => a 2 you # 3 => a 2 me # 4 => a 3 you # 5 => a 3 me # 6 => b 1 you # 7 => b 1 me # 8 => b 2 you # 9 => b 2 me # 10 => b 3 you # 11 => b 3 me # 12 => c 1 you # 13 => c 1 me # 14 => c 2 you # 15 => c 2 me # 16 => c 3 you # 17 => c 3 me

One common way to approach such a problem is with nested loops. Consider the case involving three arrays:

Let's try to do the same thing with sequences.my (@alist, @blist, @clist); # ... initialize arrays with values ... foreach my $a (@alist) { foreach my $b (@blist) { foreach my $c (@clist) { # do something with ($a, $b, $c) ... } } }

First, we'll need to convert each array into a sequence. That's
easy; we can just use *seq*. Then, we must combine the
sequences; for this we'll use *seq_prod*, like before.
Finally, we'll extract the values from the combined sequence and
process them in turn. Using this recipe, we get the following:

That works, but it's clunky.my $combined_sequence = seq_prod( seq(@alist), seq(@blist), seq(@clist) ); while ( my ($a, $b, $c) = $combined_sequence->() ) { # do something with ($a, $b, $c) }

Let's see if we can refine the approach. As the first refinement, let's create helper functions to perform the first two steps, in effect creating a combined sequence from a specification of its subsequences:

Both helpers are tiny but handy and have many uses beyond the one we're aiming for now. For example, we can usesub seqs { map seq(@$_), @_; } sub seq_from_spec { seq_prod( seqs(@_) ); }

sub nary_digits { my ($base, $digits) = @_; seq_from_spec( ([0..$base-1]) x $digits ); } enumerate( nary_digits(2, 3) ); # 3-digit binary numbers # 0 => 0 0 0 # 1 => 0 0 1 # 2 => 0 1 0 # 3 => 0 1 1 # 4 => 1 0 0 # 5 => 1 0 1 # 6 => 1 1 0 # 7 => 1 1 1

On a naming note, I called the function *seq_from_spec*
instead of something like *seq_prod_from_spec* because I
consider the Cartesian product to be the most fundamental and useful
way of combining sets. So the idea of a "sequence specification"
that describes a product of sequences naturally follows:

# seq_from_spec([1..3]) === seq(1..3) # seq_from_spec([1..3],[4,5]) === seq(1..3) x seq(4,5) # seq_from_spec(\(@a,@b,...)) === seq(@a) x seq(@b) x ...

Back to our task, another helper factors out the looping boilerplate:

We can use it like so:sub seq_foreach { my ($seq, $fn) = @_; while (my @val = $seq->()) { $fn->(@val); } $seq; }

Finally, a higher-level helper ties the solution together:$abcees->seq_foreach( sub { print "@_\n"; } ); # a # b # c

Now we can replace our original, 3-sub seq_foreach_from_spec { my ($spec, $fn) = @_; seq_foreach( seq_from_spec( @$spec ), $fn ); }

In reflection, that may seem like a long way to have gone just to replace a 3-seq_foreach_from_spec( [\(@alist, @blist, @clist)], sub { my ($a, $b, $c) = @_; # do something with $a, $b, $c, ... });

- The exact same helper can be used to factor out nested loops
of
*any*fixed depth, not just 3. - The same helper can also handle cases when the depth is varying
and unknown in advance. In the same way that our
*nary_digits*function handles arbitrary digit lengths at run time, so does our helper function handle arbitrary numbers of input arrays. - We have captured useful idioms. Our helper functions add richness to our our programming vocabularies and let us think in higher-level terms such as Cartesian products instead of lower-level terms such as loops.
- We ain't done yet. In the same way that we combined tiny
functions to get us this far, we can combine
*these*functions with others to yield even more powerful tools.

- Generate a series of all integers. (Solve a simpler problem.)
- Filter the series so that only odd integers remain. (Transform the solution.)

The function issub seq_filter { my ($seq, $filter_fn) = @_; seqsub { my @val; 1 while @val = $seq->() and !$filter_fn->(@val); return @val; } }

Now let's say that we want to generate a similar sequence but for even integers. Again, we can use a transformational strategy. This time, however, we'll transform the odd-integer series into an even-integer series by subtracting one from each value. In effect, we're mapping one series to another. Our helper is named accordingly:sub odds_up_to { my $maximum = shift; seq( 1 .. $maximum ) ->seq_filter( sub { $_[0] % 2 } ) } enumerate( odds_up_to(10) ); # 0 => 1 # 1 => 3 # 2 => 5 # 3 => 7 # 4 => 9

And our even-integers solution:sub seq_map { my ($seq, $fn) = @_; seqsub { my @val = $seq->(); @val ? $fn->(@val) : (); } }

With these simple extensions to our vocabulary, we have greatly expanded the usefulness of our mini-language for sequences. Now let's try it out on a real-world problem.sub evens_up_to { odds_up_to( $_[0] + 1 ) ->seq_map( sub { $_[0] - 1 } ); } enumerate( evens_up_to(10) ); # 0 => 0 # 1 => 2 # 2 => 4 # 3 => 6 # 4 => 8 # 5 => 10

What I have is 7 arrays with arbitrary number of elements in each. I would like to create a new array that contains combinations of the other seven in this manner: (1) Only one element from each of the 7 arrays. (2) Minimum of 4 elements per element in final array.With our existing sequence language, the solution for part (1) of the seeker's request is straightforward: Just pass the 7 arrays to

Part (2) adds a wrinkle, however. It requires that the output
combinations each have at least 4 elements. This suggests that part
(1) really means, *zero or one* element from each array.
(Otherwise, the at-least-4 constraint is meaningless because all of
the combinations will have exactly 7 elements.)

To effect the zero-or-one behavior, we can transform each input
array like `[1,2,3]` into `[[],[1],[2],[3]]`.
Each element in the transformed array is a zero- or one-element
array. (Note the insertion of an an "empty" element at the head of the
array.) Next, we can pass these transformed arrays to
*seq_from_spec*, as usual. On the backside, we can map the
combinations back into the desired form by merging the zero- or
one-element arrays. This we can do with a map.

Finally, we'll filter the combinations so that only those of the desired minimum length are kept.

Putting it all together in the form of a generalized solution:

(BTW, this code is the Perl equivalent of the Haskell solution that I posted in the thread.)sub min_length_combinations { my ($min_length, @inputs) = @_; my @input_spec = map [ [], (map [$_], @$_) ], @inputs; seq_from_spec( @input_spec ) ->seq_map( sub { [ map @$_, @_ ] } ) ->seq_filter( sub { @{$_[0]} >= $min_length } ) }

Now, we can solve the example problem from the seeker's original post:

min_length_combinations( 4, map [split//], qw( abc de fgh i jk l m ) )->enumerate; # 0 => ['i','j','l','m'] # 1 => ['i','k','l','m'] # 2 => ['f','j','l','m'] # 3 => ['f','k','l','m'] # # ... # # 862 => ['c','e','h','i','k'] # 863 => ['c','e','h','i','k','m'] # 864 => ['c','e','h','i','k','l'] # 865 => ['c','e','h','i','k','l','m']

Merging sequences in parallel is like joining the teeth of a coat's zipper. As input we have two (or more) separate sequences and as output we have one zipped-together sequence. In keeping with Haskell tradition, our zipper function will let us merge sequences of unequal lengths. In that case, the zipped sequence will be as long as the shortest input sequence. Here's the code:sub seq_series { my $seqs = seq( @_ ); # seq of seqs (!) my $seq; seqsub { my @val; do { ($seq) = $seqs->() unless $seq; @val = $seq->() if $seq; } while !@val && ($seq = $seqs->()); @val; } } seq_series( $abcees, $one_two_threes )->enumerate; # 0 => a # 1 => b # 2 => c # 3 => 1 # 4 => 2 # 5 => 3

The need to handle unequal lengths adds complexity to our code. In particular, thesub seq_reset { my $seq = shift; if ($seq) { 1 while $seq->(); } $seq; } sub seq_zip { my $seqs = seq( @_ ); # seq of seqs (!) my $seq_count = @_; seqsub { my @outvals; while (my $seq = $seqs->()) { if (my @val = $seq->()) { push @outvals, @val; } else { seq_reset( $seqs->() ) for 1 .. $seq_count; seq_reset( $seqs ); return (); } } return @outvals; } }

Some examples:

To generalize our zipping options, we can zip sequences with a given "zipper" function:seq_zip( $abcees, $one_two_threes )->enumerate; # 0 => a 1 # 1 => b 2 # 2 => c 3 seq_zip( $abcees, $one_two_threes, $you_and_mees ) ->enumerate; # 0 => a 1 you # 1 => b 2 me

Some examples:sub seq_zip_with { my $zipper_fn = shift; seq_map( seq_zip(@_), $zipper_fn ); }

With these new extensions to our mini-language for sequences, let's tackle another real Perl Monks problem.# some math helpers sub sum { reduce { $a + $b } @_ } sub product { reduce { $a * $b } @_ } seq_zip_with( \&sum, seq(1..3), seq(0..10) )->enumerate; # 0 => 1 # 1 => 3 # 2 => 5 seq_zip_with( \&product, seq(1..5), seq(0..10), seq(2..8) ) ->enumerate; # 0 => 0 # 1 => 6 # 2 => 24 # 3 => 60 # 4 => 120

The seeker has two arrays of strings and wants to operate on each combination of elements from the two. The operation to be performed is a pairwise "comparison" which amounts to counting the character pairs that are formed when each element is zipped with the other.

Since combinations and zips are part of our sequence mini-language, our implementation is straightforward:

my @site1 = qw( AATKKM aatkkm ); my @site2 = qw( GGGGGG gggggg ); my %counts; seq_foreach_from_spec( [ \(@site1, @site2) ], sub { seq_foreach( seq_zip( ( map seq(split//), @_ ) ), sub { $counts{"@_"}++ } ) } ); print Dumper(\%counts), "\n"; # { 'K G' => 2, 'A G' => 2, 'm g' => 1, 'a g' => 2, # 'A g' => 2, 'M G' => 1, 'k g' => 2, 'k G' => 2, # 'T G' => 1, 'a G' => 2, 'm G' => 1, 't G' => 1, # 'K g' => 2, 'M g' => 1, 't g' => 1, 'T g' => 1 }

We'll create two value extractors. The first is a general-purpose extractor that will capture each array value in a sequence as an arrayref. The second is for when we expect the sequence's values to be scalars. In this case, we don't need to wrap each value within an array and can just return it straight.

Some examples:sub seq_values { my $seq = shift; seq_values_scalar( seq_map( $seq, sub { [@_] } ) ); } sub seq_values_scalar { my $seq = shift; my @values; seq_foreach( $seq, sub { push @values, @_ } ); return @values; }

As a more realistic example, let's write a function to transpose a matrix, that is, exchange its rows and columns:print Dumper( [ seq(1..3)->seq_values ] ), "\n"; # [[1],[2],[3]] print Dumper( [ seq(1..3)->seq_values_scalar ] ), "\n"; # [1,2,3]

sub matrix_transpose { my $rows = shift; [ seq_values( seq_zip( seqs(@$rows) ) ) ]; } my $matrix = [ [ 0, 1 ] , [ 2, 3 ] , [ 4, 5 ] ]; print Dumper( matrix_transpose( $matrix ) ), "\n"; # [ [0,2,4] # , [1,3,5] ]

Another way to extract values is to collapse a sequence into a
single value by "folding" an accumulating function across the
sequence's values. This is similar to what List::Util's
*reduce* function does for arrays. Here's our implementation
for sequences:

Continuing with our matrix theme, let's write a function that computes the dot product of two vectors. We multiply the vectors' elements pairwise, and then add the resulting products:sub seq_fold { my ($seq, $fn) = @_; my @accum = $seq->(); while (@accum && (my @val = $seq->())) { @accum = $fn->(@accum, @val); } wantarray ? @accum : $accum[0]; }

sub dot_product { seq_zip_with( \&product, seqs(@_) ) ->seq_fold( \&sum ); } print dot_product( [1,1,1], [1,2,3] ), "\n"; # 6

So far, the ways in which we have combined sequences have been
static. In the next part, we'll look at how we can place tiny
functions *in between* the parts of our sequences to introduce
dynamism. This simple extension brings many complex manipulations
within reach. We'll look at some interesting applications and solve a
few more Perl Monks problems. We might even referee a
game of hyperdimensional tic-tac-toe.

Thanks for taking the time to read this meditation. If you have any criticisms or can think of any way to make my writing better, please let me know.

Cheers,

Tom

**Tom Moertel** : Blog / Talks / CPAN / LectroTest / PXSL / Coffee / Movie Rating Decoder

Back to
Meditations

Comment onA mini-language for sequences (part 1)SelectorDownloadCode