|Problems? Is your data what you think it is?|
A mini-language for sequences (part 1)by tmoertel (Chaplain)
|on Nov 05, 2004 at 13:15 UTC||Need Help??|
Working with collections of things is one of the most common chores in programming. In this meditation, we'll create a functional mini-language for sequences – more than iterators and yet not quite streams – and use it as a simple, abstract tool for working with collections. To gain familiarity with our tool, we'll answer some recent questions that were asked by Seekers of Perl Wisdom.
Since we haven't defined what a sequence is yet, let's do that now:
A sequence is a function that represents a finite series of values. When called, the function returns the next value in the series. The first call to the function after the series has been exhausted returns an empty array and resets the sequence back to its beginning.
A couple of points. First, sequences are cyclical. After you exhaust them, they start over again. Second, the values extracted from a sequence are arrays, not scalars. This allows sequences to return multiple values at a time. As we'll see later, these properties allow us to combine sequences in interesting ways.
With our definition in mind, let's begin our implementation. Some preliminaries:
Although most of the code we'll write is functional in nature, Perl's object-oriented syntax is often a convenient way to call functions. To make this syntax available, let's set up a package and write a couple of helper functions to promote functions into Sequences:
The package declaration and the new subroutine are standard Perl OO fare. The seqsub function, however, is somewhat odd. We'll use it as an alternative syntax to Perl's normal sub when we want to to create functions (anonymous subroutines) that represent sequences.
To see how the set-up works, let's create our first sequence-making function, seq. It creates a sequence out of the series formed by its arguments:
To see how it works, let's create simple, 3-element sequence and extract its values:
Because we will often want to see what's "in" a sequence, let's create a function to enumerate a sequence's values:
The while loop within the function shows us the idiom for iterating over a sequence's values: We extract values until we get an empty array. We print each value we get. When we're done, we return the sequence itself to facilitate function chaining.
As an example, let's enumerate our earlier sequence:
Let's try the alternative OO syntax:
Not bad. Now let's move on to something a little more weighty.
Combinations of sequencesOne of the things that makes sequences interesting is that we can combine them in ways that let us eliminate a lot of boilerplate code. For example, we often want to do things for each element of one set and for each element of another. We typically use nested foreach loops for this cause, but sequences give us another option.
The following function takes two separate sequences s and t and combines them into a sequence whose values are the Cartesian product of the values drawn from s and t, in effect nesting t inside of s:
One of the idioms that appears frequently in functional programming is to take a binary function and generalize it into an n-ary function by "folding" it over a list of arguments. We can use List::Util's reduce function for this purpose. Let's use it generalize seq_prod2 into the n-ary seq_prod:
Now we can compute the products of any number of sequences:
The power of our abstraction is starting to become apparent. Normally we would need to nest three foreach loops (or use a module such as Algorithm::Loops) to do what we just did in one line of code. In the next section, we'll take this idea further to see how much boilerplate code we can factor out.
Abstractions that replace nested foreach loopsIt's fairly common in day-to-day programming that we must process all of the combinations of elements that can be created by drawing one element from each of several sets (arrays). In the 2-array case, for example, we might need to test every element in one array with respect to every element in another. In optimization or search problems, we might need to examine each of the possibilities within an n-dimensional search space to determine which have the least cost or satisfy the given criteria.
One common way to approach such a problem is with nested loops. Consider the case involving three arrays:
Let's try to do the same thing with sequences.
First, we'll need to convert each array into a sequence. That's easy; we can just use seq. Then, we must combine the sequences; for this we'll use seq_prod, like before. Finally, we'll extract the values from the combined sequence and process them in turn. Using this recipe, we get the following:
That works, but it's clunky.
Let's see if we can refine the approach. As the first refinement, let's create helper functions to perform the first two steps, in effect creating a combined sequence from a specification of its subsequences:
Both helpers are tiny but handy and have many uses beyond the one we're aiming for now. For example, we can use seq_from_spec to extract the digits of n-ary numbers:
On a naming note, I called the function seq_from_spec instead of something like seq_prod_from_spec because I consider the Cartesian product to be the most fundamental and useful way of combining sets. So the idea of a "sequence specification" that describes a product of sequences naturally follows:
Back to our task, another helper factors out the looping boilerplate:
We can use it like so:
Finally, a higher-level helper ties the solution together:
Now we can replace our original, 3-foreach loop with the following:
In reflection, that may seem like a long way to have gone just to replace a 3-foreach loop, and it was. But our travels covered more ground than might at first be obvious:
Transforming sequences: filters and mapsSometimes the best approach to solving a problem is to solve a simpler problem and then transform the simple solution into one that works for the original problem. Let's say that we want to create a series of odd integers. We might approach it like so:
The function is grep for sequences. It takes a sequence and a filtering function and returns a new sequence that passes through the values of the original sequence for which the filtering function returns true; all other values are filtered out. Using it, we can construct our odd-integers solution:
Now let's say that we want to generate a similar sequence but for even integers. Again, we can use a transformational strategy. This time, however, we'll transform the odd-integer series into an even-integer series by subtracting one from each value. In effect, we're mapping one series to another. Our helper is named accordingly:
And our even-integers solution:
With these simple extensions to our vocabulary, we have greatly expanded the usefulness of our mini-language for sequences. Now let's try it out on a real-world problem.
SoPW Example: "Combinations of an array of arrays...?"Take a moment to look at the problem posed in Combinations of an array of arrays...? and the solutions offered by our good fellow monks. To summarize, the seeker writes the following:
What I have is 7 arrays with arbitrary number of elements in each. I would like to create a new array that contains combinations of the other seven in this manner: (1) Only one element from each of the 7 arrays. (2) Minimum of 4 elements per element in final array.With our existing sequence language, the solution for part (1) of the seeker's request is straightforward: Just pass the 7 arrays to seq_from_spec and the resulting sequence will yield all of the combinations.
Part (2) adds a wrinkle, however. It requires that the output combinations each have at least 4 elements. This suggests that part (1) really means, zero or one element from each array. (Otherwise, the at-least-4 constraint is meaningless because all of the combinations will have exactly 7 elements.)
To effect the zero-or-one behavior, we can transform each input array like [1,2,3] into [,,,]. Each element in the transformed array is a zero- or one-element array. (Note the insertion of an an "empty" element at the head of the array.) Next, we can pass these transformed arrays to seq_from_spec, as usual. On the backside, we can map the combinations back into the desired form by merging the zero- or one-element arrays. This we can do with a map.
Finally, we'll filter the combinations so that only those of the desired minimum length are kept.
Putting it all together in the form of a generalized solution:
(BTW, this code is the Perl equivalent of the Haskell solution that I posted in the thread.)
Now, we can solve the example problem from the seeker's original post:
More ways to combine sequences: series and parallelIn addition to combining sequences by way of Cartesian products, we have other options. We can join them in series or merge them in parallel. The following function performs the former:
Merging sequences in parallel is like joining the teeth of a coat's zipper. As input we have two (or more) separate sequences and as output we have one zipped-together sequence. In keeping with Haskell tradition, our zipper function will let us merge sequences of unequal lengths. In that case, the zipped sequence will be as long as the shortest input sequence. Here's the code:
The need to handle unequal lengths adds complexity to our code. In particular, the else clause handles this case and resets any partially-read sequences before returning. This ensures that our sequences' next customers don't get short changed by inheriting half-read sequences.
To generalize our zipping options, we can zip sequences with a given "zipper" function:
With these new extensions to our mini-language for sequences, let's tackle another real Perl Monks problem.
SoPW Example: "Comparing two arrays and counting pairwise comparisons"Examine the thread Comparing two arrays and counting pairwise comparisons.
The seeker has two arrays of strings and wants to operate on each combination of elements from the two. The operation to be performed is a pairwise "comparison" which amounts to counting the character pairs that are formed when each element is zipped with the other.
Since combinations and zips are part of our sequence mini-language, our implementation is straightforward:
Extracting values from sequencesSometimes sequences are merely a means to an end. Sometimes we don't care about the sequences themselves but only the values they produce. So it makes sense to extend our language with functions that extract values from sequences.
We'll create two value extractors. The first is a general-purpose extractor that will capture each array value in a sequence as an arrayref. The second is for when we expect the sequence's values to be scalars. In this case, we don't need to wrap each value within an array and can just return it straight.
As a more realistic example, let's write a function to transpose a matrix, that is, exchange its rows and columns:
Another way to extract values is to collapse a sequence into a single value by "folding" an accumulating function across the sequence's values. This is similar to what List::Util's reduce function does for arrays. Here's our implementation for sequences:
Continuing with our matrix theme, let's write a function that computes the dot product of two vectors. We multiply the vectors' elements pairwise, and then add the resulting products:
So ends Part 1In this first meditation on sequences, we began with a simple, foundational definition and built upon it a small library of functions. These tiny functions form the core of a mini-language that moves us closer to our collections, where we can work more naturally. Instead of thinking in terms of loops and iteration variables, we can think in terms of products and zips and folds. Our language provides direct support for combining, transforming, iterating over, and extracting values from collections, which we represent as sequences.
So far, the ways in which we have combined sequences have been static. In the next part, we'll look at how we can place tiny functions in between the parts of our sequences to introduce dynamism. This simple extension brings many complex manipulations within reach. We'll look at some interesting applications and solve a few more Perl Monks problems. We might even referee a game of hyperdimensional tic-tac-toe.
Thanks for taking the time to read this meditation. If you have any criticisms or can think of any way to make my writing better, please let me know.