|Problems? Is your data what you think it is?|
Two handy tools for nested arrays: mapat and transposeby tmoertel (Chaplain)
|on Dec 07, 2004 at 01:25 UTC||Need Help??|
Here is a fun and flexible way to solve many problems that involve nested array data, such as the problem posed by knirirr in "Mean and standard deviation amongst multiple matrices." In this meditation, we will create two small helper functions that let us transform and restructure array data. Together, they make it easy to solve problems like the one posed in the referenced post.
The problem, in short, is that we have an array of matrices and we want to apply a function "matrixwise" across the array. For example, let us consider the two-matrix case:
Our goal is to compute the following matrix from $mats:
for some given function f. (In the case of the knirirr's problem, as I understand it, f will compute the mean and standard deviation of its arguments.)
We can think about the array of matrices $mats as itself a 3-dimensional matrix. Its dimensions, most significant first, are (m,r,c), where m gives the position of the sub-matrix we want, r gives the row, and c gives the column:
For example, the value at coordinate (0,0,0) is 1; and the value at (1,0,0) is 5.
A common solution
With this representation in mind, we can arrive at the following, typical approach to a solution:
To see that it works, let us create a few functions to help us examine the results:
Now we can see that the nested-for-loop solution does indeed compute the desired result:
One problem with the nested-for solution, however, is that it is hardcoded for the specifics of this particular, 3-dimensional case. Another problem is that the code has a high density of array operations, providing many opportunities to mix up index variables or make similar coding mistakes.
In day-to-day coding, we frequently process arrays, and arrays of arrays, and AoAoAs, and similar structures. It might therefore be worthwhile to invest some time in helper functions that let us manipulate nested data more directly.
One such helper is the ever-handy mapat. It is similar to map except that it allows us to map at a particular depth of nesting in arbitrarily-nested arrays:
To see how mapat works, let us target the various depths of nesting in our original 3-dimensional input matrix $mats:
Note that mapat preserves the structure of its input down to the targeted depth. This is an important property that lets us process nested values "in structure," without having to tear down and build up arrays manually, like we did with the nested-for-loop solution.
While it is probably clear that mapat is a general and useful function for working with nested arrays, the function doesn't actually solve our original problem. Examining the output, we can see that mapat(2, ...) provides the correct output structure, but not the correct values.
The problem is that the elements passed to f are not grouped matrixwise. We are passing in rows extracted from our 3-dimensional array, and we ought to be passing in matrixwise, row-column stacks. That is, for each sub-matrix M and row R, we are passing to f all of the values whose coordinates are given by (m=M,r=R,c=*), where * matches any value. We ought to be passing in (m=*,r=R,c=C) for each valid row-column pair (R,C).
To provide the correct values to mapat, we can restructure $mats so that m becomes its least-significant dimension. In other words, we can convert $mats's dimensions from (m,r,c) to (r,c,m). Then mapat(2, ...) can be used to apply f to exactly the desired elements: (r=R,c=C,m=*).
We can use matrix transposition to perform the restructuring. Transposition swaps the order of the two most-significant dimensions of a matrix. It is commonly used in spreadsheets to convert rows into columns and vice versa. Here is how it works for the 2-dimensional case:
I usually implement transpose in terms of zip, which is a handy helper for functional programming. There are many other implementations lying around if you don't like this one:
With transpose in hand, we can restructure our original input matrix in two steps. First, we transpose at depth 0 to reorder the first two dimensions:
Then, we use mapat to transpose the result of the previous step at depth 1, in effect transposing the result's sub-matrices. This swaps the second and third dimensions:
Note that the output values are grouped matrixwise. This is exactly the grouping that we want.
The final step is to use mapat(2, ...) to apply f to the second "level" of our restructured matrix:
And that is our desired output.
Putting all of the above together results in our final function, matrixwise_map:
It does exactly what we want, in one fell swoop:
Back to our original problem
To return to the knirirr's original problem, here is how we might compute the matrixwise mean of the input matrices:
The matrixwise standard deviations can be computed similarly: Create a stdev function and matrixwise_map it over $mats. That's it!
What have we gained?
In solving a particular nested-array problem, we created two small functions mapat and transpose that we can use to solve many kinds of related problems. The mapat function lets us transform nested array data without disturbing its structure, and transpose lets us restructure dimensional data by rearranging dimensions. As an example of what these functions can do, we created matrixwise_map. It lets us apply functions matrixwise over an array of matrices.
Together or alone, all three of these functions are handy. If you find yourself writing a lot of code to process array data, consider whether you could benefit from functions like these or from problem-solving approaches similar to those we explored in this meditation.
Thanks for taking the time to read this meditation. If you can see any way to improve my writing or explanations, please let me know.