|Problems? Is your data what you think it is?
I recently needed to find all the ways to put 16 items into 4 groups of 4, where the ordering within the groups didn't matter, and the ordering of the groups themselves didn't matter. In other words, a type of set partition. I could have used an existing set partition iterator and just filtered out the partitions whose blocks had the right sizes. But for my purposes, this was extremely wasteful -- it would have been necessary to iterate over 10 billion partitions just to get the 2.6 million ones I wanted.
What I'm posting now is the code I wrote for the generalized problem of finding the partitions of a set into blocks of specified sizes (for example, 7 items into 3 blocks whose sizes are 4, 2, and 1). My first hope is that this is a useful piece of code for someone out there. But if that's not the case, I hope it can act as a lesson about iterators.
Here's how it works: The basic way to iterate over partitions in general is to maintain a restricted-growth (RG) string (see this page or this one). RG strings reflect the idea that when you are dispersing an element to a block, you can choose any previous block or start a new one.
So it's settled that we must maintain some sort of RG string. In fact, it's not hard to see that we need to all the permutations of a certain RG string that are also RG strings themselves (because permuting an RG string preserves the number of items sent to each block of the corresponding partition). But there's a catch -- RG strings are such that the resulting partitions are sorted by their smallest element (in particular, the smallest element of the whole set is always in partition #1). So we'll have to cycle through all permutations of the block sizes as well, since the first block should be any of the possible sizes.
So here's roughly how we can get all partitions into blocks of the given sizes:
I wrote next_rg_perm() in a similar vein. It's another memoryless iterator, and gives the (lexicographically) next permutation that also satisfies the RG property. Finally, I wrapped it up into an interface, and it looks like this:
You can use it like this:
If you want to follow along with the internal state of the iterator, uncomment the print statement inside the partition sub.
In my particular case, I need to partition things into blocks of equal sizes (or as close to equal as possible). This is called an equipartition, and may be the most common application of this kind of iterator. I wrote the following wrapper around partition for just this purpose:
This gives the same iterator as partition([4,4,4,4], 1..16).