perlmeditation
blokhead
I recently needed to find all the ways to put 16 items into 4 groups of 4, where the ordering within the groups didn't matter, and the ordering of the groups themselves didn't matter. In other words, a type of <i>set partition</i>. I could have used an existing [id://386571|set partition iterator] and just filtered out the partitions whose blocks had the right sizes. But for my purposes, this was extremely wasteful -- it would have been necessary to iterate over 10 billion partitions just to get the 2.6 million ones I wanted.
<p>
What I'm posting now is the code I wrote for the generalized problem of finding the partitions of a set into blocks of specified sizes (for example, 7 items into 3 blocks whose sizes are 4, 2, and 1). My first hope is that this is a useful piece of code for someone out there. But if that's not the case, I hope it can act as a lesson about iterators.
<readmore>
<p>
Here's how it works: The basic way to iterate over partitions in general is to maintain a restricted-growth (RG) string (see [http://www.theory.csc.uvic.ca/~cos/inf/setp/SetPartitions.html|this page] or [http://mathworld.wolfram.com/RestrictedGrowthString.html|this one]). RG strings reflect the idea that when you are dispersing an element to a block, you can choose any previous block or start a new one.
<p>
So it's settled that we must maintain some sort of RG string. In fact, it's not hard to see that we need to all the permutations of a certain RG string that are also RG strings themselves (because permuting an RG string preserves the number of items sent to each block of the corresponding partition). But there's a catch -- RG strings are such that the resulting partitions are sorted by their smallest element (in particular, the smallest element of the whole set is always in partition #1). So we'll have to cycle through all <i>permutations of the block sizes</i> as well, since the first block should be any of the possible sizes.
<p>
So here's roughly how we can get all partitions into blocks of the given sizes:
<ul>
<li> for each permutation of @block_size
<ul>
<li> initialize @rg to the (lexicographically) first RG string, where the multiplicities of each element match @block_size. For instance, (4,2,5) becomes the RG string 00001122222.
<li> for each permutation of @rg that is also an RG string, return the corresponding partition
</ul>
</ul>
Of course, we do all this iteratively instead. To get the "next permutation" of @block_size, I knew to use [id://29374|tye's memoryless iterator] (I only modified a few superficial things in his code). Since it correctly handles duplicates, we don't have to worry about duplicate block sizes (for example, splitting 11 items into blocks of 4+4+3).
<p>
I wrote next_rg_perm() in a similar vein. It's another memoryless iterator, and gives the (lexicographically) next permutation that also satisfies the RG property. Finally, I wrapped it up into an interface, and it looks like this:
<code>
use List::Util 'sum';
sub partition {
my @block_size = sort { $a <=> $b } @{ shift(@_) };
my @items = @_;
@items == sum @block_size
or die "Combined size of blocks must equal the number of items";
my @rg;
return sub {
if ( !@rg ) {
@rg = map { ($_) x $block_size[$_] } 0 .. $#block_size;
} elsif ( ! next_rg_perm(\@rg) ) {
next_permute(\@block_size) or return;
@rg = map { ($_) x $block_size[$_] } 0 .. $#block_size;
}
## uncomment this to see the internal state:
## print "@block_size / @rg\n";
my @return;
push @{ $return[ $rg[$_] ] }, $items[$_] for 0 .. $#items;
return @return;
};
}
## to obtain lexicographically next RG string, look for the rightmost
## position where we have an appropriate candidate available. the
## candidate is smallest number to the right of our current position
## such that:
## - candidate is larger than our current position
## - candidate is not >=2 larger than everything to the left
## (restricted growth property)
sub next_rg_perm {
my $vals = shift;
my ($candidate, @avail);
my $i = @$vals;
while (--$i) {
($candidate) = grep defined, @avail[ $vals->[$i]+1 .. $#avail ];
last if defined $candidate
and grep { $_ >= $vals->[$candidate]-1 } @$vals[0..$i-1];
$avail[ $vals->[$i] ] = $i;
}
return if $i == 0;
@$vals[$i, $candidate] = @$vals[$candidate, $i];
@$vals[$i+1 .. $#$vals] = sort { $a <=> $b } @$vals[$i+1 .. $#$vals];
return 1;
}
## stolen ... er, adapted from tye: http://perlmonks.org/?node_id=29374
sub next_permute {
my $vals = shift;
return if @$vals < 2;
## find rightmost position where the sequence increases
my $i = $#$vals - 1;
$i-- until $i < 0 or $vals->[$i] < $vals->[$i+1];
return if $i < 0;
## reverse everything to the right (now it's in increasing order)
@$vals[ $i+1 .. $#$vals ] = reverse @$vals[ $i+1 .. $#$vals ];
## move right to find the first number that's larger, which we
## will swap with position i
my $j = $i+1;
$j++ until $vals->[$i] < $vals->[$j];
@$vals[$i,$j] = @$vals[$j,$i];
return 1;
}
</code>
You can use it like this:
<code>
## split 'a' through 'f' into blocks of sizes 3+2+1:
my $iter = partition( [3,2,1], qw[a b c d e f]);
while (my @parts = $iter->()) {
print "[@$_] " for @parts;
print $/;
}
</code>
If you want to follow along with the internal state of the iterator, uncomment the print statement inside the <tt>partition</tt> sub.
<p>
In my particular case, I need to partition things into blocks of equal sizes (or as close to equal as possible). This is called an equipartition, and may be the most common application of this kind of iterator. I wrote the following wrapper around <tt>partition</tt> for just this purpose:
<code>
sub equipartition {
my $parts = shift;
my $items = @_;
my @p = map { int( ($items+$_)/$parts ) } 0 .. $parts-1;
partition( \@p, @_ );
}
my $iter = equipartition(4 => 1..16);
...
</code>
This gives the same iterator as <c>partition([4,4,4,4], 1..16)</c>.
</readmore>
<!-- Node text goes above. Div tags should contain sig only -->
<div class="pmsig"><div class="pmsig-137386">
<p>
blokhead
</div></div>