Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

comment on

( [id://3333] : superdoc . print w/replies, xml ) Need Help??
I recently needed to find all the ways to put 16 items into 4 groups of 4, where the ordering within the groups didn't matter, and the ordering of the groups themselves didn't matter. In other words, a type of set partition. I could have used an existing set partition iterator and just filtered out the partitions whose blocks had the right sizes. But for my purposes, this was extremely wasteful -- it would have been necessary to iterate over 10 billion partitions just to get the 2.6 million ones I wanted.

What I'm posting now is the code I wrote for the generalized problem of finding the partitions of a set into blocks of specified sizes (for example, 7 items into 3 blocks whose sizes are 4, 2, and 1). My first hope is that this is a useful piece of code for someone out there. But if that's not the case, I hope it can act as a lesson about iterators.

Here's how it works: The basic way to iterate over partitions in general is to maintain a restricted-growth (RG) string (see this page or this one). RG strings reflect the idea that when you are dispersing an element to a block, you can choose any previous block or start a new one.

So it's settled that we must maintain some sort of RG string. In fact, it's not hard to see that we need to all the permutations of a certain RG string that are also RG strings themselves (because permuting an RG string preserves the number of items sent to each block of the corresponding partition). But there's a catch -- RG strings are such that the resulting partitions are sorted by their smallest element (in particular, the smallest element of the whole set is always in partition #1). So we'll have to cycle through all permutations of the block sizes as well, since the first block should be any of the possible sizes.

So here's roughly how we can get all partitions into blocks of the given sizes:

  • for each permutation of @block_size
    • initialize @rg to the (lexicographically) first RG string, where the multiplicities of each element match @block_size. For instance, (4,2,5) becomes the RG string 00001122222.
    • for each permutation of @rg that is also an RG string, return the corresponding partition
Of course, we do all this iteratively instead. To get the "next permutation" of @block_size, I knew to use tye's memoryless iterator (I only modified a few superficial things in his code). Since it correctly handles duplicates, we don't have to worry about duplicate block sizes (for example, splitting 11 items into blocks of 4+4+3).

I wrote next_rg_perm() in a similar vein. It's another memoryless iterator, and gives the (lexicographically) next permutation that also satisfies the RG property. Finally, I wrapped it up into an interface, and it looks like this:

use List::Util 'sum'; sub partition { my @block_size = sort { $a <=> $b } @{ shift(@_) }; my @items = @_; @items == sum @block_size or die "Combined size of blocks must equal the number of items +"; my @rg; return sub { if ( !@rg ) { @rg = map { ($_) x $block_size[$_] } 0 .. $#block_size; } elsif ( ! next_rg_perm(\@rg) ) { next_permute(\@block_size) or return; @rg = map { ($_) x $block_size[$_] } 0 .. $#block_size; } ## uncomment this to see the internal state: ## print "@block_size / @rg\n"; my @return; push @{ $return[ $rg[$_] ] }, $items[$_] for 0 .. $#items; return @return; }; } ## to obtain lexicographically next RG string, look for the rightmost ## position where we have an appropriate candidate available. the ## candidate is smallest number to the right of our current position ## such that: ## - candidate is larger than our current position ## - candidate is not >=2 larger than everything to the left ## (restricted growth property) sub next_rg_perm { my $vals = shift; my ($candidate, @avail); my $i = @$vals; while (--$i) { ($candidate) = grep defined, @avail[ $vals->[$i]+1 .. $#avail +]; last if defined $candidate and grep { $_ >= $vals->[$candidate]-1 } @$vals[0..$i-1]; $avail[ $vals->[$i] ] = $i; } return if $i == 0; @$vals[$i, $candidate] = @$vals[$candidate, $i]; @$vals[$i+1 .. $#$vals] = sort { $a <=> $b } @$vals[$i+1 .. $#$val +s]; return 1; } ## stolen ... er, adapted from tye: +4 sub next_permute { my $vals = shift; return if @$vals < 2; ## find rightmost position where the sequence increases my $i = $#$vals - 1; $i-- until $i < 0 or $vals->[$i] < $vals->[$i+1]; return if $i < 0; ## reverse everything to the right (now it's in increasing order) @$vals[ $i+1 .. $#$vals ] = reverse @$vals[ $i+1 .. $#$vals ]; ## move right to find the first number that's larger, which we ## will swap with position i my $j = $i+1; $j++ until $vals->[$i] < $vals->[$j]; @$vals[$i,$j] = @$vals[$j,$i]; return 1; }
You can use it like this:
## split 'a' through 'f' into blocks of sizes 3+2+1: my $iter = partition( [3,2,1], qw[a b c d e f]); while (my @parts = $iter->()) { print "[@$_] " for @parts; print $/; }
If you want to follow along with the internal state of the iterator, uncomment the print statement inside the partition sub.

In my particular case, I need to partition things into blocks of equal sizes (or as close to equal as possible). This is called an equipartition, and may be the most common application of this kind of iterator. I wrote the following wrapper around partition for just this purpose:

sub equipartition { my $parts = shift; my $items = @_; my @p = map { int( ($items+$_)/$parts ) } 0 .. $parts-1; partition( \@p, @_ ); } my $iter = equipartition(4 => 1..16); ...
This gives the same iterator as partition([4,4,4,4], 1..16).


In reply to Partitioning a set into parts of given sizes by blokhead

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.