Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Partitioning a set into parts of given sizes

by blokhead (Monsignor)
on Feb 28, 2006 at 22:19 UTC ( #533530=perlmeditation: print w/ replies, xml ) Need Help??

I recently needed to find all the ways to put 16 items into 4 groups of 4, where the ordering within the groups didn't matter, and the ordering of the groups themselves didn't matter. In other words, a type of set partition. I could have used an existing set partition iterator and just filtered out the partitions whose blocks had the right sizes. But for my purposes, this was extremely wasteful -- it would have been necessary to iterate over 10 billion partitions just to get the 2.6 million ones I wanted.

What I'm posting now is the code I wrote for the generalized problem of finding the partitions of a set into blocks of specified sizes (for example, 7 items into 3 blocks whose sizes are 4, 2, and 1). My first hope is that this is a useful piece of code for someone out there. But if that's not the case, I hope it can act as a lesson about iterators.

Here's how it works: The basic way to iterate over partitions in general is to maintain a restricted-growth (RG) string (see this page or this one). RG strings reflect the idea that when you are dispersing an element to a block, you can choose any previous block or start a new one.

So it's settled that we must maintain some sort of RG string. In fact, it's not hard to see that we need to all the permutations of a certain RG string that are also RG strings themselves (because permuting an RG string preserves the number of items sent to each block of the corresponding partition). But there's a catch -- RG strings are such that the resulting partitions are sorted by their smallest element (in particular, the smallest element of the whole set is always in partition #1). So we'll have to cycle through all permutations of the block sizes as well, since the first block should be any of the possible sizes.

So here's roughly how we can get all partitions into blocks of the given sizes:

  • for each permutation of @block_size
    • initialize @rg to the (lexicographically) first RG string, where the multiplicities of each element match @block_size. For instance, (4,2,5) becomes the RG string 00001122222.
    • for each permutation of @rg that is also an RG string, return the corresponding partition
Of course, we do all this iteratively instead. To get the "next permutation" of @block_size, I knew to use tye's memoryless iterator (I only modified a few superficial things in his code). Since it correctly handles duplicates, we don't have to worry about duplicate block sizes (for example, splitting 11 items into blocks of 4+4+3).

I wrote next_rg_perm() in a similar vein. It's another memoryless iterator, and gives the (lexicographically) next permutation that also satisfies the RG property. Finally, I wrapped it up into an interface, and it looks like this:

use List::Util 'sum'; sub partition { my @block_size = sort { $a <=> $b } @{ shift(@_) }; my @items = @_; @items == sum @block_size or die "Combined size of blocks must equal the number of items +"; my @rg; return sub { if ( !@rg ) { @rg = map { ($_) x $block_size[$_] } 0 .. $#block_size; } elsif ( ! next_rg_perm(\@rg) ) { next_permute(\@block_size) or return; @rg = map { ($_) x $block_size[$_] } 0 .. $#block_size; } ## uncomment this to see the internal state: ## print "@block_size / @rg\n"; my @return; push @{ $return[ $rg[$_] ] }, $items[$_] for 0 .. $#items; return @return; }; } ## to obtain lexicographically next RG string, look for the rightmost ## position where we have an appropriate candidate available. the ## candidate is smallest number to the right of our current position ## such that: ## - candidate is larger than our current position ## - candidate is not >=2 larger than everything to the left ## (restricted growth property) sub next_rg_perm { my $vals = shift; my ($candidate, @avail); my $i = @$vals; while (--$i) { ($candidate) = grep defined, @avail[ $vals->[$i]+1 .. $#avail +]; last if defined $candidate and grep { $_ >= $vals->[$candidate]-1 } @$vals[0..$i-1]; $avail[ $vals->[$i] ] = $i; } return if $i == 0; @$vals[$i, $candidate] = @$vals[$candidate, $i]; @$vals[$i+1 .. $#$vals] = sort { $a <=> $b } @$vals[$i+1 .. $#$val +s]; return 1; } ## stolen ... er, adapted from tye: http://perlmonks.org/?node_id=2937 +4 sub next_permute { my $vals = shift; return if @$vals < 2; ## find rightmost position where the sequence increases my $i = $#$vals - 1; $i-- until $i < 0 or $vals->[$i] < $vals->[$i+1]; return if $i < 0; ## reverse everything to the right (now it's in increasing order) @$vals[ $i+1 .. $#$vals ] = reverse @$vals[ $i+1 .. $#$vals ]; ## move right to find the first number that's larger, which we ## will swap with position i my $j = $i+1; $j++ until $vals->[$i] < $vals->[$j]; @$vals[$i,$j] = @$vals[$j,$i]; return 1; }
You can use it like this:
## split 'a' through 'f' into blocks of sizes 3+2+1: my $iter = partition( [3,2,1], qw[a b c d e f]); while (my @parts = $iter->()) { print "[@$_] " for @parts; print $/; }
If you want to follow along with the internal state of the iterator, uncomment the print statement inside the partition sub.

In my particular case, I need to partition things into blocks of equal sizes (or as close to equal as possible). This is called an equipartition, and may be the most common application of this kind of iterator. I wrote the following wrapper around partition for just this purpose:

sub equipartition { my $parts = shift; my $items = @_; my @p = map { int( ($items+$_)/$parts ) } 0 .. $parts-1; partition( \@p, @_ ); } my $iter = equipartition(4 => 1..16); ...
This gives the same iterator as partition([4,4,4,4], 1..16).

blokhead

Comment on Partitioning a set into parts of given sizes
Select or Download Code
Re: Partitioning a set into parts of given sizes
by spurperl (Priest) on Mar 01, 2006 at 05:58 UTC
    Nice ++

    BTW, some exhaustive information about partitioning can be found in Knuth's "pre-fascicle" of Volume 4x of TAOCP, which is freely downloadable on the net (even on his website, I think).

Re: Partitioning a set into parts of given sizes
by Limbic~Region (Chancellor) on Nov 21, 2006 at 14:49 UTC
    blokhead,
    I recently used your code because Set::Partition seems to be misnamed. According to Wikipedia, there is a difference between Set Partitioning and Ordered Set Partitioning. The latter distinguishing between the order the sets appear. It should probably be named Set::Partition::Ordered.

    In any case, I have a few comments. First, why do you require that the block sizes add up to the total number of items? I had to work around this restriction by generating the combinations of a fixed size and then using your code to partition the combinations. It would be nice if that restriction were @sum @block_size <= @items.

    The second comment is about availability on CPAN. I think this is something that might be of great value to others. I had intended to simplify my code but would love to make this a collaborative effort. Your thoughts?

    Cheers - L~R

      First, why do you require that the block sizes add up to the total number of items?
      Otherwise what you get is not a partition according to most definitions. You can somewhat get around this by partitioning into groups and letting one group be a "trash" group that is not output. But for that you must have one distinguished group (as in ordered partition) and the rest undifferentiated (as in unordered partition), so it would be trickier.

      Now that I'm thinking about it a little more, though, it may be possible to adadpt how it generates RG-sequences internally to support just that by allowing the value "0" to signify "not included in output," where every item (including the first) can take on value 0. I wouldn't be able to code it up for a while, but it seems do-able.

      blokhead

        blokhead,
        I looked at the problem a bit differently. I have a group of N items that I need to group into sets of a smaller size and then partition each set. I think this is a useful enough feature to include it - I am just not sure they way I implemented it is optimal. What are your thoughts on collaborating on CPAN module(s)?

        Cheers - L~R

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://533530]
Approved by thor
Front-paged by Limbic~Region
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (5)
As of 2014-09-23 04:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (210 votes), past polls