I've been chewing on this problem for a day now. This is clearly problem with two logical parts. Part 1 is to generate all legal mappings of N columns of data into M buckets, given the constraints that no bucket can be empty, and that the columns need to stay ordered. The second part is apply these mappings to the input data, and select a mapping that yields a "best fit."
I focused on the first part, looking for a quicker, simpler solution. I think I have one. Here it is. Given a number of columns and a number of buckets, the code below calculates all legal mappings of columns to buckets, and returns these in a hash, where the key is a printable string, and the value is an anonymous array.
{
# map columns to buckets. key is string, value is anonymous array.
my %c2bMap;
sub c2bMappings {
my($buckets, $columns) = @_;
die "bogus args" unless $buckets > 1 && $columns > $buckets;
%c2bMap = ();
_genFrom(0, (0) x ($columns - $buckets), 1 .. ($buckets - 1));
return \%c2bMap;
}
sub _genFrom {
my @c2bMap = @_;
return if exists $c2bMap{"@c2bMap"};
print "@c2bMap\n"; #DEBUG
$c2bMap{"@c2bMap"} = \@c2bMap;
foreach my $i ( 2 .. $#c2bMap ) {
my $n = $c2bMap[$i] - 1;
if ( $c2bMap[$i - 2] == $n && $c2bMap[$i - 1] == $n ) {
local $c2bMap[$i-1] = $c2bMap[$i];
_genFrom(@c2bMap);
}
}
}
}
c2bMappings(4,6);
__END__
0 0 0 1 2 3
0 0 1 1 2 3
0 1 1 1 2 3
0 1 1 2 2 3
0 1 2 2 2 3
0 1 2 2 3 3
0 1 2 3 3 3
0 1 1 2 3 3
0 0 1 2 2 3
0 0 1 2 3 3
Recursion only happens with valid mappings.
Note the selective localization of an element of the array that the code is about to recurse on.