This is the implementation
I liked best (so far):
# Find subgroups with an *almost* optimal sum distribution.
# Performance is roughly linear in N and in the number of columns.
use strict;
use warnings;
use List::Util qw(sum);

my $benchmark;    # counts iterations of the inner loop

sub splitter {
    my ($columns, $plist) = @_;
    return $plist unless --$columns;    # last column gets everything left
    my $sum1 = 0;
    my $sum2 = sum @$plist;             # total of the items still to place
    my @tmp1 = ();
    my @tmp2 = @$plist;
    my @best = (1e9, undef, undef);
    # Greedy scan: grow the prefix while the balance ratio keeps improving.
    while ($#tmp2 > 0) {
        $benchmark++;
        push @tmp1, shift @tmp2;
        $sum1 += $tmp1[-1];
        my $mean_ratio = $sum1 * $columns / ($sum2 - $sum1);
        $mean_ratio = 1 / $mean_ratio if $mean_ratio < 1;
        last unless $best[0] > $mean_ratio;
        @best = ($mean_ratio, [@tmp1], [@tmp2]);
    }
    return $plist if $best[0] == 1e9;   # loop never ran: nothing to split
    return $best[1],
           $columns > 0 ? splitter($columns, $best[2]) : $best[2];
}
my @list = (10, 15, 25, 30, 10, 13);
for my $columns (3, 4, 5) {
    $benchmark = 0;
    print "[", map(" [@$_]", splitter($columns, \@list)), " ]\n\n";
    print "Inner loop = $benchmark\n\n";
}

@list = (1 .. 150);    # reuse @list; a second "my" would mask the first
for my $columns (10, 50, 80, 100) {
    $benchmark = 0;
    print "[", map(" [@$_]", splitter($columns, \@list)), " ]\n\n";
    print "Inner loop = $benchmark\n\n";
}
Although this does not always give an exact solution, it has some advantages:

- it is pretty fast
- it is not memory-hungry
- it works fine with random numbers
- run time is roughly proportional to the number of columns and to the list size
I think it might be possible to find a better solution in
O(N * columns * log(N * columns)) time by making the inner loop recursive.
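One way to speed up the inner loop, assuming all the values are positive (so the prefix sums are strictly increasing), would be to binary-search the prefix sums for the cut point closest to the target column sum instead of scanning linearly. The helper below, best_cut, is my own sketch of that idea, not part of the code above:

```perl
use strict;
use warnings;

# Sketch (assumes positive values): find the first index whose prefix
# sum reaches the target, by binary search instead of a linear scan.
sub best_cut {
    my ($prefix, $lo, $hi, $target) = @_;   # $prefix->[$i] = sum of first $i items
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($prefix->[$mid] < $target) { $lo = $mid + 1 }
        else                           { $hi = $mid }
    }
    return $lo;
}

my @list   = (10, 15, 25, 30, 10, 13);
my @prefix = (0);
push @prefix, $prefix[-1] + $_ for @list;

# Where should the first of 3 columns end?
my $cut = best_cut(\@prefix, 1, $#prefix, $prefix[-1] / 3);
print "cut after $cut items\n";   # prints "cut after 3 items"
```

A real splitter would also compare $cut against $cut - 1, since the prefix just below the target may be the closer of the two.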
Update: it should be easy to rewrite "splitter" without recursion,
since it only calls itself after the while loop has finished.
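Here is a sketch of what that non-recursive version might look like: the same greedy prefix search, but with a plain outer loop peeling off one column at a time in place of the recursive call (splitter_iter is my name for it, and the $benchmark counter is dropped for brevity):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Iterative rewrite of splitter: each pass of the outer loop splits
# off one column, then continues on the remaining items.
sub splitter_iter {
    my ($columns, $plist) = @_;
    my @rest = @$plist;
    my @result;
    while (--$columns > 0 and @rest > 1) {
        my $sum2 = sum(@rest);
        my $sum1 = 0;
        my (@tmp1, @best);
        my @tmp2 = @rest;
        @best = (1e9, undef, undef);
        # Same greedy inner scan as in the recursive version.
        while ($#tmp2 > 0) {
            push @tmp1, shift @tmp2;
            $sum1 += $tmp1[-1];
            my $mean_ratio = $sum1 * $columns / ($sum2 - $sum1);
            $mean_ratio = 1 / $mean_ratio if $mean_ratio < 1;
            last unless $best[0] > $mean_ratio;
            @best = ($mean_ratio, [@tmp1], [@tmp2]);
        }
        last if $best[0] == 1e9;   # no improvement: stop splitting
        push @result, $best[1];
        @rest = @{ $best[2] };
    }
    push @result, [@rest];         # whatever is left is the last column
    return @result;
}

my @groups = splitter_iter(3, [10, 15, 25, 30, 10, 13]);
print "[", map(" [@$_]", @groups), " ]\n";
```

The outer loop plays the role of the tail of the recursion, so memory use stays constant in the number of columns.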