Re: Computer science Problem  single dimension bin packing
by ikegami (Pope) on Aug 14, 2014 at 16:14 UTC

You actually used the name of the problem in the subject: Bin packing problem. Yes, it is NPhard.  [reply] 
Re: Computer science Problem  single dimension bin packing
by AppleFritter (Vicar) on Aug 14, 2014 at 16:11 UTC

This sounds like the knapsack problem problem to me, which is indeed NPhard. If you search CPAN for "knapsack", there's a few modules that may be of help.
That said, for many of these problems, it's only finding an optimal solution that's NPhard (or worse). Finding a good solution may be much easier (speaking in terms of computational complexity), depending of course on the precise constraints on what constitutes a "good" solution.
 [reply] 

The knapsack problem is different. You have a single bin, you are trying to maximize the total value of the items you place in the bin, each item having a size and value.
 [reply] 

Thanks, I forgot to mention (sorry) I'd already looked at the knapsack problem... but I didn't see any mention of "multiple knapsacks". Frankly, though, that page is rather impenetrable to me because of phrases like "Dominance relations"...
 [reply] 

The Wikipedia article has a section on that:
Multiple knapsack problem
This variation is similar to the Bin Packing Problem. It differs from the Bin Packing Problem in that a subset of items can be selected, whereas, in the Bin Packing Problem, all items have to be packed to certain bins. The concept is that there are multiple knapsacks. This may seem like a trivial change, but it is not equivalent to adding to the capacity of the initial knapsack. This variation is used in many loading and scheduling problems in Operations Research and has a PTAS.
I'm afraid I'm not familiar with this variation, but having a name to search for should help, I reckon.
 [reply] 



D'oh
Although I still don't know if that's actually an answer!.
 [reply] 
Re: Computer science Problem  single dimension bin packing
by kennethk (Abbot) on Aug 14, 2014 at 16:25 UTC

What are you trying to optimize? Are you trying to get an even distribution of data across a fixed number of drives? Efficiently fill drives of fixed size? The latter is (as AppleFritter points out) the knapsack problem. For case 1), a simple and quite effective solution is to go from largest object to smallest, putting it in the least filled bin.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %bins = (
A => 0,
B => 0,
C => 0,
D => 0,
);
my %dir;
my $int;
while(<DATA>) {
chomp;
my ($size, $name) = split /\s+/;
$name .= ++$int if $dir{$name}; # Disabiguation; all your names ar
+e identical
$dir{$name} = $size;
}
for my $name (sort {$dir{$b} <=> $dir{$a}} keys %dir) {
my ($smallest) = sort {$bins{$a} <=> $bins{$b}} keys %bins;
$bins{$smallest} += $dir{$name};
}
print Dumper(\%bins);
output:
$VAR1 = {
'A' => '9885811019',
'D' => '9885704251',
'C' => '9885960464',
'B' => '9885704533'
};
For case 2), you would put it in the smallest available space in which it fits.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my $max = 10**10;
my $new_bin = 'A';
my %bins;
my %dir;
my $int;
while(<DATA>) {
chomp;
my ($size, $name) = split /\s+/;
$name .= ++$int if $dir{$name}; # Disabiguation; all your names ar
+e identical
$dir{$name} = $size;
}
DIR_LOOP: for my $name (sort {$dir{$b} <=> $dir{$a}} keys %dir) {
for my $bin_name (sort {$bins{$b} <=> $bins{$a}} keys %bins) {
if ($bins{$bin_name} + $dir{$name} < $max) {
$bins{$bin_name} += $dir{$name};
next DIR_LOOP;
}
}
$bins{$new_bin++} = $dir{$name};
}
print Dumper(\%bins);
outputs
$VAR1 = {
'A' => '9999831421',
'D' => '9543689031',
'C' => '9999660101',
'B' => '9999999714'
};
Update: Added code
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
 [reply] [d/l] [select] 

What are you trying to optimize?
Neither. It's one of
 If he has an unlimited number of tapes: Minimize the number of tapes needed (bin packing problem).
 If he has an limited number of tapes: Maximize the value of what will fit on the tapes he has (multiple knapsack problem).
Efficiently fill drives of fixed size? The latter is (as AppleFritter points out) the knapsack problem.
No, it's not, as I already pointed out. In the knapsack problem, you have one bin (tape), and you're trying the maximize what you can put on it. Stuff is left behind.
 [reply] 

Stuff is left behind.
Then add another bin (tape) and start filling it with leftovers. Repeat until no more bins (tapes) or leftovers.
The first few bins would probably be most optimally filled for most datasets but if the goal is to use the least amount of tapes this should work... or am I missing something?
 FloydATC
Time flies when you don't know what you're doing
 [reply] 



 [reply] 

Hi kennethk. First of all, sorry for not stating my intention concisely. I thought I had, but on rereading I realise it's not clear.
However, as you correctly inferred, I'd like to (approximately) evenly distribute the data across a fixed number of drives (4, in this case). It looks like your first chunk of code does this, and produces far better results than my first attempt at placing the largest items first.
Embarrassingly, I've just looked at my code (which I didn't post) and realised that I in fact sorted in the wrong order, and it placed the smallest item first. The second lesson I should learn is that I should at least include the code that was faulty, as someone would've spotted it instantly. In any case I think I prefer your code slightly to my (now working) code, so I'll use that.
Thanks everyone (and sorry), as always some interesting responses, although I'm so far behind on the CS terms!
 [reply] 

 [reply] 
Re: Computer science Problem  single dimension bin packing ("First Fit Decreasing")
by LanX (Bishop) on Aug 14, 2014 at 17:33 UTC

Hi Davis
You didn't tell us what exactly needs to be optimized.
NP hardness depends largely on this question.
FWIW have a look at "First Fit Decreasing" algorithm, it has a complexity of
n * log n and guarantees a solution not worse than 3/2 of the optimum of the "Bin Packing Problem" (even 11/9 asymptotically)
This should be good enough in most cases.
BTW it's not "Knapsack" since you are not trying to maximize any second weight function of the packed items. They have a size but no additional cost value.
HTH :)
Cheers Rolf
_{(addicted to the Perl Programming Language and ☆☆☆☆ :)}
 [reply] 
Re: Computer science Problem  single dimension bin packing
by Bethany (Scribe) on Aug 15, 2014 at 01:40 UTC

Might help, might not: The last time I had to do a closely related chore I ended up using a technique called "simulated annealing" to find goodbutnotabsolutelyoptimal solutions. The problem is it was decades ago using an allbutextinct programming language,* so I can't provide more than the name. Googling "simulated annealing" turns up articles that look relevant but wouldn't provide anything like a readytouse solution.
* Since someone's bound to ask, it was APL.
 [reply] 

 [reply] 

 [reply] 
Re: Computer science Problem  single dimension bin packing
by duelafn (Vicar) on Aug 15, 2014 at 13:24 UTC

Yes, this is bin packing. There is a module that will do it on CPAN: Algorithm::BinPack. I've also written some code which does this back when I was teaching a class on the topic and needed to be able to compare the various strategies. Generally, if you want the best packing, you just have to try the various strategies and choose the one which makes the most efficient use of space. For any of the strategies, one can construct a scenario where another of the strategies performs better. But, as LanX said, First Fit Decreasing is a usually strong performer.
 [reply] 
Re: Computer science Problem  single dimension bin packing
by sundialsvc4 (Abbot) on Aug 14, 2014 at 17:33 UTC

Quite honestly, I would duckanddodge the problem since there is probably no measurable advantage to an “optimal solution.” I’d say, just choose at random any bin that is bigenough to hold the next object. Any gardenvariety random number generator is designed to provide a fairlyeven distribution of generated values, so the odds are quite good that the actual distribution obtained for any particular backup will be favorable.
You could also, of course, add a second “backfill” pass to the algorithm, which examined the initial random result to see if it is obviouslyscrewy, as in a particular bucket being (say ...) “more than somen standard deviations away from the mean,” in which case you might, for example, make a random number of attempts to successfully “steal” a randomlyselected item (that will fit ...) from a randomlychosen bucket in order to drop it into the bucket(s) that are comingup “short.”
A little bit of simulation should help you determine whether your chosen compromisealgorithm is “good enough for peace work.”
 [reply] 