gr0k has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on an accounting problem where we have a list of checks and an amount. We need to find all possible combinations of checks that add together to make that amount. Basically if we had a record set like:
A - 10
B - 5
C - 13
D - 3
E - 15
F - 1
And we're searching for amount = 16. That would return 3 possible matches:
A + B + F
C + D
E + F
I came up with a way to do it using the Algorithm::Loops module and calculate all possible permutations down to a certain depth and ignore duplicates (such as ABF, AFB, BFA, etc are all the same thing). The problem is, we need to be able to search through potentially billions of combinations which takes a looooong time. I'm hoping to get some help optimizing my code to make it run faster. Right now with some tests I've run, this code can go through about 5 million combinations in about 3 minutes on a 2.4 ghz.
#!/usr/bin/perl use strict; use Algorithm::Loops qw(NestedLoops); my ($rs_ref,$key,$value,$sum,$count,%matches); my $amount = 16; my $depth = 6; $rs_ref->{'A'}->{'amount'} = 10; $rs_ref->{'B'}->{'amount'} = 5; $rs_ref->{'C'}->{'amount'} = 13; $rs_ref->{'D'}->{'amount'} = 3; $rs_ref->{'E'}->{'amount'} = 15; $rs_ref->{'F'}->{'amount'} = 1; my $start_time = time(); $count = NestedLoops( [ ( [keys %{$rs_ref}] ) x $depth ], { OnlyWhen => sub { @_ == keys %{{@_,reverse@_}}; } }, sub { $sum = 0; foreach $key(@_) { $value = $rs_ref->{$key}->{'amount'}; $sum += $value; } if ($sum == $amount) { $matches{join ' ', sort { $a cmp $b} @_} = 1; } }, ); print "Searched $count combinations...\n"; foreach $key (keys %matches) { print "MATCH: $key\n"; } my $end_time = time(); print "Finished in " . ($end_time - $start_time)/60 . " minutes.\n";
Which prints out:
Searched 1956 combinations...
Finished in 0.0166666666666667 minutes.

Replies are listed 'Best First'.
Re: amount permutations
by kvale (Monsignor) on Mar 04, 2004 at 18:34 UTC
    Here is a simpler program that accomplishes the same thing for up to 32 checks:
    my @check = (10, 5, 13, 3, 15, 1); my $desired = 16; foreach my $index (0..2**@check-1) { my $sum = 0; foreach my $pos (0..@check-1) { my $bit = ($index >> $pos) % 2; $sum += $bit * $check[$pos]; } if ($sum == $desired) { my @combo; foreach my $pos (0..@check-1) { push @combo, $check[$pos] if ($index >> $pos) % 2; } print join " ", @combo, "\n"; } }
    Here I index the possibilities with a simple integer expressing a binary combination of checks.

    Timing was 55 milliseconds on an Athlon 2100.


      It will take me a while to try and figure out what you're code is doing, but do you mean this method will only work with up to 32 checks? My recordset could have hundreds of records in it. My example just had 6.
        The limitation is just the typical number of bits in an integer on 32-bit machines.

        Let me reiterate Abigails sound advice, however. Say you have 100 checks. Then any exhaustive search will take 2**100 tries. That is around 10**30, so for 1 msec per try, the program would take 10**27 seconds. The age of the universe is only around 5 * 10**17 sec, so exhaustive search is hopeless.

        Even creating a more clever branch-and-bound algorithm will only potentially reduce the factor of 2 a little.

        So I think you need to rethink your task. Do you really need exact amounts, or can you approximate? If all the checks were even, but the desired amout was odd, there would be no solution; would your business collapse at that point? Look at what you can do to relax the constraints. There are fine polynomial heuristics for the knapsack problem that get you close in a modest time.


Re: amount permutations
by Abigail-II (Bishop) on Mar 04, 2004 at 17:25 UTC
    That's a very hard problem. Even the question is there a subset that totals to the given amount is NP-complete, and is known as the knapsack problem.


Re: amount permutations
by Limbic~Region (Chancellor) on Mar 04, 2004 at 18:59 UTC
    I was all impressed that I got over a 300% increase from your original code which should get better with larger combinations sets until I saw kvale's code. Here it is anyway: Cheers - L~R
      This does appear to be a bit faster and looks like it will work for what we need it to do. Have to do a bit more testing with our data. One question though, I notice it produces a lot more combinations, any idea how I could calculate the number of combinations from the number of records searched? That way I could output some status information to let them know how far they are into the search. Thanks!
        Well, it depends on what answer you want. To be honest you were checking the same number of combinations as me. The difference is you weren't counting ones that you discarded. There was effort spent on determing they could be discarded though - which is why I included all combinations.

        In order to make my $count look like your count will require more work - and make the process slower. If you want - you can change the position of $count++ to after the for loop. There you would only be counting the number of combinations that did not have a duplicate number in them and whose combined value was not greater than the target sum.

        I would recommend in the real problem removing all candidates that were already larger than the target number.

        Cheers - L~R
      Upon closer inspection your code has a bug where it would eliminate two checks with the same amount since you're checking an array of amounts. :( I'll have to modify it to allow that and run some tests to see if it will still result in a speed increase...
        You will have to explain what you mean or else explain what you think the following is doing:
        @_ == keys %{@_,reverse @_}
        Besides - this is a terribly inefficient way to go about what you are trying to do as explained by everyone. I was just showing ways to improve the efficiency of what you already had.


Re: amount permutations
by tachyon (Chancellor) on Mar 04, 2004 at 18:49 UTC

    If you can work with just pairings then a lookup table is very fast.

    my @list =( 10, 5, 13, 3, 15, 1 ); my $total = 16; my %lookup; @lookup{@list} = (); for my $val( @list ) { my $need = $total -$val; next unless exists $lookup{$need}; print "Found $val + $need = $total\n"; delete $lookup{$val}; }



      Afraid I'd need to be able to match any number of checks not just 2.