Mark Jason Dominus And Me - The Partition Problem

I've been reading Mark Jason Dominus' "Higher Order Perl", and I really like it. It's certainly helping me to open my mind to new ways of approaching problems.

...And speaking of problems, there's one that MJD presents in the book which he calls "The Partition Problem", where you have a random sequence of numbers (representing things of value, like currency amounts, scoops of ice cream, or "treasures") and you need to figure out how to evenly divide them (or in some cases to hit a target number, or "fair share"). Basically you are looking at the numbers to find out what numbers add up to the target "fair share". In some code examples for this problem, the "fair share" target is hard coded, and in others it is 1/2 the total sum of all the numbers. Both code examples in this meditation will work either way.

MJD approaches the partition problem with some smart recursion. The issue I face is that his recursion is smarter than me; it just makes no damn sense. Well, to be fair, I spent several hours with it attempting to understand it better, but now it just makes damn little sense. Sure, it's an improvement, but the problem here, at least for me, is that the approach to the problem is something I can't visualize. If I can't see it mapped out in my mind, I can't really grasp it completely; I want to see what the code is doing, and understand the flow of execution.

In an attempt to make sense of the logic, I took MJD's example code and added things to it that would output some visual queues to help me understand where/when/how the code was diving through recursions and what was being returned, and in what order, back up the stack. As part of this meditation, I'll include my example code and some example output.

In a further attempt to make sense MJD's approach to the Partition Problem I decided to take some things I learned from his code and teachings in the book, and then write my own solution from scratch in a way that makes sense to me. I will also include that code here as well.

The issue upon which I meditate is that I remain disappointed that after having read, pondered, and customized MJD's code, and thereafter having written some code of my own which solves the same partition problem, I still don't completely understand MJD's code and why it works. (This isn't a happy meditation).

Code samples follow below. You can read the exact code from MJD's "Higher Order Perl" book for free online at http://hop.perl.plover.com/book/. It starts on page 35.

As for the code I'm providing, please be forewarned that it isn't the most beautiful, but by definition of the word "most", most code isn't. (Meditate on that :smirk:)

MJD's Approach, Modified

Here is the MJD code, modified to provide visual output that explains what the code is doing

#!/usr/bin/perl

use strict;
use warnings;

my ( $target, @treasures, $calls );

$target = 5;
@treasures = ( 50, 2, 4, 3, 1, 2, 4, 8 );

sub find_share
{
   my ( $target, $treasures, $share, $depth ) = @_;

   $depth ||= 0; $depth += 1;
   $share ||= [];

   $calls++;

   printf qq{%-8s Calls: %-5d Depth: %-5d Share: %-10s Target: %-5d Tr
+easures: %-16s ... },
      ( '-' x $depth ) . '>',
      $calls,
      $depth,
      "[ @$share ]",
      $target,
      "[ @$treasures ]";

   print qq{Returning "[]" because target == 0\n\n}
      and return [] if $target == 0;

   print qq{Returning undef because target < 0\n\n}
      and return undef if $target < 0;

   print qq{Returning undef because no treasures remaining\n\n}
      and return undef if @$treasures == 0;

   my ( $first, @rest ) = @$treasures;

   print qq(Putting "$first" into the share, leaving treasures [ @rest
+ ]\n\n);

   my $solution = find_share( $target - $first, \@rest, [ @$share, $fi
+rst ], $depth );

   print qq(A recursive call at depth @{[ $depth + 1 ]} found that a s
+olution of [ @{[$first,@$solution]} ] adds up to my target of $target
+ at depth $depth\n\n)
      and return [ $first, @$solution ] if $solution;

   print '-' x $depth, '> ',
      qq{Situation failed.  Couldn't hit target.  Omitting treasure [ 
+$first ] and backtracking to target of $target, treasures [ @rest ] a
+t depth $depth\n\n};

   return find_share( $target, \@rest, $share, $depth );
}

print qq<Solution: @{ find_share( $target, \@treasures ) }>;

exit;
[download]

...And its output (it's quite a "wide-angle" view):

->       Calls: 1     Depth: 1     Share: [  ]       Target: 5     Tre
+asures: [ 50 2 4 3 1 2 4 8 ] ... Putting "50" into the share, leaving
+ treasures [ 2 4 3 1 2 4 8 ]

-->      Calls: 2     Depth: 2     Share: [ 50 ]     Target: -45   Tre
+asures: [ 2 4 3 1 2 4 8 ] ... Returning undef because target < 0

-> Situation failed.  Couldn't hit target.  Omitting treasure [ 50 ] a
+nd backtracking to target of 5, treasures [ 2 4 3 1 2 4 8 ] at depth 
+1

-->      Calls: 3     Depth: 2     Share: [  ]       Target: 5     Tre
+asures: [ 2 4 3 1 2 4 8 ] ... Putting "2" into the share, leaving tre
+asures [ 4 3 1 2 4 8 ]

--->     Calls: 4     Depth: 3     Share: [ 2 ]      Target: 3     Tre
+asures: [ 4 3 1 2 4 8 ]  ... Putting "4" into the share, leaving trea
+sures [ 3 1 2 4 8 ]

---->    Calls: 5     Depth: 4     Share: [ 2 4 ]    Target: -1    Tre
+asures: [ 3 1 2 4 8 ]    ... Returning undef because target < 0

---> Situation failed.  Couldn't hit target.  Omitting treasure [ 4 ] 
+and backtracking to target of 3, treasures [ 3 1 2 4 8 ] at depth 3

---->    Calls: 6     Depth: 4     Share: [ 2 ]      Target: 3     Tre
+asures: [ 3 1 2 4 8 ]    ... Putting "3" into the share, leaving trea
+sures [ 1 2 4 8 ]

----->   Calls: 7     Depth: 5     Share: [ 2 3 ]    Target: 0     Tre
+asures: [ 1 2 4 8 ]      ... Returning "[]" because target == 0

A recursive call at depth 5 found that a solution of [ 3 ] adds up to 
+my target of 3 at depth 4

A recursive call at depth 3 found that a solution of [ 2 3 ] adds up t
+o my target of 5 at depth 2

Solution: 2 3
[download]

And Now My Approach to the Partition Problem

This code makes vastly more sense to me, and reflects the way my brain thinks, to a certain degree. I wish to continue meditating on MJD's code until I can understand it on a much better level, even to the point that I understand this code below:

#!/usr/bin/perl

use strict;
use warnings;

my ( $target, @treasures, $calls );

$target = 5;
@treasures = ( 50, 2, 4, 3, 1, 2, 4, 8 );

print "Looking for numbers that add up to target: $target...\n";

print <<__OUT__;
Solution: @{ try_treasures( \@treasures, $target ) || 'none?' }
Calls: $calls
__OUT__

sub try_treasures
{
   my ( $treasures, $target ) = @_;

   my @legit_treasures = grep { $_ <= $target } @$treasures; # save so
+me cycles

   # try each number, with all others
   for ( my $i = 0; $i < @legit_treasures; $i++ )
   {
      my $add_this  = $legit_treasures[ $i ];
      my @to_these  = @legit_treasures;

      splice @to_these, $i, 1; # every number except the one we're wor
+king with

      my $attempt = try_bucket( [], $target, [ $add_this, @to_these ] 
+);

      return $attempt if $attempt; # when everything adds up perfectly
+ in sequence

      for ( my $j = 0; $j < @to_these; $j++ ) # ...when everything doe
+s not
      {
         my @try_without = @to_these;

         # start trying combinations of numbers while sequentially omi
+tting
         # ones that didn't work before
         splice @try_without, $j, 1;

         my $attempt = try_bucket( [], $target, [ $add_this, @try_with
+out ] );

         return $attempt if $attempt;
      }
   }
}

sub try_bucket
{
   $calls++;

   my ( $bucket, $target, $choices ) = @_;

   return undef unless @$choices;

   push @$bucket, shift @$choices;

   # calculate sum of all numbers in the bucket, unless there's only o
+ne
   my $bucket_sum = @$bucket == 1 ? $bucket->[0] : add_these( @$bucket
+ );

   if ( $bucket_sum == $target )
   {
      return $bucket;
   }
   elsif ( $bucket_sum < $target )
   {
      return try_bucket( $bucket, $target, $choices );
   }
}

sub add_these
{
   my @add_these = @_;
   my $total = 0;

   print qq(Adding up [ @{[ join ' + ', @add_these ]} ] = );

   for my $this_one ( @add_these ) { $total += $this_one }

   print qq{$total\n};

   return $total;
}
[download]

...And its output as well (not as complicated)

Looking for numbers that add up to target: 5...
Adding up [ 2 + 4 ] = 6
Adding up [ 2 + 3 ] = 5
Solution: 2 3
Calls: 4
[download]

--
Tommy
$ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'

Comment on Mark Jason Dominus And Me - The Partition Problem Select or Download Code

Replies are listed 'Best First'.
Re: Mark Jason Dominus And Me - The Partition Problem by tilly (Archbishop) on Nov 13, 2012 at 06:48 UTC
In your code, try modifying `$target` to `12` and `@treasures` to `qw(1 2 4 8 16 32)`. The result is that it fails to figure out that the answer is `4 + 8`. (And then trips another bug on the way to reporting its wrong answer.) This is not an incidental mistake. The fundamental problem in your approach is that you've hardcoded a finite set of patterns of inclusion/exclusion patterns that you're willing to try. With that approach the question is when, not if, your solution will break. In this case it was a pattern of inclusion that went "no, no, yes, yes" that broke. But patch that and you'll have another hole in your logic. You can't incrementally fix your approach to get a working solution. Please don't try. If you disregard that advice, then change `@treasures` to `qw(1 2 4 8 16 32 64 128 256 512)` and try to generate every `$target` in the range `0..1023`. For any reasonable amount of work you can do with your solution, you'll find a bug with that test. MJD's solution, by contrast, works perfectly. How does he do that? MJD uses recursion. As the joke goes, To understand recursion, first you must understand recursion. The joke has truth to it. The trick is that to see how recursion works for a problem of given size, you only need to understand how recursion works for all problems of smaller size. To understand, you start with examples of the smallest possible size and works your way up. For 0 treasures, MJD's algorithm tries out `[]` (that's the check for `$target == 0`) then gives up. For 1 treasures, MJD's algorithm tries out `[]`. does sanity checks, makes its first recursive call and tries out `[A]`. then makes a second recursive call and tries out `[]` again (silly, isn't it?), then gives up. For 2 treasures, MJD's algorithm tries out `[]`, does sanity checks, then makes its first recursive call inside of which it may try out `[A], [A, B], [A]` then makes a second recursive call inside of which it may try out `[], [B], []`. Basically "empty set", followed by "all subsets with A", followed by "all subsets without A". (With some duplication due to the empty set being checked in the sanity check, then again in the second recursive call.) For 3 or more treasures, the pattern repeats. We test the empty set. Then all subsets with the first element in the first recursive call. Then all subsets without the first element in the second recursive call (and the empty set multiple times). There is some redundancy in the checks, but we're guaranteed to actually try every subset. If the answer is to be found, we'll find it! (And if not, then we tried really, really hard.) Hopefully this has helped you figure this case out. But that raises an interesting question. If you got through the Fibonacci and directory walk examples then you obviously understand the principle of recursion. Why would you balk on this example? My most likely guess is that you try to visualize what is happening. In the case of Fibonacci it is easy enough to verify, number by number, what is happening. In the case of a directory walk, it is easy enough to visualize what it is doing, and see how it works. But in the case of the partition problem, what we're working with is subsets of sets. That's pretty abstract. When things jump in abstraction, it is easy for our brains to get lost. Once our brains get lost, we lose track and get confused. If that is the case for you, then focus on the concept of the size of the problem you are trying to solve. As long as all of the base cases are handled, and the recursive calls always are to "simpler" problems, the recursive algorithm will work. If the recursive calls are not to simpler problems, then you can get caught in infinite recursion.	[reply] [d/l] [select]
Re^2: Mark Jason Dominus And Me - The Partition Problem by Tommy (Chaplain) on Nov 13, 2012 at 13:33 UTC
Wow, thanks tilly. I really appreciate your taking time to look at my meditation and read my code. I'd like to respond to a few of the things you said, and mostly in the form of questions (if you don't mind) You said: "If you got through the Fibonacci and directory walk examples then you obviously understand the principle of recursion. Why would you balk on this example? My most likely guess is that you try to visualize what is happening. In the case of Fibonacci it is easy enough to verify, number by number, what is happening. In the case of a directory walk, it is easy enough to visualize what it is doing, and see how it works." YES, exactly. I can confidently say that I know what recursion is, and more importantly I'd like to believe that I "understand" recursion; the directory walk is a piece of cake (I've written very similar code in the past), and the Fibonacci exercise was also very easy to grasp. I could easily see where/why the code recursed and when/what it returned. The two things I don't understand: 1.) Why the ambiguous output? My confusion lies in the final 3 lines of the output of my (more verbose) version of the MJD recursive solution, as shown in the original post. I don't understand why something that appears to be a partial answer gets passed back up the stack and marked as a "solution", per the "$solution" variable name taken from the MJD code. When the code claims to have found a solution the first time, the solution it finds is not the real solution to the problem. Only higher on up the stack is the real solution identified. The first claim to have found a solution doesn't ring true. Sub-question: What made this solution work then? 2.) When is recursion the right thing to do? What criteria make recursion the method of choice for approaching problems as opposed to iteration? For a predefined tree structure, the answer is obvious, however I don't see why it is the only way to accomplish the solution of the Partition Problem which has no pre-defined tree that I can see. I don't doubt your assertion that recursion is the correct methodology for finding a solution to the Partition Problem, but could you tell me why? How would I recognize that I needed recursion there? I had to jot this down quickly before taking the kids to school and going to $work, so please forgive me if there's any muddled grammar or disjointed sentences... -- Tommy `$ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'`	[reply] [d/l]
Re^3: Mark Jason Dominus And Me - The Partition Problem by tilly (Archbishop) on Nov 13, 2012 at 14:56 UTC
For 1 the trick is in the first recursive call. When we subtract the current treasure from the target, we want to find solutions that include the current treasure. However inside that call, we don't know about the existence of that current treasure on the inner call - that fact is implicit in the fact that the target is smaller, and it will only get added back to the solution after the inner call returns. For 2 the answer is, "Whenever we have a problem that we don't know how to solve, which can be somehow reduced to simpler versions of the same problem." For Fibonacci, that means smaller $n. In the case of the directory walk, that means farther down the directory tree. In the case of the partition problem, simpler means "fewer treasures".	[reply]
Re^4: Mark Jason Dominus And Me - The Partition Problem by Tommy (Chaplain) on Nov 19, 2012 at 22:25 UTC
Re^5: Mark Jason Dominus And Me - The Partition Problem by Tommy (Chaplain) on Nov 19, 2012 at 23:27 UTC
Re^3: Mark Jason Dominus And Me - The Partition Problem by Laurent_R (Canon) on Nov 13, 2012 at 22:15 UTC
Just an additional point. When you'll go further down into MJD's brilliant Higher Order Perl, you'll find out that he warns about problems about recursion (paragraph 1.8, "Where Recursion Blows"), in which he revisits the Fibonacci and Partition problems for larger Fibonacci numbers or larger collections of treasures) and that, later in the book, he is spending quite a bit of time on methods to get rid of recursion (especially Chapter 5, "From Recursion To Iterators, but not only there). Recursion is a beautiful model not only when you have a predefined tree structure (such as the directory walk example), but also when a problem is very complicated, you often need to reduce it to a collection of smaller or simpler problems, and possibly again and again, until the individual problems become (almost) trivial (this is what divide and conquer algorithms are about). As an example, some of the efficient sort algorithms just do that, although their actual implementation may well have no recursive call at all. You may also want to use recursion if you want to generate all subsets of a set, or combinations or permutations. This may seem to be maths, but it is quite commonly necessary in software engineering. Many of the Operational Research problems also often fall into the category of candidates for recursion. In brief, recursion gives you a very powerful tool to modeling complex problems, especially when you want to make sure that you have examined absolutely all possibilities. The downside is that, sometimes, recursion will let you visit each possibility a number of times, possibly a huge number of times (there are some ways around that to a certain extent, one of them being caching, which MJD examines in some depth, but not always). So, sometimes you use recursion to better understand the problem and how to solve it, but then change it to iteration or something else when is comes to actual implementation. In such cases, recursion can be thought of as a prototype. Reading further MJD's book, you will also find out that even for directory walk, recursion is not necessarily the best solution, the reason being that recursion is very good at depth-first traversing of the directory tree; but sometimes you need to traverse the tree in breadth-first order (or some other order), and then, recursion becomes very unpractical to use. Keep reading this excellent book. I really did not have any problem with recursion when I first read it, but I learned an awful lot of other things from it, and there are other with which I am still fighting after having read them three times. There are some chapters that I will probably read for the fourth time, because I haven't fully understood their full implications and I still have quite a but to learn for them.	[reply]
Re^4: Mark Jason Dominus And Me - The Partition Problem by Dominus (Parson) on Nov 14, 2012 at 18:12 UTC
Re^5: Mark Jason Dominus And Me - The Partition Problem by Laurent_R (Canon) on Nov 17, 2012 at 15:47 UTC
Some notes below your chosen depth have not been shown here
Re^4: Mark Jason Dominus And Me - The Partition Problem by tilly (Archbishop) on Nov 14, 2012 at 02:43 UTC
Re^3: Mark Jason Dominus And Me - The Partition Problem by Anonymous Monk on Nov 13, 2012 at 14:18 UTC
could you tell me why? How would I recognize that I needed recursion there? If you have a set, that you need to divide, you you probably need recursion See Recursion (computer science) and Divide and conquer algorithm	[reply]
Re: Mark Jason Dominus And Me - The Partition Problem by Anonymous Monk on Mar 12, 2014 at 14:10 UTC
The graphics on recursion in HOP book help a lot to visualize how recursion works in the code. Take a binary tree as a matter of example for representing the recursive solution to the problem. In a recursive procedure the flow goes from the most left-side branch to the most left-side leave all the way down. In the iterative procedure, the flow goes top-down, horizontal layer of nodes after horizontal layer of nodes When you combine both procedures (recursion + iteration), Recursion takes priority over iteration. That means if recursion takes places when the loop variable is set to, for instance 1, at the first level of your tree, then iteration is "interrupted", recursion goes all the way down till all loops of the nested branched nodes are resumed, then the flow returns back from bottom to top (through the parent nodes) till reaches the first level again and goes on subsequently with the next value of the loop variable (set from 1 to 2) and so on and so forth till the first loop of the first level is accomplished.	[reply]