Puzzle: The Ham Cheese Sandwich cut.

Re: Puzzle: The Ham Cheese Sandwich cut. by ambrus (Abbot) on Nov 17, 2005 at 17:57 UTC
The warm-up problem isn't so trivial. Two possible solutions are described in Cormen, Leiserson, Rivest and Stein: *Introduction to Algorithms*. One of these is randomized and runs in expected O(n) time. (Update: the other one runs in guaranteed O(n) time, as Perl Mouse has noted in his reply. I'm sorry this wasn't clear from my original post.) I've recently implemented this randomized algorithm in Perl, although my implementation is not a very efficient one: it would be possible to do all its operations in place (with only O(n) extra memory and, more importantly, less time). The rest of this post shows my implementation (2 kB, not preserved here).
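For readers who haven't seen the randomized algorithm, here is a minimal Perl sketch of randomized selection (often called quickselect). It is not ambrus's implementation, which isn't shown here, and the name `quickselect` is hypothetical. Like the version described above, it is not in place: each round builds new arrays with `grep`.

```perl
use strict;
use warnings;

# Randomized selection ("quickselect"): returns the $k-th smallest element
# (0-based) of the list in expected O(n) time; the worst case is O(n^2).
# Not in place: each round builds new, smaller arrays with grep.
sub quickselect {
    my ($k, @list) = @_;
    die "k out of range\n" if $k < 0 || $k > $#list;
    while (1) {
        my $pivot   = $list[ rand @list ];    # uniformly random pivot
        my @smaller = grep { $_ < $pivot } @list;
        my @equal   = grep { $_ == $pivot } @list;
        if ($k < @smaller) {
            @list = @smaller;                 # answer lies below the pivot
        }
        elsif ($k < @smaller + @equal) {
            return $pivot;                    # answer is the pivot itself
        }
        else {
            $k -= @smaller + @equal;          # skip everything <= pivot
            @list = grep { $_ > $pivot } @list;
        }
    }
}

my @data = (7, 1, 5, 3, 9, 2, 8);
print quickselect(int(@data / 2), @data), "\n";    # prints 5 (the median)
```

Splitting off the elements equal to the pivot keeps the loop from spinning forever on lists with duplicates.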
Re^2: Puzzle: The Ham Cheese Sandwich cut. by Perl Mouse (Chaplain) on Nov 18, 2005 at 10:12 UTC
> One of these is randomized and runs in expected O(n) time. ... The rest of this post shows my implementation.

Nice, but the worst-case running time is `Ω(n²)`. It suffers from the same problem as Quicksort: picking a random pivot works well often enough to get a good expected running time, but if you're unlucky, it's really slow. There is an algorithm that does it in guaranteed linear time (although when done in Perl, the constants are so high that in most practical situations you're better off sorting in C and picking the middle element). `Perl --((8:>*`
Re: Puzzle: The Ham Cheese Sandwich cut. by Limbic~Region (Chancellor) on Nov 17, 2005 at 14:25 UTC
Perl Mouse, I haven't started working on any of the challenges yet, because I wanted to raise a question first. When I learned about means, modes, and medians in statistics, I thought I remembered learning that the median of an even-length list is the average of the two middle numbers: `1, 2, 3, 4, 5, 6 => middle pair 3, 4 => (3 + 4) / 2 = 3.5`. Is that correct? I guess it doesn't matter if it is, since the line that bisects the two lists will still be the median. Cheers - L~R

Update: After posting the question, I realized that the answer doesn't matter. *shrug*
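A minimal sketch of the textbook definition L~R describes, assuming numeric input; the name `median` is just for illustration:

```perl
use strict;
use warnings;

# Median by sorting: the middle element of an odd-length list, or the
# average of the two middle elements of an even-length list.
sub median {
    my @s   = sort { $a <=> $b } @_;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

print median(1 .. 6), "\n";    # prints 3.5
print median(1 .. 5), "\n";    # prints 3
```

For the even case this returns a value between two elements rather than an element of the set, which is the distinction Perl Mouse draws in his reply.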
Re^2: Puzzle: The Ham Cheese Sandwich cut. by Perl Mouse (Chaplain) on Nov 17, 2005 at 15:03 UTC
It depends how you look at the problem. If you look at it as the 1-d variant of the "divide sets using a simplex" problem, any number between the two middle numbers will do. However, if you want to write a Quicksort whose running time is guaranteed to be `O(N log N)`, you need to find a median in linear time, and you want to find an element of the set, not something in between. Whether you find one of the middle elements or pick a number in between, I'll accept both solutions. `;-)`
Re^3: Puzzle: The Ham Cheese Sandwich cut. by ambrus (Abbot) on Nov 19, 2005 at 18:10 UTC
Are you sure about this? Once you have found a number such that exactly half of the numbers are to its left and half are to its right, couldn't you separate these two classes of numbers, sort them separately, and still get an O(N log N) sort this way?

If I have to define the median, I'd say that if you have an even number of data points, any number between the two middle ones is a median. With this definition, the median is equivalently a number whose total distance from the given numbers is minimal. This latter definition has parallels: the mean is the number for which the sum of the squared distances from the given numbers is minimal. More precisely, given the sequence $(x_1, \dots, x_N)$, the mean is the number $A$ that minimizes $|x_1 - A|^2 + \dots + |x_N - A|^2$; the median is $M$ if it minimizes $|x_1 - M| + \dots + |x_N - M|$. Furthermore, informally speaking, the mode $C$ minimizes $|x_1 - C|^\varepsilon + \dots + |x_N - C|^\varepsilon$, where $\varepsilon$ is a very small positive number.
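The minimisation view of mean and median can be checked numerically. This small sketch (hypothetical helper names `abs_cost` and `sq_cost`; `sum` comes from the core module List::Util) evaluates both sums for L~R's list `1 .. 6`:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sum of absolute distances from $m to the data (minimised by any median).
sub abs_cost { my ($m, @x) = @_; return sum map { abs($_ - $m) } @x }

# Sum of squared distances from $m to the data (minimised by the mean).
sub sq_cost { my ($m, @x) = @_; return sum map { my $d = $_ - $m; $d * $d } @x }

my @data = (1 .. 6);
for my $m (2.5, 3, 3.5, 4, 4.5) {
    printf "m = %-4s sum|x-m| = %-5s sum(x-m)^2 = %s\n",
        $m, abs_cost($m, @data), sq_cost($m, @data);
}
```

Every `m` from 3 to 4 gives the same minimal sum of absolute distances (9), matching "any number between the two middle ones is a median", while only the mean 3.5 gives the minimal sum of squares (17.5).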
Re^4: Puzzle: The Ham Cheese Sandwich cut. by Perl Mouse (Chaplain) on Nov 20, 2005 at 16:21 UTC
Re: Puzzle: The Ham Cheese Sandwich cut. by jeffguy (Sexton) on Nov 17, 2005 at 19:56 UTC
Observation 1 (or maybe it's just obvious): for even numbers of points, there may be more than one correct answer (even aside from trivially jittering the dividing line back and forth a little). Example: two red points (0,0), (1,1) and two green points (1,0), (0,1). Plotted, with the top row at y = 1: `g r` above `r g`. They can be divided vertically or horizontally. Declaring an odd number of points of each color and requiring that no two points be at the same spot may force a unique solution, but I'm not sure. Wow! This is a tough problem!

Update: It turns out there are at least some point sets with an odd number of points of each color where there are multiple correct answers. Example: [not preserved, 906 bytes]

Update: I have an O(n²) algorithm (not implemented yet, but it works).
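jeffguy's example can be checked mechanically. This sketch represents a line as `[a, b, c]`, meaning a*x + b*y = c (a representation chosen here, not taken from the puzzle), and counts each colour on either side; `side_counts` and `is_cut` are hypothetical names:

```perl
use strict;
use warnings;

# Count the points strictly on each side of the line a*x + b*y = c;
# points lying exactly on the line are counted separately.
sub side_counts {
    my ($line, @pts) = @_;
    my ($la, $lb, $lc) = @$line;
    my %count = (pos => 0, neg => 0, on => 0);
    for my $p (@pts) {
        my $v = $la * $p->[0] + $lb * $p->[1] - $lc;
        $count{ $v > 0 ? 'pos' : $v < 0 ? 'neg' : 'on' }++;
    }
    return \%count;
}

# A line is a valid cut if, for both colours, the same number of points
# lies strictly on each side (points on the line itself are allowed).
sub is_cut {
    my ($line, $red, $green) = @_;
    for my $set ($red, $green) {
        my $c = side_counts($line, @$set);
        return 0 unless $c->{pos} == $c->{neg};
    }
    return 1;
}

my @red   = ([0, 0], [1, 1]);
my @green = ([1, 0], [0, 1]);
for my $line ([1, 0, 0.5], [0, 1, 0.5], [1, 1, 0.5]) {
    printf "%s*x + %s*y = %s: %s\n", @$line,
        is_cut($line, \@red, \@green) ? "cut" : "not a cut";
}
```

Any vertical or horizontal line strictly between 0 and 1 is a cut here, which is the non-uniqueness described above; the last line tested puts both green points on the same side.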
Re^2: Puzzle: The Ham Cheese Sandwich cut. by Perl Mouse (Chaplain) on Nov 18, 2005 at 10:05 UTC
> For even numbers of points, there may be more than one correct answer

Indeed. That's why the puzzle says *returns a line*, and not *returns the line*.
Re: Puzzle: The Ham Cheese Sandwich cut. by BrowserUk (Patriarch) on Nov 17, 2005 at 20:29 UTC
Is this a "tree thing"? You insert the points into a (Red-Black?) tree and they effectively sort themselves into the two required groups on either side of the median, which ends up as the root node? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Puzzle: The Ham Cheese Sandwich cut. by jeffguy (Sexton) on Nov 17, 2005 at 20:42 UTC
I really doubt it, because any point can have a median line drawn through it. Remember, the median line can go in any direction. For any point, you can find some angle at which to draw a line through that point that will separate all other points of that color into two equally sized groups. So if it's a tree structure, it's not a standard one where divisions are made parallel to the axes of the graph.
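jeffguy's claim can be made concrete with a brute-force sketch. It assumes the colour has an odd number of points (so the remaining m points can split evenly) and that no two of them are collinear with the chosen point; the helper names and the O(m²) search are illustrative choices, not jeffguy's algorithm. Rotating the line by half a turn around the point changes the left-hand count from some k to m - k in steps of one, so it must hit m/2 on the way; trying one direction per angular gap is enough to find it:

```perl
use strict;
use warnings;

use constant PI => 4 * atan2(1, 1);

# Number of points strictly to the left of the directed line through $p at angle $theta.
sub left_count {
    my ($p, $theta, @pts) = @_;
    my ($dx, $dy) = (cos $theta, sin $theta);
    my $n = 0;
    for my $q (@pts) {
        # Sign of the 2-D cross product of the direction and (q - p).
        $n++ if $dx * ($q->[1] - $p->[1]) - $dy * ($q->[0] - $p->[0]) > 0;
    }
    return $n;
}

# Direction from $p to $q, folded into [0, pi): a line does not care about orientation.
sub folded_angle {
    my ($p, $q) = @_;
    my $t = atan2($q->[1] - $p->[1], $q->[0] - $p->[0]);    # in (-pi, pi]
    $t += PI if $t < 0;
    $t -= PI if $t >= PI;
    return $t;
}

# Angle of a line through $p with exactly half of @others on each side.
sub halving_angle {
    my ($p, @others) = @_;
    return 0 unless @others;
    my @fold = sort { $a <=> $b } map { folded_angle($p, $_) } @others;
    # The left-hand count only changes when the line passes a point, so one
    # test direction inside each gap (including the wrap-around gap) suffices.
    my @candidates = map { ($fold[$_] + $fold[$_ + 1]) / 2 } 0 .. $#fold - 1;
    push @candidates, ($fold[-1] + $fold[0] + PI) / 2;
    for my $theta (@candidates) {
        return $theta if left_count($p, $theta, @others) == @others / 2;
    }
    return;    # only reachable if the general-position assumption fails
}

my @others = ([1, 0], [2, 1], [0, 3], [-1, 2]);
my $theta  = halving_angle([0, 0], @others);
printf "angle %.3f rad leaves %d of %d points on the left\n",
    $theta, left_count([0, 0], $theta, @others), scalar @others;
```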
Re^3: Puzzle: The Ham Cheese Sandwich cut. by BrowserUk (Patriarch) on Nov 17, 2005 at 21:37 UTC
I've not yet convinced myself that this is soluble in the general case. Consider the 2D case where all the points in both groups have one coordinate in common and there is an odd number of points in each group, or, more generally, where all the points lie on a straight line at any arbitrary angle:

[Diagram: three boxes, each showing points of both groups (`.` and `x`) lying along a single straight line; in one box the line is horizontal.]

Unless you consider that the line passing through all the points satisfies the criterion of having an equal number of each type of point on either side, i.e. none?
Re^4: Puzzle: The Ham Cheese Sandwich cut. by jeffguy (Sexton) on Nov 17, 2005 at 21:52 UTC
Re^5: Puzzle: The Ham Cheese Sandwich cut. by BrowserUk (Patriarch) on Nov 17, 2005 at 22:04 UTC
Re^2: Puzzle: The Ham Cheese Sandwich cut. by jeffguy (Sexton) on Nov 18, 2005 at 04:42 UTC
(1b) I have an O(n lg n) solution for 2-D. Sorry, BrowserUk: I was SO wrong in saying no tree. My solution (which I have not coded) uses a PR quadtree. My solution is certainly not the only approach that will work. (1c) I have no idea (yet) how this might be brought down to O(n), nor do I have a clue (yet) how to prove it impossible. (2a) Also, while a PR quadtree can be used in 3D, my approach does not extend easily to 3D. More thought required. Question: for a puzzle, ought I to post pseudocode/code on completion, or is it polite to leave the solution unposted instead, giving others the fun of solving it? Keep chuggin', guys! Nothing quite so rewarding as solving a tough puzzle!
Re: Puzzle: The Ham Cheese Sandwich cut. by Anonymous Monk on Nov 17, 2005 at 19:02 UTC
I assume we're supposed to take constant-time comparison as an axiom?

```perl
use strict;
use warnings;
use Math::BigInt;
use Benchmark qw(cmpthese);

my $x = Math::BigInt->new(2)**(2**16);    # a 65537-bit number
my $y = $x - 1;
my $m = Math::BigInt->new(2);
my $n = $m + 1;

# Compare the cost of comparing two huge numbers with that of two small ones.
cmpthese(-1, {
    large => sub { $y < $x },
    small => sub { $m < $n },
});
```
Re^2: Puzzle: The Ham Cheese Sandwich cut. by Perl Mouse (Chaplain) on Nov 18, 2005 at 10:06 UTC
> I assume we're supposed to take constant-time comparison as an axiom?

Yes.
Re: Puzzle: The Ham Cheese Sandwich cut. by robin (Chaplain) on Nov 21, 2005 at 19:08 UTC
As ambrus said, even the warm-up problem is pretty damned hard. I too cheated by looking in Cormen, Leiserson and Rivest. Here is a Perl implementation of the linear-time algorithm they give:

```perl
use strict;
use warnings;

# Median of a small list (a group of at most five) by sorting.
sub naive_median { (sort { $a <=> $b } @_)[@_ / 2] }

# Despite the name, returns the $n-th smallest element (0-based) of @a.
sub nth_largest {
    my ($n, @a) = @_;
    die "You can't find the ${n}th-largest element of an " . @a . "-element array!"
        if $n > $#a || $n < 0;
    return $a[0] if @a == 1;

    # Median of each group of five; their median is the pivot.
    my @medians;
    for (my $i = 0; $i < @a; $i += 5) {
        push @medians, naive_median(@a[$i .. ($i + 4 > $#a ? $#a : $i + 4)]);
    }
    my $median = median(@medians);

    # Three-way partition, so duplicates of the pivot cannot cause endless recursion.
    my @smaller = grep { $_ < $median } @a;
    return nth_largest($n, @smaller) if $n < @smaller;
    my @equal = grep { $_ == $median } @a;
    return $median if $n < @smaller + @equal;
    my @larger = grep { $_ > $median } @a;
    return nth_largest($n - @smaller - @equal, @larger);
}

sub median {
    unshift @_, int(@_ / 2);
    goto &nth_largest;
}
```

In practice it's pretty inefficient, and even proving that it runs in linear time is not entirely trivial!
Re^2: Puzzle: The Ham Cheese Sandwich cut. by Roy Johnson (Monsignor) on Nov 21, 2005 at 19:17 UTC
That `goto` is not helpful. Caution: Contents may have been coded under pressure.
Re^3: Puzzle: The Ham Cheese Sandwich cut. by robin (Chaplain) on Nov 21, 2005 at 19:28 UTC
Hmm, that's interesting. But I bet the goto is faster if `@_` has, say, a million elements. You're saving an awful lot of copying. It also saves a fair amount of stack space. Update: I lost this bet :-)
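To measure how `goto &sub` compares with an ordinary call when `@_` is large, a Benchmark sketch along these lines can be used. The subs `target`, `via_goto` and `via_call` are hypothetical stand-ins for `nth_largest` and `median`, and the outcome depends on your perl build, so no numbers are claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for nth_largest: just reports how many arguments it got.
sub target { return scalar @_ }

# Two ways of calling it with an extra leading argument, as median() does.
sub via_goto { unshift @_, 0; goto &target }
sub via_call { return target(0, @_) }

my @big = (1 .. 100_000);
cmpthese(-1, {
    goto => sub { via_goto(@big) },
    call => sub { via_call(@big) },
});
```

Note that the `goto` version still does an `unshift` on a large `@_`, so it is not free either.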
Re^4: Puzzle: The Ham Cheese Sandwich cut. by Roy Johnson (Monsignor) on Nov 21, 2005 at 19:31 UTC
Re^5: Puzzle: The Ham Cheese Sandwich cut. by robin (Chaplain) on Nov 21, 2005 at 19:39 UTC
Re^2: Puzzle: The Ham Cheese Sandwich cut. by hv (Prior) on Nov 22, 2005 at 13:19 UTC
I suspect that a proof of the running time order will concentrate on the expected depth of recursion. However, I believe it will be much harder to prove that the `push` is O(1) (indeed I suspect it is not), and without that the algorithm as a whole cannot be O(n). Hugo
Re^3: Puzzle: The Ham Cheese Sandwich cut. by Perl Mouse (Chaplain) on Nov 22, 2005 at 14:05 UTC
No, the proof doesn't need an expected running time. The running time `T(N)` is expressed as `T(N) = T(N/5) + T(7N/10 + 10) + O(N)`, which has `T(N) = O(N)` as a solution.

> However I believe it will be much harder to prove that the push is O(1) - indeed I suspect it is not - and without that the algorithm as a whole cannot be O(n).

It doesn't have to be. What's needed is that the push has an amortized running time of `O(1)`: that is, if we perform `N` pushes, the total running time is still bounded by `O(N)`. And from what I understand of how allocation of array sizes works (an additional 20% of memory is claimed), a push has amortized `O(1)` performance. A single push may take `Θ(N)` time, but `N` pushes average it out.
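The amortized argument can be illustrated with a small simulation. It assumes a buffer that grows by about 20% when full (following Perl Mouse's description, not Perl's exact growth rule); `simulate_pushes` is a hypothetical name. Because each new capacity is more than 1.2 times the previous one, the copies made by all reallocations sum to less than 6 times the number of pushes, even though the last reallocation alone copies more than half of the final array:

```perl
use strict;
use warnings;

# Simulate $n pushes onto an array whose buffer grows by about 20% whenever it
# is full (an assumed growth rule), counting the element copies caused by
# reallocation: in total, and in the worst single push.
sub simulate_pushes {
    my ($n) = @_;
    my ($size, $cap)    = (0, 4);
    my ($total, $worst) = (0, 0);
    for (1 .. $n) {
        if ($size == $cap) {
            # Buffer full: allocate a bigger one and copy every existing element.
            $total += $size;
            $worst = $size if $size > $worst;
            $cap = int($cap * 1.2) + 1;
        }
        $size++;
    }
    return ($total, $worst);
}

my $n = 1_000_000;
my ($total, $worst) = simulate_pushes($n);
printf "%d pushes: %d element copies (%.2f per push); worst single push: %d\n",
    $n, $total, $total / $n, $worst;
```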
Re: Puzzle: The Ham Cheese Sandwich cut. by BrowserUk (Patriarch) on Nov 23, 2005 at 04:56 UTC
If I calculate the median point of each dataset (using the minimised Euclidean distance method, 2D for now), I get two points, one for each colour. These rarely coincide with any of the given points. If I project the line through those two points, it appears to divide the dataset as required. Is this the correct approach?
Re^2: Puzzle: The Ham Cheese Sandwich cut. by jeffguy (Sexton) on Nov 23, 2005 at 05:33 UTC
I think you're suggesting taking four medians: the median of the X coordinates of the reds, the median of the Y coordinates of the reds, and the same for the greens. I don't understand what you might be doing with minimised Euclidean distance, though. Mind explaining? Then maybe I can tell you if it's on track with my approach (which is NOT yet O(n)).
Re^3: Puzzle: The Ham Cheese Sandwich cut. by BrowserUk (Patriarch) on Nov 23, 2005 at 06:37 UTC
> I think you're suggesting take four medians: the median of the X coordinates of the reds, the median of the Y coordinates of the reds, and the same for the greens.

No. The problem is defining (or understanding) what the median is for a 2D dataset (R²). Think of 3 points in the form of an equilateral triangle with the lower edge parallel to the X axis:

[Diagram: three points `x` at the corners of an equilateral triangle, apex at the top, base parallel to the X axis.]

Whilst the top point is the median along the X axis (looking up), the bottom right point is the median if you are looking in from the top left. Equally, it's the bottom left point if you look in from the top right. Which would be the "correct median" depends upon the relative positioning of the other set of three points, or more correctly, their median. And the above three points can be rotated through 0° to 120°, giving an infinite number of directions from which to view the dataset (or transformations you could apply) in order to find the median.

Which I think means that the warm-up problem is an almost complete red herring! You cannot work out which direction to look in (or which transformation of the coordinate system to apply) to determine the median for this dataset until you know the median of the other. And vice versa. You cannot determine the median of an R² dataset, nor of higher-dimensional ones, with the 'sort and take the middle' or k-th order statistic approach you would use for an R¹ dataset.

That leads you (led me?) to think about how to determine the median of a set of points in R² without reference to the other dataset. And that's when I found the Euclidean distance method. The premise is that the median of an R² dataset is the point at which the sum of the Euclidean distances between that point and the points in the dataset is minimised.
There are other methods, including the point that minimises the sum of the areas of the triangles formed between that point and pairs of points of the dataset, but that seems much harder to calculate.
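One standard way to compute the point BrowserUk describes (the geometric median, which minimises the sum of Euclidean distances) is Weiszfeld's iteration. The sketch below uses it; that choice, the starting point (the centroid), the tolerances and the name `geometric_median` are all assumptions, not BrowserUk's code. It only finds the point for one colour; it does not check whether the line through the two colours' points is a valid cut:

```perl
use strict;
use warnings;

# Approximate geometric median of 2-D points (the point minimising the sum of
# Euclidean distances to them) using Weiszfeld's iteration from the centroid.
sub geometric_median {
    my @pts = @_;    # each point is an [x, y] array ref
    my ($x, $y) = (0, 0);
    for my $p (@pts) { $x += $p->[0]; $y += $p->[1] }
    ($x, $y) = ($x / @pts, $y / @pts);

    for (1 .. 1000) {
        my ($sx, $sy, $sw) = (0, 0, 0);
        for my $p (@pts) {
            my $d = sqrt(($p->[0] - $x)**2 + ($p->[1] - $y)**2);
            # Crude guard: if the estimate sits on a data point, stop there.
            return ($x, $y) if $d < 1e-12;
            $sx += $p->[0] / $d;    # each point weighted by 1 / distance
            $sy += $p->[1] / $d;
            $sw += 1 / $d;
        }
        my ($nx, $ny) = ($sx / $sw, $sy / $sw);
        my $step = abs($nx - $x) + abs($ny - $y);
        ($x, $y) = ($nx, $ny);
        last if $step < 1e-10;
    }
    return ($x, $y);
}

my ($gx, $gy) = geometric_median([0, 0], [2, 0], [1, sqrt(3)]);
printf "(%.4f, %.4f)\n", $gx, $gy;
```

For the equilateral triangle used here the result is the centre of the triangle, as symmetry suggests; for points on a line it reduces to the ordinary 1-D median.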
Re^4: Puzzle: The Ham Cheese Sandwich cut. by jeffguy (Sexton) on Nov 23, 2005 at 13:44 UTC
Re^4: Puzzle: The Ham Cheese Sandwich cut. by Perl Mouse (Chaplain) on Nov 23, 2005 at 09:51 UTC
Re^5: Puzzle: The Ham Cheese Sandwich cut. by BrowserUk (Patriarch) on Nov 23, 2005 at 10:17 UTC
Re: Puzzle: The Ham Cheese Sandwich cut. by BrowserUk (Patriarch) on Nov 21, 2005 at 04:01 UTC
Megiddo
Re^2: Puzzle: The Ham Cheese Sandwich cut. by Perl Mouse (Chaplain) on Nov 21, 2005 at 10:23 UTC
Indeed.
Re^3: Puzzle: The Ham Cheese Sandwich cut. by BrowserUk (Patriarch) on Nov 21, 2005 at 14:39 UTC
I don't think I'll be tackling the problem further. I just waded through the 1983 paper, and it'd take me a month of Sundays to translate it into something I could attempt to produce code from. Geez. Haven't these guys ever heard of 'worked examples', or that a picture paints a thousand words? :)
Re: Puzzle: The Ham Cheese Sandwich cut. by tphyahoo (Vicar) on Nov 21, 2005 at 07:53 UTC
1. Was the solution to this ever posted? It seems like in the exchange with ambrus there was a pretty strong hint to the solution for challenge 1, but as to everything else, huh? 2. For future reference, a pretty unintimidating guide to sorting (just some class notes) is at a sorting; I'm kind of reading up on that during breaks. I take it that sorting is at the core of the problem space here, but maybe I'm wrong on that. Anyway, thanks for posting an interesting problem. A solution would be nice though :)