Limbic~Region has asked for the
wisdom of the Perl Monks concerning the following question:
All,
Yesterday, I grumbled in the CB that there had been a lack of challenging problems that were interesting lately. ikegami was kind enough to suggest sorting the sums of two sorted lists. For instance:
1 2
3 4
5 6
7 8
3, 5, 5, 7, 7, 7, 9, 9, 9, 9, 11, 11, 11, 13, 13, 15
Now obviously, there is nothing challenging about this. Perform a nested loop to collect all sums and then sort the remaining list. The challenge then is that memory consumption needs to be less than N + M + N*M where N and M represent the number of elements in the respective lists. Of course you could cheat by putting things out on disk or what not, but that really isn't in the spirit of the challenge. So  what is the least amount of memory you can use above N + M? Take it as a given that you can store both lists in memory and that you will always know which of the two lists has fewer elements if not the same.
I have 3 solutions where I have whittled the memory consumption down. The best only requires N memory above N + M where N represents the smaller of the two lists. I haven't Benchmarked it against the others but there are virtual bonus points for being both memory efficient and fast.
Re: Challenge: Sorting Sums Of Sorted Series
by BrowserUk (Pope) on Feb 02, 2010 at 17:43 UTC

The fact that you have 16 results but only require 12 units of memory means that you only have to output the values ordered, not store the ordered results set?
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] 

 [reply] 
Re: Challenge: Sorting Sums Of Sorted Series
by blokhead (Monsignor) on Feb 02, 2010 at 22:46 UTC

You can always save space by never storing anything  just recompute a sum every time you need it.. Here's a solution that uses constant space:
sub print_sums {
my ($listA, $listB) = @_;
my $min = $listA>[0] + $listB>[0]  1;
while ($min < $listA>[1] + $listB>[1]) {
my $nextmin = undef;
my $multiplicity = 0;
for my $i (0 .. $#$listA) {
for my $j (0 .. $#$listB) {
my $sum = $listA>[$i] + $listB>[$j];
if ($sum > $min and ($sum < $nextmin or not defined $nextm
+in)) {
($nextmin, $multiplicity) = ($sum, 1);
} elsif ($sum == $nextmin) {
$multiplicity++;
}
}
}
print( ($nextmin) x $multiplicity);
$min = $nextmin;
}
}
At each iteration, it traverses both lists to find the next smallest possible sum, and how many times it occurs. It only needs to keep track of $min, $nextmin, $multiplicity, $i, $j.
Of course the tradeoff is the running time, which is O((NM)^{2}).
Perhaps the metric should be to minimize the product of time space complexities? For comparison, naively computing all sums and sorting uses NM space and NM log(NM) time, so it's slightly worse than mine under the combined metric.
Update: extra parens around print statement..
 [reply] [d/l] 

 [reply] 

 [reply] [d/l] 


Re: Challenge: Sorting Sums Of Sorted Series
by Krambambuli (Curate) on Feb 03, 2010 at 08:12 UTC

Can't help, have to ask this question:
how would an algorithm that would deal with this problem for infinite (huge...) lists look like...?
Update: And, given a finite part of each serie, a_{1}, ..., a_{N}, b_{1}, ..., b_{M}, how many  which  terms of the result could be trustfully computed (knowing that there are other terms to follow) ?
And how far could we advance with the 'arrival' of a_{N+1} and/or b_{M+1}...?
hmmmm... Is someone aware of some good readings in order to get the answers ?
Thanks,
 [reply] 

Krambambuli,
Before answering your question, keep in mind I already allowed for both lists to fit in memory as a given. In reality, if you had a really large list you probably wouldn't be going through these contortions  you would probably sort on disk.
Here are a few links to help your main question
I guess your question boils down to: How can you do this with even less than 2N + M? blokhead has shown a solution that uses no memory but takes (N*M)^2 to run. TO give you a specific answer, one would need a specific problem. As I demonstrated in How many words does it take?, the trick to solving really hard problems in the general case (NP complete) is to exploit the details of the specific case (effectively turned into O(1)).
 [reply] 

Just some ideas on how you can answer your questions:
For representing infinite lists I recommend you to read Mark Jason Dominus' excellent book named Higher Order Perl, especially the chapter on infinite lists (where he is using closures for iterators and promiseforcing streams). It is available online in pdf format: http://hop.perl.plover.com.
And for your updated question: given the finite parts (a1, ..., aN), (b1, ..., bM), if you take min(a1+bM, b1+aN), you are safe to print any sum less than or equal to this threshold as a beginning sequence of the final list. Anything above this threshold cannot be considered final.
update: First I've written this: min(a1, b1) + min(aN, bM), but this is wrong, corrected above. And minor grammatical edits, arrgh...
 [reply] [d/l] [select] 
Re: Challenge: Sorting Sums Of Sorted Series
by Limbic~Region (Chancellor) on Feb 03, 2010 at 13:26 UTC

All,
My 3 solutions are below. The first two were not created with the memory challenge in mind. In the first, I was trying to show ikegami that if you turned the problem on its side, a merge algorithm would work.
This uses the general purpose merge algorithm iterator I created at Merge Algorithm (Combining Sorted Lists). I discovered several bugs 3 years later but they are fixed in the version below.
In this version, I realized that the iterators were not needed. They really didn't reduce complexity and there is obviously the performance penalty of invoking subs repeatedly.
#!/usr/bin/perl
use constant VAL => 0;
use constant IDX => 1;
use constant SUM => 2;
use strict;
use warnings;
my @list1 = (1 .. 5);
my @list2 = (1 .. 5);
my @merge_list = map {[$_, 0, $_ + $list2[0]]} @list1; # Map over smal
+ler list if known
while (@merge_list) {
my ($min, $index) = ($merge_list[0][SUM], 0);
for (1 .. $#merge_list) {
($min, $index) = ($merge_list[$_][SUM], $_) if $merge_list[$_]
+[SUM] < $min;
}
print "$min\n";
$merge_list[$index][IDX]++;
if ($merge_list[$index][IDX] > $#list2) {
splice(@merge_list, $index, 1);
}
else {
$merge_list[$index][SUM] = $merge_list[$index][VAL] + $list2[$
+merge_list[$index][IDX]];
}
}
This is where I realized it could be turned into a challenge since only 1 of the 3 tracking values was necessary. Caching the sum improves performance but what are a few cycles amongst friends. I also realized that keeping track of the value in @list1 was also just a caching technique. With it gone, I was left with 2N + M along with about 4 constant scalars.
#!/usr/bin/perl
use strict;
use warnings;
my @list1 = 1 .. 5;
my @list2 = 1 .. 5;
my @merge_list = ((0) x scalar @list1); # use smaller of 2 lists
while (1) {
my ($min, $idx);
for (my $i = 0; $i < @merge_list; ++$i) {
next if $merge_list[$i] > $#list2;
my $sum = $list1[$i] + $list2[$merge_list[$i]];
($min, $idx) = ($sum, $i) if ! defined $min  $sum < $min;
}
last if ! defined $min;
print "$min\n";
$merge_list[$idx]++;
}
 [reply] [d/l] [select] 
Re: Challenge: Sorting Sums Of Sorted Series
by Ratazong (Monsignor) on Feb 03, 2010 at 13:45 UTC

Hi Monks
The following algorithm seems to need 4+N+M memory ... and therefore fulfills the challenge :)
 val = N1 + M1
 loop
 for all i, j: if (Ni+Mj == val) print (Ni+Mj)
 nextval = val
 for all i, j: if ((Ni+Mj)>val) && ((Ni+Mj) < nextval) => nextval = (Ni + Mj)
 if (val == nextval) => exit # we are finished
 val = nextval
 goto 2.
So in additon to N and M, you need the constant space for 4 variables (i, j, val, nextval) and for the temporary sums. :)
However the runtime would be O(N^2*M^2) (*)  so I wouldn't consider it efficient ;)
Rata
(*) 2 * N * M for the inner loop  and a maximum N * M for the outer one
note: the restriction that N and M are sorted can easily be dropped with the algorithm above (just replace line 1 with lines 2.2.2.5)
update: Limbic~Region just told me that blokhead posted nearly the same solution before (he needs one variable more, which doesn't affect the complexity O((NM)^2). ... I didn't want to waste bandwidth, but I fear duplicated solutions are the risk if one is not cheating ;)
 [reply] 
Re: Challenge: Sorting Sums Of Sorted Series
by kennethk (Abbot) on Feb 03, 2010 at 16:23 UTC

Have I got a solution for you, and some benchmarking to back it up. Since we know that any given Ai,j (where Ai,j = Ni + Mj) will be larger than Ai1,j or Ai,j1, we can establish a queue of values to be output that is the length of the shorter of the two lists and perform a sorted insertion on that list for each new value. This solution meets the 2N+M criterion and, unless I'm missing something, should run in N^2*M (worst case), where an initial test guarantees N is the smaller list. What follows is a benchmarking script, including both of Limbic~Region's and blokhead's posted solutions, a naive baseline sort with N + M + N*M memory, and a terrible solution that works only on integers I cobbled together yesterday, similar in many ways to blockhead's but with terrible performance when max(output)  min(output) is large.
For 100element lists, I got:
Benchmark: timing 100 iterations of Baseline, Integers, LR_1, LR_2, Qu
+eue, blokhead...
Baseline: 0.554613 wallclock secs ( 0.54 usr + 0.01 sys = 0.55 CPU
+) @ 181.82/s (n=100)
(warning: too few iterations for a reliable count)
Integers: 32.3786 wallclock secs (32.37 usr + 0.00 sys = 32.37 CPU)
+ @ 3.09/s (n=100)
LR_1: 18.7755 wallclock secs (18.77 usr + 0.00 sys = 18.77 CPU)
+ @ 5.33/s (n=100)
LR_2: 69.7311 wallclock secs (69.73 usr + 0.00 sys = 69.73 CPU)
+ @ 1.43/s (n=100)
Queue: 2.55908 wallclock secs ( 2.55 usr + 0.00 sys = 2.55 CPU)
+ @ 39.22/s (n=100)
blokhead: 95.6313 wallclock secs (95.62 usr + 0.01 sys = 95.63 CPU)
+ @ 1.05/s (n=100)
Rate blokhead LR_2 Integers LR_1 Queue Baseline
blokhead 1.05/s  27% 66% 80% 97% 99%
LR_2 1.43/s 37%  54% 73% 96% 99%
Integers 3.09/s 195% 115%  42% 92% 98%
LR_1 5.33/s 409% 271% 72%  86% 97%
Queue 39.2/s 3650% 2635% 1169% 636%  78%
Baseline 182/s 17287% 12578% 5785% 3313% 364% 
And for the 4 element lists in the original post:
Benchmark: timing 100000 iterations of Baseline, Integers, LR_1, LR_2,
+ Queue, blokhead...
Baseline: 1.62766 wallclock secs ( 1.63 usr + 0.00 sys = 1.63 CPU)
+ @ 61349.69/s (n=100000)
Integers: 7.29155 wallclock secs ( 7.28 usr + 0.01 sys = 7.29 CPU)
+ @ 13717.42/s (n=100000)
LR_1: 7.017 wallclock secs ( 7.02 usr + 0.00 sys = 7.02 CPU) @
+ 14245.01/s (n=100000)
LR_2: 7.79621 wallclock secs ( 7.80 usr + 0.00 sys = 7.80 CPU)
+ @ 12820.51/s (n=100000)
Queue: 3.60201 wallclock secs ( 3.60 usr + 0.00 sys = 3.60 CPU)
+ @ 27777.78/s (n=100000)
blokhead: 9.88667 wallclock secs ( 9.89 usr + 0.00 sys = 9.89 CPU)
+ @ 10111.22/s (n=100000)
Rate blokhead LR_2 Integers LR_1 Queue Baseline
blokhead 10111/s  21% 26% 29% 64% 84%
LR_2 12821/s 27%  7% 10% 54% 79%
Integers 13717/s 36% 7%  4% 51% 78%
LR_1 14245/s 41% 11% 4%  49% 77%
Queue 27778/s 175% 117% 103% 95%  55%
Baseline 61350/s 507% 379% 347% 331% 121% 
Update: Found and fixed bug with queue method  I swear I tested it. Following that, I updated the benchmarking results with the correct values  queue down a little, but it still well better than other options.
Update: Gah, found bug in algorithm  essentially as the array of results grows large, order was upset because @queue was not large enough. After a little effort, I determined the minimum safe value for @queue, assuming the same transit order as I have been using, is 1/2*N*MN  for the pathological case, you must store almost the entire lower triangle. While this does meet the specified criteria in a literal sense, I assume the intent was to avoid O(N*M) scaling and thus this fails. I tried a couple of different traversal directions, but came up with zilch. Maybe something will come to me in my dreams...  [reply] [d/l] [select] 

kennethk,
I haven't digested your code but at a glance, your Benchmark is flawed for a couple of reasons.
 They don't do the same amount of IO (solution_1 and baseline do fewer prints)
 solution_1 doesn't produce the correct output
Here is the output of your code with the following lists
 [reply] [d/l] 

As per update, fixed the bug in solution_1, with little modification to performance improvement. Regarding the contention that baseline and solution_1 are doing fewer prints, of course they are. If IO is a bottle neck, then a method that can reduce calls to IO is a better method. But, if you want to eliminate those benefits of the algorithms, benchmarking with subroutines including lines like
print OUT "$_\n" foreach @list;
result in the following comparisons, where _pr specifies a version with more print statements:
Benchmark: timing 100000 iterations of Baseline, Baseline_pr, Queue, Q
+ueue_pr...
Baseline: 1.62739 wallclock secs ( 1.62 usr + 0.01 sys = 1.63 CPU)
+ @ 61349.69/s (n=100000)
Baseline_pr: 2.13879 wallclock secs ( 2.14 usr + 0.00 sys = 2.14 CPU
+) @ 46728.97/s (n=100000)
Queue: 3.61375 wallclock secs ( 3.61 usr + 0.00 sys = 3.61 CPU)
+ @ 27700.83/s (n=100000)
Queue_pr: 3.70646 wallclock secs ( 3.71 usr + 0.00 sys = 3.71 CPU)
+ @ 26954.18/s (n=100000)
Rate Queue_pr Queue Baseline_pr Baseline
Queue_pr 26954/s  3% 42% 56%
Queue 27701/s 3%  41% 55%
Baseline_pr 46729/s 73% 69%  24%
Baseline 61350/s 128% 121% 31% 
Benchmark: timing 1000 iterations of Baseline, Baseline_pr, Queue, Que
+ue_pr...
Baseline: 5.63275 wallclock secs ( 5.60 usr + 0.03 sys = 5.63 CPU)
+ @ 177.62/s (n=1000)
Baseline_pr: 8.3455 wallclock secs ( 8.32 usr + 0.02 sys = 8.34 CPU)
+ @ 119.90/s (n=1000)
Queue: 25.3093 wallclock secs (25.29 usr + 0.01 sys = 25.30 CPU)
+ @ 39.53/s (n=1000)
Queue_pr: 25.1902 wallclock secs (25.17 usr + 0.02 sys = 25.19 CPU)
+ @ 39.70/s (n=1000)
Rate Queue Queue_pr Baseline_pr Baseline
Queue 39.5/s  0% 67% 78%
Queue_pr 39.7/s 0%  67% 78%
Baseline_pr 120/s 203% 202%  32%
Baseline 178/s 349% 347% 48% 
 [reply] [d/l] [select] 


Re: Challenge: Sorting Sums Of Sorted Series
by JavaFan (Canon) on Feb 03, 2010 at 23:57 UTC

The challenge then is that memory consumption needs to be less than N + M + N*M where N and M represent the number of elements in the respective lists.
That's N + M + N * M of what? Since you aren't writing it as a bigO notation (in which case, it's fairly trivial, as any naive implementation doesn't need more memory than O(N*M), and cannot use less as that's the size of the output list), you should mention the units you mean. Considering that N + M is the size of the input, and N * M the size of the output, it means that if the unit is "values", there isn't room for a single iterator.
 [reply] [d/l] 
Re: Challenge: Sorting Sums Of Sorted Series
by kennethk (Abbot) on Feb 04, 2010 at 16:30 UTC

Given the mess that is Re: Challenge: Sorting Sums Of Sorted Series, I'll post a pair of working, fast solutions here. The solutions that follow use O(N+M) memory and are O(N^2*M) in time. The basic algorithm implemented is:
Populate a queue with the sum of all elements of the short list with the first element of the other list. Keep track of the indices of each value. This queue is by definition sorted. Then loop until the queue is empty, doing the following:
 Shift and print the first element of the queue;
 Increment the second list index for the shifted element. Cycle loop if this index is outside the second list.
 Use a sorted insertion to add the sum at the new coordinates to the queue.
I wrote a version with an array for each tracked element (solution_2) and one where all three are lumped into a single entry.
sub solution_2 {
# adaptive motion queue solution
my ($list_ref1, $list_ref2) = @_;
my @list1;
my @list2;
if (@$list_ref1 <= @$list_ref2) {
@list1 = @$list_ref1;
@list2 = @$list_ref2;
} else {
@list1 = @$list_ref2;
@list2 = @$list_ref1;
}
my @queue = map $_+$list2[0], @list1;
my @xs = (0 .. $#list1);
my @ys = (0) x @list1;
while (@queue) {
print OUT shift(@queue), "\n";
my $x = shift @xs;
my $y = shift @ys;
next if ++$y >= @list2;
my $sum = $list1[$x] + $list2[$y];
my $count = 0;
$count++ until $count == @queue or $sum <= $queue[$count];
splice @queue, $count, 0, $sum;
splice @xs, $count, 0, $x;
splice @ys, $count, 0, $y;
}
}
sub solution_3 {
# adaptive motion queue solution, one array
my ($list_ref1, $list_ref2) = @_;
my @list1;
my @list2;
if (@$list_ref1 <= @$list_ref2) {
@list1 = @$list_ref1;
@list2 = @$list_ref2;
} else {
@list1 = @$list_ref2;
@list2 = @$list_ref1;
}
my @queue = ();
for (0 .. $#list1) {
push @queue, join("x", $list1[$_]+$list2[0], "$_", "0");
}
while (@queue) {
my $entry = shift(@queue);
my ($old, $x, $y) = split /x/, $entry;
print OUT "$old\n";
next if ++$y >= @list2;
my $sum = $list1[$x] + $list2[$y];
my $count = 0;
{
no warnings 'numeric';
$count++ until $count == @queue or $sum <= $queue[$count];
}
splice @queue, $count, 0, "${sum}x${x}x${y}";
}
}
And benchmarks on 4 element and 100 element equally distributed arrays:
Benchmark: timing 100 iterations of 1_array, 3_array, Baseline, LR_1..
+.
1_array: 10.651 wallclock secs (10.65 usr + 0.00 sys = 10.65 CPU)
+@ 9.39/s (n=100)
3_array: 9.28456 wallclock secs ( 9.28 usr + 0.00 sys = 9.28 CPU)
+ @ 10.78/s (n=100)
Baseline: 0.844141 wallclock secs ( 0.84 usr + 0.00 sys = 0.84 CPU
+) @ 119.05/s (n=100)
(warning: too few iterations for a reliable count)
LR_1: 20.4537 wallclock secs (20.45 usr + 0.00 sys = 20.45 CPU)
+ @ 4.89/s (n=100)
Rate LR_1 1_array 3_array Baseline
LR_1 4.89/s  48% 55% 96%
1_array 9.39/s 92%  13% 92%
3_array 10.8/s 120% 15%  91%
Baseline 119/s 2335% 1168% 1005% 
Benchmark: timing 100000 iterations of 1_array, 3_array, Baseline, LR_
+1...
1_array: 7.06968 wallclock secs ( 7.07 usr + 0.00 sys = 7.07 CPU)
+ @ 14144.27/s (n=100000)
3_array: 5.09259 wallclock secs ( 5.09 usr + 0.00 sys = 5.09 CPU)
+ @ 19646.37/s (n=100000)
Baseline: 2.11864 wallclock secs ( 2.12 usr + 0.00 sys = 2.12 CPU)
+ @ 47169.81/s (n=100000)
LR_1: 7.01637 wallclock secs ( 7.02 usr + 0.00 sys = 7.02 CPU)
+ @ 14245.01/s (n=100000)
Rate 1_array LR_1 3_array Baseline
1_array 14144/s  1% 28% 70%
LR_1 14245/s 1%  27% 70%
3_array 19646/s 39% 38%  58%
Baseline 47170/s 233% 231% 140% 
 [reply] [d/l] [select] 

