PerlMonks

### Challenge: Sorting Sums Of Sorted Series

by Limbic~Region (Chancellor)
 on Feb 02, 2010 at 17:05 UTC
Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

All,
Yesterday, I grumbled in the CB that there had been a lack of interesting, challenging problems lately. ikegami was kind enough to suggest sorting the sums of two sorted lists. For instance:
```
1  2
3  4
5  6
7  8

3, 5, 5, 7, 7, 7, 9, 9, 9, 9, 11, 11, 11, 13, 13, 15
```

Now obviously, there is nothing challenging about this: perform a nested loop to collect all sums and then sort the resulting list. The challenge, then, is that memory consumption needs to be less than N + M + N*M, where N and M represent the number of elements in the respective lists. Of course you could cheat by putting things out on disk or whatnot, but that really isn't in the spirit of the challenge. So - what is the least amount of memory you can use above N + M? Take it as a given that you can store both lists in memory and that you will always know which of the two lists has fewer elements, if not the same.
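For reference, the naive approach the challenge rules out can be sketched like this (the sub name and example lists are mine, not part of the original post):

```perl
use strict;
use warnings;

# Naive baseline: materialize all N*M sums, then sort them.
# This uses N + M + N*M values of memory -- exactly what the
# challenge disallows.
sub naive_sums {
    my ($listA, $listB) = @_;
    my @sums;
    for my $x (@$listA) {
        for my $y (@$listB) {
            push @sums, $x + $y;
        }
    }
    return sort { $a <=> $b } @sums;
}

my @result = naive_sums([1, 3, 5, 7], [2, 4, 6, 8]);
print "@result\n";    # 3 5 5 7 7 7 9 9 9 9 11 11 11 13 13 15
```
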

I have 3 solutions where I have whittled the memory consumption down. The best only requires N memory above N + M, where N represents the smaller of the two lists. I haven't benchmarked it against the others, but there are virtual bonus points for being both memory efficient and fast.

Cheers - L~R

Replies are listed 'Best First'.
Re: Challenge: Sorting Sums Of Sorted Series
by BrowserUk (Pope) on Feb 02, 2010 at 17:43 UTC

The fact that you have 16 results but only require 12 units of memory means that you only have to output the values in order, not store the ordered result set?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
BrowserUk,
Yes. I am always amazed at how opaque my attempts at being clear can be. Produce the ordered output without storing it all in memory first.

I realize that there is little practical application to the problem abstracted as far as it is. ikegami's real use case was very specific but didn't make for a good challenge - besides, I didn't have all the details.

Cheers - L~R

Re: Challenge: Sorting Sums Of Sorted Series
by blokhead (Monsignor) on Feb 02, 2010 at 22:46 UTC
You can always save space by never storing anything -- just recompute a sum every time you need it. Here's a solution that uses constant space:
```
sub print_sums {
    my ($listA, $listB) = @_;

    my $min = $listA->[0] + $listB->[0] - 1;

    while ($min < $listA->[-1] + $listB->[-1]) {
        my $nextmin      = undef;
        my $multiplicity = 0;

        for my $i (0 .. $#$listA) {
            for my $j (0 .. $#$listB) {
                my $sum = $listA->[$i] + $listB->[$j];
                if ($sum > $min and (not defined $nextmin or $sum < $nextmin)) {
                    ($nextmin, $multiplicity) = ($sum, 1);
                } elsif (defined $nextmin and $sum == $nextmin) {
                    $multiplicity++;
                }
            }
        }

        print( ($nextmin) x $multiplicity );
        $min = $nextmin;
    }
}
```
At each iteration, it traverses both lists to find the next smallest possible sum, and how many times it occurs. It only needs to keep track of $min, $nextmin, $multiplicity, $i, $j.

Of course the tradeoff is the running time, which is O((NM)^2).

Perhaps the metric should be to minimize the product of the time and space complexities? For comparison, naively computing all sums and sorting uses NM space and NM log(NM) time, so it's slightly worse than mine under the combined metric.

Update: extra parens around the print statement.

Your code needs some TLC and your for (0 .. $#list) should probably be for (;;) loops. I had this idea too but felt it wasn't in the spirit of what ikegami intended (though it meets my criteria just fine).
1. Find the lowest sum paying attention to how many times it appears, then print
2. Find the lowest sum that is greater than the previous pass paying attention to how many times it appears, then print
3. Repeat until there are no new values found

The best solution I have (memory/time) is 4N + M but the most memory efficient while still having a reasonable run time is 2N + M.

Cheers - L~R

Your code needs some TLC and your for (0 .. $#list) should probably be for (;;) loops.

Why? Do you believe the list gets flattened? No, for (EXPR..EXPR) is implemented as a counting loop.

Re: Challenge: Sorting Sums Of Sorted Series
by Krambambuli (Curate) on Feb 03, 2010 at 08:12 UTC
Can't help, have to ask this question:

what would an algorithm that deals with this problem for infinite (or just huge...) lists look like?

Update: And, given a finite part of each series, a1, ..., aN, b1, ..., bM, how many (and which) terms of the result could be safely computed, knowing that there are other terms to follow?

And how far could we advance with the 'arrival' of aN+1 and/or bM+1...?

hmmmm... Is anyone aware of some good reading to get at the answers?

Thanks,

Krambambuli
---
Krambambuli,
Before answering your question, keep in mind I already allowed for both lists to fit in memory as a given. In reality, if you had a really large list you probably wouldn't be going through these contortions - you would probably sort on disk.

I guess your question boils down to: How can you do this with even less than 2N + M? blokhead has shown a solution that uses no extra memory but takes (N*M)^2 to run. To give you a specific answer, one would need a specific problem. As I demonstrated in How many words does it take?, the trick to solving really hard problems in the general case (NP complete) is to exploit the details of the specific case (effectively turning it into O(1)).

Cheers - L~R

For representing infinite lists I recommend reading Mark Jason Dominus' excellent book Higher-Order Perl, especially the chapter on infinite lists (where he uses closures for iterators and promise-forcing streams). It is available online in PDF format: http://hop.perl.plover.com.

And for your updated question: given the finite parts (a1, ..., aN), (b1, ..., bM), if you take min(a1+bM, b1+aN), you are safe to print any sum less than or equal to this threshold as a beginning sequence of the final list. Anything above this threshold cannot be considered final.

update: First I wrote min(a1, b1) + min(aN, bM), but that is wrong; corrected above. And minor grammatical edits, arrgh...
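That threshold is easy to check mechanically. Here is a small sketch (sub name mine) that brute-forces the sums of the known prefixes and keeps only the part that is guaranteed final under the min(a1+bM, aN+b1) rule:

```perl
use strict;
use warnings;
use List::Util qw(min);

# Any sum <= min(a1 + bM, aN + b1) is final: a later element
# aN+1 >= aN can only produce sums >= aN+1 + b1 >= aN + b1,
# and symmetrically for bM+1.
sub safe_prefix {
    my ($a_ref, $b_ref) = @_;
    my $threshold = min($a_ref->[0]  + $b_ref->[-1],
                        $a_ref->[-1] + $b_ref->[0]);
    my @sums;
    for my $x (@$a_ref) {
        for my $y (@$b_ref) {
            push @sums, $x + $y;
        }
    }
    return grep { $_ <= $threshold } sort { $a <=> $b } @sums;
}

my @safe = safe_prefix([1, 3, 5, 7], [2, 4, 6, 8]);
print "@safe\n";
```

For a1..aN = (1, 3, 5, 7) and b1..bM = (2, 4, 6, 8) the threshold is min(1+8, 7+2) = 9, so the first ten sums are final no matter what arrives later.
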
Re: Challenge: Sorting Sums Of Sorted Series
by Limbic~Region (Chancellor) on Feb 03, 2010 at 13:26 UTC
All,
My 3 solutions are below. The first two were not created with the memory challenge in mind. In the first, I was trying to show ikegami that if you turned the problem on its side, a merge algorithm would work.

This uses the general purpose merge algorithm iterator I created at Merge Algorithm (Combining Sorted Lists). I discovered several bugs 3 years later but they are fixed in the version below.

In this version, I realized that the iterators were not needed. They really didn't reduce complexity and there is obviously the performance penalty of invoking subs repeatedly.

```
#!/usr/bin/perl
use constant VAL => 0;
use constant IDX => 1;
use constant SUM => 2;
use strict;
use warnings;

my @list1 = (1 .. 5);
my @list2 = (1 .. 5);

# Map over the smaller list if known
my @merge_list = map { [$_, 0, $_ + $list2[0]] } @list1;

while (@merge_list) {
    my ($min, $index) = ($merge_list[0][SUM], 0);
    for (1 .. $#merge_list) {
        ($min, $index) = ($merge_list[$_][SUM], $_)
            if $merge_list[$_][SUM] < $min;
    }
    print "$min\n";
    $merge_list[$index][IDX]++;
    if ($merge_list[$index][IDX] > $#list2) {
        splice(@merge_list, $index, 1);
    }
    else {
        $merge_list[$index][SUM] =
            $merge_list[$index][VAL] + $list2[$merge_list[$index][IDX]];
    }
}
```

This is where I realized it could be turned into a challenge, since only 1 of the 3 tracking values was necessary. Caching the sum improves performance, but what are a few cycles amongst friends. I also realized that keeping track of the value in @list1 was just another caching technique. With it gone, I was left with 2N + M along with about 4 constant scalars.

```
#!/usr/bin/perl
use strict;
use warnings;

my @list1 = 1 .. 5;
my @list2 = 1 .. 5;

my @merge_list = ((0) x scalar @list1);    # use smaller of 2 lists

while (1) {
    my ($min, $idx);
    for (my $i = 0; $i < @merge_list; ++$i) {
        next if $merge_list[$i] > $#list2;
        my $sum = $list1[$i] + $list2[$merge_list[$i]];
        ($min, $idx) = ($sum, $i) if ! defined $min || $sum < $min;
    }
    last if ! defined $min;
    print "$min\n";
    $merge_list[$idx]++;
}
```

Cheers - L~R

Re: Challenge: Sorting Sums Of Sorted Series
by Ratazong (Monsignor) on Feb 03, 2010 at 13:45 UTC

Hi Monks

The following algorithm seems to need 4+N+M memory ... and therefore fulfills the challenge :-)

1. val = N1 + M1
2. loop
   1. for all i, j: if (Ni+Mj == val) print (Ni+Mj)
   2. nextval = val
   3. for all i, j: if ((Ni+Mj) > val) && ((Ni+Mj) < nextval || nextval == val) => nextval = (Ni+Mj)
   4. if (val == nextval) => exit   # we are finished
   5. val = nextval
3. goto 2.

So in addition to N and M, you need the constant space for 4 variables (i, j, val, nextval) and for the temporary sums. :-)
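The numbered steps above can be rendered directly in Perl. This sketch (sub name mine) folds the print pass and the search pass into a single traversal, uses undef rather than val itself as the "nothing larger found" sentinel, and collects the output instead of printing it so it is easy to inspect:

```perl
use strict;
use warnings;

# Constant extra space: only $val, $nextval, and the loop variables.
sub sums_in_order {
    my ($N, $M) = @_;
    my @out;
    my $val = $N->[0] + $M->[0];    # step 1: smallest possible sum
    while (1) {
        my $nextval;
        for my $n (@$N) {           # steps 2.1 and 2.3 in one pass
            for my $m (@$M) {
                my $sum = $n + $m;
                push @out, $sum if $sum == $val;
                $nextval = $sum
                    if $sum > $val and (!defined $nextval or $sum < $nextval);
            }
        }
        last if !defined $nextval;  # step 2.4: nothing larger remains
        $val = $nextval;            # step 2.5
    }
    return @out;
}

my @sums = sums_in_order([1, 3, 5, 7], [2, 4, 6, 8]);
print "@sums\n";    # 3 5 5 7 7 7 9 9 9 9 11 11 11 13 13 15
```
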
However the runtime would be O(N^2*M^2) (*) - so I wouldn't consider it efficient ;-)

Rata

(*) 2 * N * M for the inner loop - and a maximum N * M for the outer one

note: the restriction that N and M are sorted can easily be dropped with the algorithm above (just replace step 1 with steps 2.2-2.5)

update: Limbic~Region just told me that blokhead posted nearly the same solution before (he needs one more variable, which doesn't affect the O((NM)^2) complexity). ... I didn't want to waste bandwidth, but I fear duplicated solutions are the risk if one is not cheating ;-)

Re: Challenge: Sorting Sums Of Sorted Series
by kennethk (Abbot) on Feb 03, 2010 at 16:23 UTC
Have I got a solution for you, and some benchmarking to back it up. Since we know that any given Ai,j (where Ai,j = Ni + Mj) will be larger than Ai-1,j or Ai,j-1, we can establish a queue of values to be output that is the length of the shorter of the two lists and perform a sorted insertion on that queue for each new value. This solution meets the 2N+M criterion and, unless I'm missing something, should run in N^2*M (worst case) time, where an initial test guarantees N is the smaller list. What follows is a benchmarking script, including both of Limbic~Region's and blokhead's posted solutions, a naive baseline sort with N + M + N*M memory, and a terrible integer-only solution I cobbled together yesterday, similar in many ways to blokhead's but with terrible performance when max(output) - min(output) is large.

For 100-element lists, I got:

```
Benchmark: timing 100 iterations of Baseline, Integers, LR_1, LR_2, Queue, blokhead...
Baseline: 0.554613 wallclock secs ( 0.54 usr +  0.01 sys =  0.55 CPU) @ 181.82/s (n=100)
            (warning: too few iterations for a reliable count)
Integers: 32.3786 wallclock secs (32.37 usr +  0.00 sys = 32.37 CPU) @  3.09/s (n=100)
    LR_1: 18.7755 wallclock secs (18.77 usr +  0.00 sys = 18.77 CPU) @  5.33/s (n=100)
    LR_2: 69.7311 wallclock secs (69.73 usr +  0.00 sys = 69.73 CPU) @  1.43/s (n=100)
   Queue: 2.55908 wallclock secs ( 2.55 usr +  0.00 sys =  2.55 CPU) @ 39.22/s (n=100)
blokhead: 95.6313 wallclock secs (95.62 usr +  0.01 sys = 95.63 CPU) @  1.05/s (n=100)
             Rate blokhead     LR_2 Integers     LR_1    Queue Baseline
blokhead  1.05/s       --     -27%     -66%     -80%     -97%     -99%
LR_2      1.43/s      37%       --     -54%     -73%     -96%     -99%
Integers  3.09/s     195%     115%       --     -42%     -92%     -98%
LR_1      5.33/s     409%     271%      72%       --     -86%     -97%
Queue     39.2/s    3650%    2635%    1169%     636%       --     -78%
Baseline   182/s   17287%   12578%    5785%    3313%     364%       --
```

And for the 4 element lists in the original post:

```
Benchmark: timing 100000 iterations of Baseline, Integers, LR_1, LR_2, Queue, blokhead...
Baseline: 1.62766 wallclock secs ( 1.63 usr +  0.00 sys =  1.63 CPU) @ 61349.69/s (n=100000)
Integers: 7.29155 wallclock secs ( 7.28 usr +  0.01 sys =  7.29 CPU) @ 13717.42/s (n=100000)
    LR_1: 7.017 wallclock secs ( 7.02 usr +  0.00 sys =  7.02 CPU) @ 14245.01/s (n=100000)
    LR_2: 7.79621 wallclock secs ( 7.80 usr +  0.00 sys =  7.80 CPU) @ 12820.51/s (n=100000)
   Queue: 3.60201 wallclock secs ( 3.60 usr +  0.00 sys =  3.60 CPU) @ 27777.78/s (n=100000)
blokhead: 9.88667 wallclock secs ( 9.89 usr +  0.00 sys =  9.89 CPU) @ 10111.22/s (n=100000)
             Rate blokhead     LR_2 Integers     LR_1    Queue Baseline
blokhead 10111/s       --     -21%     -26%     -29%     -64%     -84%
LR_2     12821/s      27%       --      -7%     -10%     -54%     -79%
Integers 13717/s      36%       7%       --      -4%     -51%     -78%
LR_1     14245/s      41%      11%       4%       --     -49%     -77%
Queue    27778/s     175%     117%     103%      95%       --     -55%
Baseline 61350/s     507%     379%     347%     331%     121%       --
```

Update: Found and fixed a bug with the queue method - I swear I tested it. Following that, I updated the benchmarking results with the correct values - queue is down a little, but it is still well ahead of the other options.

Update: Gah, found a bug in the algorithm - essentially, as the array of results grows large, order was upset because @queue was not large enough. After a little effort, I determined that the minimum safe size for @queue, assuming the same transit order as I have been using, is 1/2*N*M - N - for the pathological case, you must store almost the entire lower triangle. While this does meet the specified criteria in a literal sense, I assume the intent was to avoid O(N*M) scaling, and thus this fails. I tried a couple of different traversal directions, but came up with zilch. Maybe something will come to me in my dreams...

kennethk,
I haven't digested your code but at a glance, your Benchmark is flawed for a couple of reasons.
• They don't do the same amount of IO (solution_1 and baseline do fewer prints)
• solution_1 doesn't produce the correct output
Here is the output of your code with the following lists

Cheers - L~R

As per the update, fixed the bug in solution_1, with little change to performance. Regarding the contention that baseline and solution_1 are doing fewer prints: of course they are. If IO is a bottleneck, then a method that can reduce calls to IO is a better method. But, if you want to eliminate those benefits of the algorithms, benchmarking with subroutines including lines like

print OUT "$_\n" foreach @list;

result in the following comparisons, where _pr specifies a version with more print statements:

```
Benchmark: timing 100000 iterations of Baseline, Baseline_pr, Queue, Queue_pr...
Baseline: 1.62739 wallclock secs ( 1.62 usr +  0.01 sys =  1.63 CPU) @ 61349.69/s (n=100000)
Baseline_pr: 2.13879 wallclock secs ( 2.14 usr +  0.00 sys =  2.14 CPU) @ 46728.97/s (n=100000)
   Queue: 3.61375 wallclock secs ( 3.61 usr +  0.00 sys =  3.61 CPU) @ 27700.83/s (n=100000)
Queue_pr: 3.70646 wallclock secs ( 3.71 usr +  0.00 sys =  3.71 CPU) @ 26954.18/s (n=100000)
               Rate    Queue_pr       Queue Baseline_pr    Baseline
Queue_pr    26954/s          --         -3%        -42%        -56%
Queue       27701/s          3%          --        -41%        -55%
Baseline_pr 46729/s         73%         69%          --        -24%
Baseline    61350/s        128%        121%         31%          --
Benchmark: timing 1000 iterations of Baseline, Baseline_pr, Queue, Queue_pr...
Baseline: 5.63275 wallclock secs ( 5.60 usr +  0.03 sys =  5.63 CPU) @ 177.62/s (n=1000)
Baseline_pr: 8.3455 wallclock secs ( 8.32 usr +  0.02 sys =  8.34 CPU) @ 119.90/s (n=1000)
   Queue: 25.3093 wallclock secs (25.29 usr +  0.01 sys = 25.30 CPU) @ 39.53/s (n=1000)
Queue_pr: 25.1902 wallclock secs (25.17 usr +  0.02 sys = 25.19 CPU) @ 39.70/s (n=1000)
               Rate       Queue    Queue_pr Baseline_pr    Baseline
Queue       39.5/s          --         -0%        -67%        -78%
Queue_pr    39.7/s          0%          --        -67%        -78%
Baseline_pr  120/s        203%        202%          --        -32%
Baseline     178/s        349%        347%         48%          --
```

Re: Challenge: Sorting Sums Of Sorted Series
by JavaFan (Canon) on Feb 03, 2010 at 23:57 UTC
The challenge then is that memory consumption needs to be less than N + M + N*M where N and M represent the number of elements in the respective lists.
That's N + M + N * M of what? Since you aren't writing it as a big-O notation (in which case, it's fairly trivial, as any naive implementation doesn't need more memory than O(N*M), and cannot use less as that's the size of the output list), you should mention the units you mean. Considering that N + M is the size of the input, and N * M the size of the output, it means that if the unit is "values", there isn't room for a single iterator.
Re: Challenge: Sorting Sums Of Sorted Series
by kennethk (Abbot) on Feb 04, 2010 at 16:30 UTC
Given the mess that is Re: Challenge: Sorting Sums Of Sorted Series, I'll post a pair of working, fast solutions here. The solutions that follow use O(N+M) memory and are O(N^2*M) in time. The basic algorithm implemented is:

Populate a queue with the sums of each element of the shorter list and the first element of the other list, keeping track of the indices behind each value. This queue is by definition sorted. Then loop until the queue is empty, doing the following:

1. Shift and print the first element of the queue;
2. Increment the second list index for the shifted element. Cycle loop if this index is outside the second list.
3. Use a sorted insertion to add the sum at the new coordinates to the queue.

I wrote a version with an array for each tracked element (solution_2) and one where all three are lumped into a single string entry (solution_3).

```
sub solution_2 {
    my ($list_ref1, $list_ref2) = @_;
    my @list1;
    my @list2;
    if (@$list_ref1 <= @$list_ref2) {
        @list1 = @$list_ref1;
        @list2 = @$list_ref2;
    } else {
        @list1 = @$list_ref2;
        @list2 = @$list_ref1;
    }

    my @queue = map $_ + $list2[0], @list1;
    my @xs = (0 .. $#list1);
    my @ys = (0) x @list1;

    while (@queue) {
        print OUT shift(@queue), "\n";
        my $x = shift @xs;
        my $y = shift @ys;
        next if ++$y >= @list2;

        my $sum = $list1[$x] + $list2[$y];

        my $count = 0;
        $count++ until $count == @queue or $sum <= $queue[$count];
        splice @queue, $count, 0, $sum;
        splice @xs, $count, 0, $x;
        splice @ys, $count, 0, $y;
    }
}

sub solution_3 {
    # adaptive motion queue solution, one array
    my ($list_ref1, $list_ref2) = @_;
    my @list1;
    my @list2;
    if (@$list_ref1 <= @$list_ref2) {
        @list1 = @$list_ref1;
        @list2 = @$list_ref2;
    } else {
        @list1 = @$list_ref2;
        @list2 = @$list_ref1;
    }

    my @queue = ();
    for (0 .. $#list1) {
        push @queue, join("x", $list1[$_] + $list2[0], "$_", "0");
    }

    while (@queue) {
        my $entry = shift(@queue);
        my ($old, $x, $y) = split /x/, $entry;
        print OUT "$old\n";

        next if ++$y >= @list2;

        my $sum = $list1[$x] + $list2[$y];

        my $count = 0;
        {
            no warnings 'numeric';
            $count++ until $count == @queue or $sum <= $queue[$count];
        }
        splice @queue, $count, 0, "${sum}x${x}x${y}";
    }
}
```

And benchmarks on 4-element and 100-element equally distributed arrays:

```
Benchmark: timing 100 iterations of 1_array, 3_array, Baseline, LR_1...
 1_array: 10.651 wallclock secs (10.65 usr +  0.00 sys = 10.65 CPU) @  9.39/s (n=100)
 3_array: 9.28456 wallclock secs ( 9.28 usr +  0.00 sys =  9.28 CPU) @ 10.78/s (n=100)
Baseline: 0.844141 wallclock secs ( 0.84 usr +  0.00 sys =  0.84 CPU) @ 119.05/s (n=100)
            (warning: too few iterations for a reliable count)
    LR_1: 20.4537 wallclock secs (20.45 usr +  0.00 sys = 20.45 CPU) @  4.89/s (n=100)
             Rate     LR_1  1_array  3_array Baseline
LR_1     4.89/s       --     -48%     -55%     -96%
1_array  9.39/s      92%       --     -13%     -92%
3_array  10.8/s     120%      15%       --     -91%
Baseline  119/s    2335%    1168%    1005%       --

Benchmark: timing 100000 iterations of 1_array, 3_array, Baseline, LR_1...
 1_array: 7.06968 wallclock secs ( 7.07 usr +  0.00 sys =  7.07 CPU) @ 14144.27/s (n=100000)
 3_array: 5.09259 wallclock secs ( 5.09 usr +  0.00 sys =  5.09 CPU) @ 19646.37/s (n=100000)
Baseline: 2.11864 wallclock secs ( 2.12 usr +  0.00 sys =  2.12 CPU) @ 47169.81/s (n=100000)
    LR_1: 7.01637 wallclock secs ( 7.02 usr +  0.00 sys =  7.02 CPU) @ 14245.01/s (n=100000)
             Rate  1_array     LR_1  3_array Baseline
1_array  14144/s       --      -1%     -28%     -70%
LR_1     14245/s       1%       --     -27%     -70%
3_array  19646/s      39%      38%       --     -58%
Baseline 47170/s     233%     231%     140%       --
```

Node Type: perlquestion [id://820975]
Approved by Corion
Front-paged by Corion