BrowserUk has asked for the
wisdom of the Perl Monks concerning the following question:
This is not directly Perl related, but might become so.
I'm looking for examples and/or discussionpractical or theoreticalof the problems, and hopefully solutions, involved in threading inherently recursive algorithms.
Any and all leads will be gratefully received and vigorously pursued.
Thanks.
Re: [OT]: threading recursive subroutines. by dHarry (Monsignor) on Feb 02, 2011 at 11:22 UTC 
I don't do much with threads, what I use is standard stuff in the JEE environment. Java handles things a bit different I think (making some sacrifices for the sake of simplicity). I recently stumbled upon Speculative parallelisation. Maybe it is of use to you.
Do you have a specific type of problem in mind? I myself am thinking about reducing a complex scheduling problem by spawning threads to optimize things locally and later combine the results into a global solution. I have indications that I might get away with such an approach. I haven't make much progress though.
Cheers,
Harry
 [reply] 

Thank you for the Speculative parallelisation link, It is exactly the type of response I was looking for.
As the authors note in their for abstract:
Prior research has focused mainly on parallelising loops. Recursive procedures, which are also frequently used in realworld applications, have attracted much less attention
Their list of references may also lead to some further interesting stuff, though all to often I hit the road block of no longer having access to the AoCM and IEEE repositories :( But just often enough the authors of such papers make them available on their own websites for free, and these are much easier to find once you have titles and keywords to look for.
Do you have a specific type of problem in mind?
My sample case is Ackermann–Péter variation of the Ackermann function, chosen specifically because it doesn't lend itself to trivial recursion elimination, but can be relatively easily parallelised manually.
Other examples are any number of inherently recursive algorithms like binary search, tree manipulations, sorts et al.
I'm not really sure what my goal is. Having again just had inordinate difficulty in trying to parallelise a recursive algorithm, and recognised that I went through the same dead ends and misdirections that I went through on previous occasions when trying to do the same things for different algorithms, I got to wondering if anyone had done any research into the subject, and my attempted searches turned up a whole lot of nothing.
It feels like there should be a set of steps and constraints one could follow to approach the problem, but from my research these do not yet exist. Which of course could mean that they could not exist. But with the lack of research on the subject, it could also mean that they just haven't been looked for; or that they are not easy to find. So, at this point, I'm just looking thank you :)
For example. It seems to me that for a recursive algorithmthat is not tail recursiveto be a suitable candidate for parallelisation, it requires that it has multipletwo or morerecursive calls at each level. Many do. This is an essential signature of a whole group of divideandconquer algorithms.
So, I'm not sure where I am going with this, but you have my profound thanks for helping me on my way :)
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] 
Re: [OT]: threading recursive subroutines. by chrestomanci (Priest) on Feb 02, 2011 at 13:22 UTC 
I know almost nothing about functonal programing, but I think that you would gain some insight from looking at how functional programing languages handle threads and recurson.
Consider the cannonical example of qsort in Haskell:
quicksort :: (Ord a) => [a] > [a]
quicksort [] = []
quicksort (x:xs) =
let smallerSorted = quicksort [a  a < xs, a <= x]
biggerSorted = quicksort [a  a < xs, a > x]
in smallerSorted ++ [x] ++ biggerSorted
The way I read that code, the list get split into those elements larger than the pivot and those smaller than the pivot. Each list is recursively sorted, and the Haskell runtime is free to do those two sorts in parallel.
 [reply] [d/l] 

 [reply] 

First. Thanks for your considered response.
I think that this much quoted example of Haskell is actually something of a trompe d'oil. I'll try to explain that.
The reason for parallelising this would be to improve performance. But, the reality is that the first and best way of improving sort performance in Haskell, it to not use this cute and (still) hugely, visually impressive implementation.
The problem is, that whilst the concise clarity, and relatively easy to understand semantics of this list comprehension solution are an obvious advert for Haskell and a good teaching aid, it tricks the eye into believing that it is an efficient implementation. This is far from the case.
The problem is that given the huge disparity of speed between on chip and offchip memory access, memory allocation and management costs are a significant factor in any cpuintensive algorithm. And this implementation exacerbates those costs by effectively having to not just duplicate the memory allocations at every levelby creating two new list from each previous list. But it also has to concatenate these sublists back into (new) composite lists.
See the Wikipedia discussion for more detail. If you read to the end, you'll see that Haskell has now moved to using Merge sort. In part, in order to counter this problem.
the Haskell runtime is free to do those two sorts in parallel.
This is the crux of the threading recursion problem.
If the runtime decided to always perform the two sorts in parallel, then it would end up spawning one thread for every item in the list to be sorted. This is obviously not a practical solution.
That means that to (automatically) parallelise the algorithm, the runtime would need to decide how many levels deep into the recursion it should allow the parallelisation to go before resorting to straight forward recursion in order to bound that parallelisation.
Let's say that the runtime discovered how may threads of execution are available and can rewrite the algorithm such that once that limit has been reached, it resorts to simple recursion. The problem still remains that the usedonlyonce nature of Haskell variables is going to impose significant memory management costs upon the algorithm. In all probability, sufficient costs to negate most of the benefits of the parallelisation.
Given that a quick sort can be implemented to run inplacealbeit that Haskell requires the breaking of its own rules to sothe benefits of doing so and thus avoiding the memory management overheads, far outweigh the benefits of parallelising the list comprehension algorithm.
And once the quick sort is performed inplace, parallelising that algorithm is both easier, and produces significantly better performance improvement multiples.
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] 
Re: [OT]: threading recursive subroutines. by sundialsvc4 (Monsignor) on Feb 02, 2011 at 15:20 UTC 
Threading buys you only two things:

It allows you to overlap computation with I/O (but it does not alter the fact that I/O is usually the physically limiting determinant of throughput).

It allows you the potential opportunity to utilize multiple CPUs and/or cores, if they exist.
Recursion, on the other hand, is a property of an algorithm. Furthermore, it is significant to observe that recursion is not parallel. The recursive iteration must complete, and deliver its result, before the outer iteration(s) may proceed.
Many algorithms that can be expressed in a recursive fashion, also can be expressed in a massively parallel fashion ... but traditional threading models are not “massive parallelism,” a term that is usually reserved for talking about things like array (co)processors.
 [reply] 

Once again, you offer a few authoritative sounding 'wisdoms' in place of 'an answer'.
And as usual, on close inspection, they not only are of no value in terms of answering the question asked; they are in large part fundamentally incorrect.
 Threading buys you only two things:
This belies even the crudest understanding of threads. For example, you completely omit the following "buys" of threading:
 substantially cheaper context spawning.
 substantially cheaper context switching.
 the avoidance of kernel mode switching for state sharing.
 others.
 but it does not alter the fact that I/O is usually the physically limiting determinant of throughput
So, in your experience, there are no applications where a short burst of IO results in a large volume of cpu intensive computation?
 It allows you the potential opportunity to utilize multiple CPUs and/or cores, if they exist.
You've failed to notice that?:
 Even the cheapest commodity box you can buy now has multiple cores.
They're already turning up in highend smart phones. Next year they'll be in disposable cells. A year or two beyond that and dualcore arms will be turning up in musical Xmas cards as they are replaced by 4 & 8core processors for phone/tablet/ICE applications.
 That the practical manifestation of Moore's Lawthe doubling of the number of transistors per chipwhich until recently meant either the doubling of clock speeds, or (less frequently) the doubling of the width of the registers, has effectively hit the practical limits?
You haven't read the writing all around you that says that in the future, the maintenance of Moore's Law means that the number of cores will likely double every 2 years whilst clock speeds and register widths stagnate.
 Recursion, on the other hand, is a property of an algorithm.
Recursion is a property of implementation.
In many cases, algorithms written recursively in any given highlevel language, are run iteratively by the implementation. This is the result of the wellknown 'tail call elimination'
 Furthermore, it is significant to observe that recursion is not parallel. The recursive iteration must complete, and deliver its result, before the outer iteration(s) may proceed.
This is a(nother) very naive generalisation.
Many, many algorithms that are habitually described recursively, due to the succinctness and clarity of that description are trivially convertible to an iterative implementation. And as soon as you have iteration, the potential for simple, direct parallelisation is obvious.
 Many algorithms that can be expressed in a recursive fashion, also can be expressed in a massively parallel fashion ... but traditional threading models are not “massive parallelism,” a term that is usually reserved for talking about things like array (co)processors.
Is there such a thing as a "tradition threading model"?
If it did exist, the lines of where and when it existed are long since obscured.
Witness that for a (low) few thousands, you can have 16cores on your desktop today. And if you have the need and half a million, an effective 1024 cores is yours for the asking.
Note also that if massively parallel array processing meets your requirements, then a few hundred quid will put the effective equivalent of a Cray 1 in your desktop in the form of a NVIDIA Tegra 2.
Times they are changin' have changed. Change with them, or die.
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] 

 [reply] 


 Re: [OT]: threading recursive subroutines. by ikegami (Pope) on Feb 02, 2011 at 19:58 UTC 
Good question! I've been working on it to see what the real issue is. I don't have an answer for you; I'm just documenting what I've found for everyone's benefit.
First, I presume that "inherently recursive" means "cannot be rewritten to be tailrecursive". Fibonacci is an example of an inherently recursive algorithm.
sub fibonacci {
my ($n) = @_;
return 0 if $n == 0;
return 1 if $n == 1;
return sum fibonacci($n2), fibonacci($n1);
}
At face value, it's quite easy to make it threaded:
sub fibonacci {
my ($n) = @_;
return 0 if $n == 0;
return 1 if $n == 1;
return sum map $_>join(),
async { fibonacci($n2) },
async { fibonacci($n1) };
}
But what if you wanted to limit the number of threads (perhaps to limit overhead)? One solution would be to have a pool of threads and to use them if available.
sub fibonacci {
my ($n) = @_;
return 0 if $n == 0;
return 1 if $n == 1;
return sum map $_>get(),
async_maybe { fibonacci($n2) },
async_maybe { fibonacci($n1) };
}
(The above assumes closures can be shared, but it can be rewritten to not use closures.)
When implementing async_maybe (and the get of the object it returns), one must be extra careful to avoid the situation where a thread is waiting to have it's result collected.
But what if you want a worker model (perhaps to distribute the work to other machines)? Now, that's hard. One would need some kind of a callback system.
sub fibonacci {
my ($n) = @_;
my $result;
process_and_wait(sub {
fibonacci_task(sub { $result = $_[0]; }, $n );
});
return $result;
}
sub fibonacci_task {
my ($on_complete, $n) = @_;
return $on_complete>(0) if $n == 0;
return $on_complete>(1) if $n == 1;
my ($x,$y);
process(sub { fibonacci_task(sub { $x = $_[0] }, $n2) });
process(sub { fibonacci_task(sub { $y = $_[0] }, $n1) });
#TODO: The last of the two tasks to complete
# must call $on_complete>($x+$y).
}
The TODO item is hard to do cleanly. And of course, I'm still using closures even though I think that's not an option for threads threads. Rewriting to avoid closures will likely make the code even longer.
 [reply] [d/l] [select] 

Turned out that using closures, while providing a simple design, actually caused the problem. The design below still needs work because it passes closures around, but it works in concept as shown by the Coro implementation below. (It probably defies the point to use Coro, but it's easier to prototype using it.)
use strict;
use warnings;
use Engine qw( );
use constant NUM_WORKERS => 4;
sub fibonacci_task {
my ($engine, $on_complete, $n) = @_;
return $on_complete>(0) if $n == 0;
return $on_complete>(1) if $n == 1;
my ($x,$y);
$engine>process_group(
sub { $on_complete>($x+$y) },
[ \&fibonacci_task => (sub { $x = $_[0] }, $n2) ],
[ \&fibonacci_task => (sub { $y = $_[0] }, $n1) ],
);
}
sub fibonacci {
my ($engine, $n) = @_;
my $result;
$engine>process_and_wait(
\&fibonacci_task => ( sub { $result = $_[0] }, $n )
);
return $result;
}
{
my $engine = Engine>new(NUM_WORKERS);
printf("%s! = %s\n", $_, fibonacci($engine, $_)) for 1..10;
}
1! = 1
2! = 1
3! = 2
4! = 3
5! = 5
6! = 8
7! = 13
8! = 21
9! = 34
10! = 55
The following belongs at the top of the above script:
 [reply] [d/l] [select] 

Whilst I think that your discussion of the problems is correct, I think your choice of algorithm and your code for demonstrating the problems is problematic. The latter in part because you are using Perl, which I specifically excluded; in part because you then use Coro which IMO clouds the real issues and adds more of its own.
Firstly, Fibonacci is a poor example because it is simply so easy to achieve better performance through mechanisms other than threading. As the only purpose of threading in this context is improving performance, using them for Fibonacci is never going to work effectively.
Whilst the recursive Fibonacci algorithm doesn't lend itself to tail call elimination, it is easily written iteratively for an huge performance gain. Equally, the recursive algorithm can be sped up immensely by simply memoising it.
But more importantly, the iteration/recursion can easily be done away with entirely using Binet's formula:
All of these modifications make it possible to calculate Fibonacci( 1474 ) (the limit of a doubles ability to store the result), more quickly than it is possible to spawn a thread. At least using the current implementation of iThreads. And possibly Coro also.
I realise that you chose it only as a wellknown example on which to base your discussion. But if you are going to explore the threading of recursion in Perl, there are simpler ways than your implementations. This does the job:
#! perl slw
use strict;
use futures;
sub fibonacci {
my $n = shift;
return $n if $n < 2;
my $r1 = futures>new( \&fib_t1, $n 2 );
my $r2 = futures>new( \&fib_t1, $n 1 );
return $r1 + $r2;
}
sub fib_t1 {
my $n = shift;
return $n if $n < 2;
my $r1 = futures>new( \&fib_t2, $n 2 );
my $r2 = futures>new( \&fib_t2, $n 1 );
return $r1 + $r2;
}
sub fib_t2 {
my $n = shift;
return $n if $n < 2;
return fib_t2( $n 2 ) + fib_t2( $n 1 );
}
printf "$_ : %d\n", fibonacci( $_ ) for 1 .. $ARGV[ 0 ];
__END__
C:\test>885839t 20
1 : 1
2 : 1
3 : 2
4 : 3
5 : 5
6 : 8
7 : 13
8 : 21
9 : 34
10 : 55
11 : 89
12 : 144
13 : 233
14 : 377
15 : 610
16 : 987
17 : 1597
18 : 2584
19 : 4181
20 : 6765
Where the futures module is: package futures;
use threads;
sub TIESCALAR {
my( $class, $thread ) = @_;
return bless [ $thread ], $class;
}
sub STORE {
die "Futures cannot be assigned to";
}
sub FETCH {
my $self = shift;
my $r = $self>[0]>join;
return $r;
}
sub new {
my( $class, $code, @args ) = @_;
my $thread = &async( $code, @args );
tie my $future, 'futures', $thread;
return $future;
}
1;
Using three subs serves to highlight the level dependency, but these could be merged into one sub with conditional statements provided there was some way to determine the number of existing threads.
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] [d/l] [select] 

But if you are going to explore the threading of recursion in Perl, there are simpler ways than your implementations. This does the job:
No, that does not limit the number of workers.
Equally, the recursive algorithm can be sped up immensely by simply memoising it.
It doesn't seem to be a technique you can use for distributed work, which is where I thought you were going with this.
All of these modifications make it possible to calculate Fibonacci( 1474 )
The point was to find a technique. Or are you saying the technique will never be as efficient as alternatives?
Anyway, I saw your post about the Ackermann function, and that's a whole different ballgame. I spent a lot of time tinkering with that too after you mentioned it.
As far as I know, one can't make y = f(x); z = f(y); parallel without refactoring by a human, yet the Ackermann function is of that form.
That said, it does have interesting traits that may make parallisation possible (or impossible).
 If you need calculate A(m,n), you know you will also need to calculate A(m,i) for 0<i<n.
 For m>0 and n>0, A(m,n) is a series of n+1 evaluations of A(m1,i) with increasing i.
 ...
But I'm in way over my head.
 [reply] [d/l] 



One idea that I find interesting in practice (but not from an algorithmical point of view) is Grand Central Dispatch, basically the idea of having a worker pool of threads (likely about 2x cores) that get managed by the OS. This approach seems to take the burden of thread administration away from the program and stuffs it into a centralized infrastructure that makes sure that no more than a set limit of threads is running. It seems that Grand Central Dispatch suffers from differentbutstillugly locking problems, but that's expected.
I think that the idea of "futures" has merit  AnyEvent uses something like them in its AnyEvent>condvar mechanism. AnyEvent has at least some way of detecting deadlocks, but as it is basically singlethreaded, detecting deadlocks is easier there I presume. Having transparent "futures" in Perl is feasible through trickery with transparent proxy objects like you present or Win32::OLE and MozRepl::RemoteObject implement in a more general fashion. I think that the performance hit you incur by heavily using tied variables will again negate many of the speed gains you get. I'm also not sure whether having transparent futures will actually help much, because you still need to make really sure that you don't immediately fetch the results after starting a calculation.
If there is a way to detect the dependency chain of futures ahead of time, implementing a thread pool together with the futures could limit the amount of threads spawned with your implementation. But I'm not sure whether that's possible as each future could spawn another future it depends on and thus quickly overrun each fixed limit. But maybe using Coro together with threads could allow switching the context without starting a fresh thread once we run out of CPU threads. But mixing Coro and threads is something that's very unlikely to work...
One of my "easier" thought experiments is the idea of switching Perl to asynchronous IO. I think this is somewhat easier to analyze, as IO iterators are used more often than threads are used, and their use does not require deep understanding and (dead)locking operations are unnecessary. For example, a "lazy file" could work with your/a "future" implementation by pretending to have read a line from a file, until somebody looks at the value:
open_async my $fh, '/etc/passwd'
or die "Couldn't read asynchronously from '/etc/passwd': $!";
while (<$fh>) {
# $_ is a tied variable that will be filled in a background thread
# or through asynchronous IO
next unless /root/;
};
Of course, most (existing) Perl code will simply not benefit from asynchronous IO, as most code will read a line from a file and immediately inspect each value. This will simply negate all benefits we gain from freeing up Perl from waiting for the line to be fetched from the file. Maybe we could gain something by requesting more than the one line (or more than the one buffer) to be read from the file, but that will likely get problematic if we try to use tell and other methods for looking at files.
My conclusion is that implicit and transparent parallelism will likely simply not work because the existing procedural programs will not make use of it. So in any case, specialization from the straightforward approach towards a more parallel approach is necessary (for Perl programs). The likely "easy" solution are callbackoriented or continuationpassing models using closures like AnyEvent or Node.js. There you start the asynchronous call and give it a callback to execute once the operation completes.  [reply] [d/l] [select] 


Re: Ackermann dependency graph by ikegami (Pope) on Feb 03, 2011 at 22:54 UTC 
╔═════╦═════╦═════╦═════╗
║ m= ║ 1 ║ 2 ║ 3 ║
╠═════╬═════╬═════╬═════╣
║ n=0 ║ 2 ║ ║ ║
╚═════╬══v══╣ ║ ║
║ 3 > 3 ║ ║
╠══v══╬══v══╣ ║
║ 4 ║ ║ ║
╠══v══╣ ║ ║
║ 5 > 5 > 5 ║
╠══v══╬══v══╬══v══╣
║ 6 ║ ║ ║
╠══v══╣ ║ ║
║ 7 > 7 ║ ║
╠══v══╬══v══╣ ║
║ 8 ║ ║ ║
╠══v══╣ ║ ║
║ 9 > 9 ║ ║
╠══v══╬══v══╣ ║
║ 10 ║ ║ ║
╠══v══╣ ║ ║
║ 11 > 11 ║ ║
╠══v══╬══v══╣ ║
║ 12 ║ ║ ║
╠══v══╣ ║ ║
║ 13 > 13 > 13 ║
╠══v══╬══v══╬══v══╣
║ 14 ║ ║ ║
╠══v══╣ ║ ║
║ 15 > 15 ║ ║
╠══v══╬══v══╣ ║
║ 16 ║ ║ ║
╠══v══╣ ║ ║
║ 17 > 17 ║ ║
╠══v══╬══v══╣ ║
║ 18 ║ ║ ║
╠══v══╣ ║ ║
║ 19 > 19 ║ ║
╠══v══╬══v══╣ ║
║ 20 ║ ║ ║
╠══v══╣ ║ ║
║ 21 > 21 ║ ║
╠══v══╬══v══╣ ║
║ 22 ║ ║ ║
╠══v══╣ ║ ║
║ 23 > 23 ║ ║
╠══v══╬══v══╣ ║
║ 24 ║ ║ ║
╠══v══╣ ║ ║
║ 25 > 25 ║ ║
╠══v══╬══v══╣ ║
║ 26 ║ ║ ║
╠══v══╣ ║ ║
║ 27 > 27 ║ ║
╠══v══╬══v══╣ ║
║ 28 ║ ║ ║
╠══v══╣ ║ ║
║ 29 > 29 > 29 ║
╚═════╩═════╩═════╝
Each box along the vertical axis is one "n" higher, so A(3,2)=A(2,13)=A(1,27)=29.
 [reply] 
Re: Nonrecursive Ackermann by ikegami (Pope) on Feb 04, 2011 at 00:31 UTC 
Using the dependency graph I posted earlier, I was able to construct a nonrecursive implementation of the Ackermann function. It visits the graph left to right, top to bottom.
use strict;
use warnings;
use feature qw( say );
sub A {
my ($m, $n) = @_;
# $A[$m] holds A($m, $n[$m])
my (@n, @A);
for my $mi (0..$m) {
$n[$mi] = 1;
$A[$mi] = 1;
}
for (;;) {
++$n[0];
$A[0] = $n[0] + 1;
for my $mi (1..$m) {
last if $A[$mi] != $n[$mi1];
$A[$mi] = $A[$mi1];
++$n[$mi];
}
return $A[$m] if $n[$m] == $n;
}
}
say A(@ARGV);
Maybe you'll have better luck with that.
Some properties:
 It only keeps m+1 Ackermann numbers in memory at once.
 It never calculates the same Ackermann number twice.
m<4 can be specialcased. Known formula exist for A(0,n), A(1,n), A(2,n) and A(3,n).
Update: Cleaned up code a bit.
 [reply] [d/l] 

I was able to construct a nonrecursive implementation of the Ackermann function.
Impressive work!
Maybe you'll have better luck with that.
I couldn't see much opportunity for parallisation in your first version.
Update: Cleaned up code a bit.
That's certainly easier to follow. (*)
The 'obvious' thing would be to async the inner for loop. Except that with maximum realistic values of m being 5, starting a thread for 5 iterations is nonsense.
So then my attention moves to the outer loop, with the thought that you can often split the range of a loop into subranges and run them in parallel.
But, firstly we don't know the range. Secondly, even iteratively, the dependency of each subsequent iteration upon the previous would require lockstepping the threadingeither through complex locking or queuingand with the minimal effort required for each step of each iteration versus the cost of locking and context switching, again make that a nonstarter.
Upshot: I think you called it earlier with your z = F(x); return F(z); comment that went over my head at the time. In the light of that, I've modified my first tentative rule from a post above to be:
For a recursive algorithmthat is not tail recursiveto be a suitable candidate for parallelisation, it requires that it has multiple, independant, recursive calls at each level.
The interesting thing now is: how many recursive algorithms fit that pattern?
m < 4 can be specialcased. Known formula exist for A(0,n), A(1,n), A(2,n) and A(3,n).
Indeed. The following returns all the tractable cases within the machine capability in a blink of an eye:
sub ackermann_c {
use constant INF => exp(~0 >> 1);
my( $m, $n ) = @_;
$m, $n=1 if $n == 0;
return $n + 1 if $m == 0;
return $n + 2 if $m == 1;
return 2*( $n+3)3 if $m == 2;
return 2**($n+3)3 if $m == 3;
return INF unless $m == 4;
my( $p, $r ) = ( $n+3, 2 );
$r = 2**$r while $p;
return $r  3;
}
But that now means I need to find a more suitable testcase for my experiments.
An inplace, parallised qsort maybe?
(*)Though wonder why you do a loop initialisation rather than my @n = (1 ) x $m; my @A = ( 1 ) x $m; given that m will realistically max out at 5. And whilst I'm on unimportant, purely subjective matters, why for(;;) not while( 1 )?
Examine what is said, not who speaks  Silence betokens consent  Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
 [reply] [d/l] [select] 

[ Just a few rushed little comments ]
Thanks!
we don't know the range.
would require lockstepping the threading
with the minimal effort required for each step
I reached the same conclusions.
it requires that it has multiple, independant, recursive calls at each level.
That's why I used fib at first and called Ackermann a whole other ball game.
The interesting thing now is: how many recursive algorithms fit that pattern?
Any divide and conquer algorithm, for starters. That's where the dependency graph came in. I wanted to see if there was a split in it that could be exploited.
wonder why you do a loop initialisation rather than
I didn't know what I was going to end up with when I started. The code got refactored multiple times. I didn't microoptimise.
(Upd: Put differently, the snippet is a theoretical implementation, not a practical one. Work still needs to be done to make it practical. Well, as practical as Ackermann can be. )
why for(;;) not while( 1 )?
It's what the code I learned from used. I've never used while (1). I like for (;;) cause I read it as "for ever". Trivia: Since the entire bound check of while (CONSTANT) gets optimised away, and since "while" and Cstyle "for" use the same opcodes, while (1) and for (;;) are practically identical internally.
 [reply] [d/l] [select] 

