http://www.perlmonks.org?node_id=885876


in reply to Re: [OT]: threading recursive subroutines.
in thread [OT]: threading recursive subroutines.

First. Thanks for your considered response.

I think that this much-quoted example of Haskell is actually something of a trompe l'oeil. I'll try to explain why.

The reason for parallelising this would be to improve performance. But the reality is that the first and best way of improving sort performance in Haskell is to not use this cute and (still) hugely visually impressive implementation.

The problem is that whilst the concise clarity and relatively easy-to-understand semantics of this list-comprehension solution are an obvious advert for Haskell and a good teaching aid, it tricks the eye into believing that it is an efficient implementation. This is far from the case.
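For readers who haven't met it, the implementation under discussion is (near enough) the well-known two-liner:

    -- The much-quoted list-comprehension quicksort. Every level of
    -- recursion builds two new lists and then a third via (++).
    qsort :: Ord a => [a] -> [a]
    qsort []     = []
    qsort (p:xs) = qsort [x | x <- xs, x < p] ++ [p] ++ qsort [x | x <- xs, x >= p]

Those two comprehensions and the concatenation are exactly where the allocation costs described below come from.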

The problem is that, given the huge disparity in speed between on-chip and off-chip memory access, memory allocation and management costs are a significant factor in any CPU-intensive algorithm. And this implementation exacerbates those costs: not only does it duplicate the memory allocations at every level--creating two new lists from each previous list--but it also has to concatenate those sublists back into (new) composite lists.

See the Wikipedia discussion for more detail. If you read to the end, you'll see that Haskell has now moved to using merge sort, in part to counter this problem.

"the Haskell runtime is free to do those two sorts in parallel."

This is the crux of the threading recursion problem.

If the runtime decided to always perform the two sorts in parallel, then it would end up spawning one thread for every item in the list to be sorted. This is obviously not a practical solution.
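To make that concrete, here is a sketch of the "always parallelise" version, written by hand using par and pseq from the parallel package (the runtime doesn't actually rewrite anything for you); it creates a spark at every level of the recursion:

    import Control.Parallel (par, pseq)

    -- Naive parallelisation: spark both sub-sorts at every level of the
    -- recursion, giving on the order of n sparks for an n-element list.
    qsortPar :: Ord a => [a] -> [a]
    qsortPar []     = []
    qsortPar (p:xs) = lesser `par` (greater `pseq` (lesser ++ [p] ++ greater))
      where
        lesser  = qsortPar [x | x <- xs, x < p]
        greater = qsortPar [x | x <- xs, x >= p]

(GHC sparks are cheaper than OS threads, but the point stands: the scheduling and bookkeeping overhead grows with the size of the input rather than with the number of cores.)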

That means that to (automatically) parallelise the algorithm, the runtime would need to decide how many levels deep into the recursion it should allow the parallelisation to go before resorting to straightforward recursion, in order to bound that parallelisation.
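A sketch of what such a bounding might look like, with an explicit depth budget (the budget and the name qsortBounded are mine; nothing in GHC derives this for you):

    -- Depth-bounded parallelisation: spark the sub-sorts only while the
    -- budget lasts, then fall back to plain sequential recursion.
    -- (Uses qsort from the first snippet and par/pseq from the previous one.)
    qsortBounded :: Ord a => Int -> [a] -> [a]
    qsortBounded _ []       = []
    qsortBounded d l@(p:xs)
      | d <= 0    = qsort l
      | otherwise = lesser `par` (greater `pseq` (lesser ++ [p] ++ greater))
      where
        lesser  = qsortBounded (d - 1) [x | x <- xs, x < p]
        greater = qsortBounded (d - 1) [x | x <- xs, x >= p]

Calling qsortBounded 3 xs, for example, limits the sparking to the top three levels of the recursion, i.e. roughly eight parallel sub-sorts.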

Let's say that the runtime discovered how many threads of execution are available and could rewrite the algorithm such that once that limit has been reached, it resorts to simple recursion. The problem still remains that the used-only-once nature of Haskell variables is going to impose significant memory management costs upon the algorithm. In all probability, sufficient costs to negate most of the benefits of the parallelisation.

Given that a quicksort can be implemented to run in-place--albeit that Haskell requires the breaking of its own rules to do so--the benefits of doing so, and thus avoiding the memory management overheads, far outweigh the benefits of parallelising the list-comprehension algorithm.
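For completeness, here is a sketch of that rule-breaking version: a destructive quicksort over a mutable unboxed array in the ST monad (the helper names are mine; only the standard array package is assumed):

    import Control.Monad (when)
    import Control.Monad.ST
    import Data.Array.ST

    -- In-place quicksort hidden behind runST. Mutating a single array
    -- avoids the per-level list allocation and concatenation entirely.
    qsortInPlace :: [Int] -> [Int]
    qsortInPlace xs = runST $ do
        let n = length xs
        arr <- newListArray (0, n - 1) xs :: ST s (STUArray s Int Int)
        let swap i j = do
                a <- readArray arr i
                b <- readArray arr j
                writeArray arr i b
                writeArray arr j a
            -- Lomuto partition around the last element of the range.
            partition lo hi = do
                pivot <- readArray arr hi
                let loop i j
                      | j >= hi   = swap i hi >> return i
                      | otherwise = do
                          xj <- readArray arr j
                          if xj < pivot
                            then swap i j >> loop (i + 1) (j + 1)
                            else loop i (j + 1)
                loop lo lo
            go lo hi = when (lo < hi) $ do
                p <- partition lo hi
                go lo (p - 1)
                go (p + 1) hi
        go 0 (n - 1)
        getElems arr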

And once the quicksort is performed in-place, parallelising that algorithm is both easier and produces significantly better performance-improvement multiples.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.