PerlMonks  

Re^9: GHC shuffle more efficient than Perl5.

by BrowserUk (Patriarch)
on May 06, 2005 at 04:18 UTC


in reply to Re^8: GHC shuffle more efficient than Perl5.
in thread Why is the execution order of subexpressions undefined?

Um, "overlapping computation and communication" isn't the point of concurrency *at all*.

For the sake of (avoiding an) argument, let me add the following addendum to my original statement:

The basic premise of concurrency [on single processor systems] is to overlap communications with computations

On single-processor systems, which are still the majority of business-class systems in use, no amount of the pseudo-concurrency afforded by multi-processing or threading is going to speed up a CPU-intensive process.

With the former, switching to another process--whether one performing part of the same overall job, or some unrelated system process--is likely to destroy cache locality, in addition to the cost of the scheduler's decision-making.

In the latter case you have the same scheduler costs. Context switches between threads may be cheaper--but that assumes that when one thread stops, the next thread scheduled belongs to the same process, which is a poor assumption most of the time.

Even if you could remove all other processes from the system--which you cannot--the costs of timeslicing and context switching come directly out of the bottom line of the overall process runtime. You'll get considerably better throughput from using a single-tasking kernel like DOS for truly CPU-bound tasks.

Where concurrency can show benefits on multi-tasking, single-CPU systems is by overlapping computation on one thread whilst another waits for IO!

But with kernel threading, win32 or pthreads, such threading is very coarse-grained. Every IO request usually causes the requesting thread to be suspended, which leaves the process subject to a lottery as to whether its other, CPU-intensive threads will be scheduled in a timely enough manner to make good use of the CPU whilst the IO-bound thread is wait-stated. It is much more likely that some other system or application process will get the next timeslice(s), so realising the potential performance gains from overlapping is a hit-and-miss affair.

Many languages offer user-space threading--Java, Ruby, Haskell, OCaml, Erlang etc.--but when these run atop a general-purpose multi-tasking OS, no matter how many user threads are started within a process, they all share the same timeslices: they all run on a single system thread. When any one of those user threads makes a blocking IO request, the entire process is suspended until the IO completes. Worse, on a multi-CPU system, a process with a single kernel thread, regardless of how many user threads it runs, can never take advantage of concurrency across the multiple CPUs. It can only run on one CPU at a time.

It's at this point (in history!) that things start to get complicated. As Moore's Law runs out, the only way forward to greater throughput, short of the still-mythical quantum-effect CPU, is to place more concurrency in the silicon with hyperthreading and multi-core processors.

I contend that, short of the type of multiply-and-add CPU-intensive processing used in meteorological simulations and similar numerical work--which I don't think Perl, even Perl 6, is likely to be the tool of choice for--the use of threading will need to become ubiquitous to make best use of the hardware.

Threading is generally considered hard, but compare the benefit of being able to write single-flowpath code with the alternatives: select polling; POE-style break-everything-into-little-pieces-with-global-state state machines; or asynchronous IO coding, where you start an AIO request, do a bit of work, check whether it's ready, do a bit more, check again. By comparison, threading is a doddle.

The only complexity is the synchronisation of dataflow between cooperating threads. This is hard, but only because the tools available for doing it are still so low-level. If every program that needed to access files on disk had to chase inodes and sector maps, disk IO would still be hard too. Likewise if you had to load and unload FPU registers yourself, bit-blit your own graphics, or do TCP by writing IP packets manually.

What's required is a higher-level encapsulation of the hard, but well-understood, requirements of inter-thread communication.

I think that the successful 21st-century language will be the one that allows the programmer to write complex apps as discrete chunks that each do one thing well and efficiently, chain those chunks together, and let the compiler or interpreter take care of ensuring they communicate in a correct and timely fashion.

Several languages, mostly FP or FP-inspired, have made good headway in tackling these problems with some innovative and even intuitive constructs. But from my brief tour of them, they mostly provide user-space threading only, which is great for simulations but less good when you realise that every time you perform IO on any of your user threads, you throw away the rest of your timeslice for all of them. And you cannot make concurrent use of multiple processors.

I also do not see FP-style syntax making it into the mainstream. PHBs and codemonkeys alike are too enamoured with OO-style procedural code.

I think that we need an OO/procedural language that provides the programmer with transparent threading, coordinated through declarative mechanisms.

I believe that to be effective, it must support both kernel threads *and* user threads.

  • Kernel threads, to allow it to utilise hardware-level concurrency to the maximum, by having a few long-lived threads--say one per CPU/core/hyperthread (plus one or two for IO waits).
  • User threads, to allow threads to be used lightly (in both senses of that phrase), whenever it makes sense to.

There was some discussion elsewhere recently about coroutines. I whole-heartedly agree with the arguments made for coroutines in the best explanation of them I have ever seen. My only argument is over how you achieve them.

As I understand it, Parrot intends to provide for coroutines using continuation-passing style (CPS). I think this is the wrong approach, as it imposes the not inconsiderable costs of continuation capture on the whole program (I think). You effectively have to take snapshots of the interpreter's state when entering (and leaving?) every called routine, throughout the entire program, because you can never know where in the code a routine may decide to backtrack to an earlier state, or which routines (at the Parrot level) may participate as coroutines (at the HLL level).

A user thread is effectively a snapshot of the interpreter's state also, but only of that part of the program--the threaded routine--that starts the thread. It doesn't need to remember anything prior to its coming into being, and no other thread needs to remember anything about it. Each thread is a self-contained continuation--except when they communicate. And that is where the right HLL construct could make the breakthrough that would allow threading to become Just Another Tool in the programmer's toolbox, rather than something nasty, hard and scary, as it currently tends to be viewed.

Declarative user threads, with declarative dataflow, would make it easy to use coroutines that migrate across hyperthreads, multi-cores and multi-CPUs as demand requires, without the programmer even having to consider them.

I'd like to see Perl 6 become that successful 21st century language that makes best use of the next generation of hardware. But for that to happen, I think it will need rather more than a thin veil wrapped over the abysmal threading tools that pthreads provide.

I would also like to think that I have gathered enough bits and pieces from enough places, to put together the bones of a design that would provide all of the above in a programmer-friendly & multi-cored/multi-processored future-machine-friendly manner, but the vagaries of NIH and WTFDHTHI mean it will probably never get anywhere near Perl 6.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about question the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
