PerlMonks

Re^6: GHC shuffle more efficient than Perl5.

by audreyt (Hermit)
on Apr 23, 2005 at 08:01 UTC ( #450659 )


in reply to Re^5: GHC shuffle more efficient than Perl5.
in thread Why is the execution order of subexpressions undefined?

Thanks for your kind words. In light of this thread's topic, I'd like to add that, because functions in pure FP never have side-effects, their reduction order can be entirely undefined. Actions defined by those pure functions, however, are defined as always sequential when main is executed.

This fact makes concurrency and automatic parallelization much, much easier to reason about; as an example, you can trivially write deadlock-free, composable, highly efficient concurrent code, using shared variables and inter-thread channels, without worrying about side effects going astray.
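To make that concrete with something runnable, here is a minimal sketch of the message-passing style being described — in Python rather than Haskell, with `queue.Queue` standing in for an inter-thread channel (an analogy, not GHC's implementation). The worker touches no shared mutable state, so there is no lock ordering to get wrong and nothing to deadlock on:

```python
# Toy analogue of inter-thread channels: the worker communicates only
# via queues, never through shared mutable state.
import threading
import queue

def worker(inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut down cleanly
            break
        outbox.put(item * 2)      # pure transformation of the message

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in (1, 2, 3):
    inbox.put(n)
results = [outbox.get() for _ in range(3)]
inbox.put(None)
t.join()
print(results)                    # [2, 4, 6]
```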

See http://homepages.inf.ed.ac.uk/wadler/linksetaps/slides/peyton-jones.ppt for more information... I think it will be enlightening :-)


Re^7: GHC shuffle more efficient than Perl5.
by BrowserUk (Pope) on Apr 23, 2005 at 08:54 UTC
    ... because functions in pure FP never have side-effects, their reduction order can be entirely undefined. Actions defined by those pure functions, however, are defined as always sequential when main is executed.

    Yes. In one sentence you have summed up the requirement for a defined execution order in languages, or those parts of a language, that have side-effects. Where it gets awkward is seeing how that enables the parallelisation of functions that can have side-effects when they are part of the same expression.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?
Re^7: GHC shuffle more efficient than Perl5.
by BrowserUk (Pope) on Apr 23, 2005 at 10:29 UTC

    I read the STM slides you linked and I have a follow-up question:

    Slide 26 says "STM is aimed at shared memory [applications]..." and slide 31 says "You can't do IO in a memory transaction...", which leads me to ask, what is the point of concurrency that cannot do IO?

    The basic premise of concurrency is to overlap communications with computations. You utilise the time spent communicating to do something else useful.

    In the case of the example used in the slides, that of a banking system, the instructions for increasing and decreasing an account's total would, in the real world, be coming from external systems--clearing houses, ATMs, check processing in branch work rooms etc.--and any but the smallest bank would have to be using multiple (many) systems to process the volumes of transactions and accounts in real time.

    If STM is restricted to dealing only with in-memory concurrency, it seems to be of little real-world application beyond simulations?


      Well, the trick is that you can go into and out of the STM monad from the IO monad at any time, since STM is essentially a subset of IO. So you enter STM when concurrent atomicity is required, and do the real IO (say, writing to the screen) outside it.
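As a loose illustration of that split — a toy in Python, not real STM: a single global lock stands in for `atomically`, with none of the optimistic logging and retry that GHC's STM actually uses — here is the slides' bank-transfer shape, with the shared-memory update inside the atomic block and the IO outside it:

```python
# Crude stand-in for STM's `atomically`: shared-memory updates happen
# only inside the atomic block; real IO (printing) happens outside it.
import threading

_tm_lock = threading.Lock()

class TVar:
    def __init__(self, value):
        self.value = value

def atomically(action):
    with _tm_lock:                # all-or-nothing w.r.t. other atomic blocks
        return action()

def transfer(src, dst, amount):
    def action():
        src.value -= amount
        dst.value += amount
        return (src.value, dst.value)
    balances = atomically(action)
    print("after transfer:", balances)   # IO done outside the "transaction"

a, b = TVar(100), TVar(0)
threads = [threading.Thread(target=transfer, args=(a, b, 10)) for _ in range(5)]
for t in threads: t.start()
for t in threads: t.join()
print(a.value + b.value)          # money is conserved: 100
```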

      But it is true that, although STM can automatically scale over SMP settings, it still assumes an essentially shared-memory model; that is why it's called a concurrency tool rather than a (cross-machine) parallelizing tool, which has other fault-tolerance factors to consider.

      However, for its targeted use (that is, as a compelling replacement for select loops and thread locks/semaphores), STM is still damn useful.

      Um, "overlapping computation and communication" isn't the point of concurrency *at all*. The point of concurrency is that transistors (and thus parallel computation) are incredibly cheap, but that specifying a program as "do X *then* do Y (etc)" forces all those transistors to sit idle waiting for X to complete before they can start doing Y. That's of tremendous real-world importance. The difficulty of writing parallel applications is the *only* reason we don't all have 8 or so processors on our machines and so execute our "real world apps" 1-8 times faster. Instead we spend a tremendous number of transistors "guessing" what might be done next, because we can't "actually" do Y until X has finished.
        Um, "overlapping computation and communication" isn't the point of concurrency *at all*.

        For the sake of (avoiding an) argument let me add the following addendum to my original statement:

        The basic premise of concurrency [on single processor systems] is to overlap communications with computations

        On single processor systems, which are still the majority of business-class systems in use, no amount of the pseudo-concurrency afforded by multi-processing or threading is going to speed up a cpu-intensive process.

        With the former, switching to another process--whether one performing part of the same job, or an unrelated system process--is likely to destroy cache locality, on top of the cost of the scheduler's decision-making.

        In the latter case you have the same scheduler costs. Context switches between threads may be cheaper--but that assumes that when one thread stops, the next thread scheduled belongs to the same process, which is a poor assumption most of the time.

        Even if you could remove all other processes from the system--which you cannot--the costs of timeslicing and context switching come directly out of the bottom line of the overall process runtime. You'd get considerably better throughput from a single-tasking kernel like DOS for truly cpu-bound tasks.

        Where concurrency can show benefits on multi-tasking single-cpu systems is by overlapping computation on one thread whilst another waits for IO!
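That overlap is easy to demonstrate. In this Python timing sketch (a hypothetical illustration: `time.sleep` stands in for a blocking IO wait), the CPU work proceeds while the IO thread is blocked, so the wall time is close to the longer of the two activities rather than their sum:

```python
# One thread blocks in simulated IO while the main thread computes;
# total wall time is roughly max(io, compute), not io + compute.
import threading
import time

IO_TIME = 0.25                     # simulated blocking IO wait, in seconds

def do_io():
    time.sleep(IO_TIME)            # blocks this thread, frees the CPU

start = time.perf_counter()
io_thread = threading.Thread(target=do_io)
io_thread.start()
total = sum(range(5_000_000))      # CPU work overlaps the IO wait
io_thread.join()
elapsed = time.perf_counter() - start
print(f"overlapped run took {elapsed:.2f}s")
```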

        But with kernel threading, win32 or pthreads, such threading is very coarse-grained. Every IO request usually causes the requesting thread to be suspended, which leaves the process subject to a lottery as to whether its other, cpu-intensive threads will be scheduled in a timely manner such that they can make good use of the cpu whilst the IO-bound thread is wait-stated. It is much more likely that some other system or application process will get the next timeslice(s), so realising the potential performance gain from overlapping is a hit & miss affair.

        Many languages offer user-space threading--Java, Ruby, Haskell, OCaml, Erlang etc.--but when these run atop a general-purpose multi-tasking OS, no matter how many user-threads are started within a process, they all share the same timeslices; they all run as a single system thread. When any one of those user-threads makes a blocking IO request, the entire process is suspended until the IO completes. Worse, on a multi-cpu system, a process with a single kernel thread, regardless of how many user threads it runs, will never be able to take advantage of concurrency across the multiple cpus. It can only run on a single cpu at a time.
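The behaviour being described is easy to model. The sketch below is a toy cooperative scheduler built on Python generators (an assumption of this illustration, not how any of the listed languages actually implement green threads): tasks interleave only at explicit `yield` points, all within one kernel thread, so a real blocking call inside any task would stall every task — exactly the weakness described above.

```python
# Minimal user-space "green thread" scheduler: round-robin over
# generators, switching only at each `yield`. One kernel thread runs
# everything, so a blocking call in any task would block them all.
from collections import deque

def scheduler(tasks):
    trace = []
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # run the task to its next yield
            ready.append(task)         # cooperative: put it back in line
        except StopIteration:
            pass                       # task finished; drop it
    return trace

def task(name, steps):
    for i in range(steps):
        yield f"{name}{i}"

print(scheduler([task("a", 2), task("b", 2)]))
# ['a0', 'b0', 'a1', 'b1'] -- interleaved, with no kernel threads at all
```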

        It's at this point (in history!) that things start to get complicated. As Moore's Law runs out, the only way forward to greater throughput, short of the still mythical quantum effect cpu, is to place more concurrency in the silicon with hyperthreading and multi-core processors.

        I contend that, short of the multiply-and-add cpu-intensive processing used in meteorological simulations and similar numerical work--which I don't think Perl, even Perl 6, is likely to be the tool of choice for--the use of threading will need to become ubiquitous to make best use of the hardware.

        Threading is generally considered hard, but compare the benefit of being able to write single-flowpath code with the alternatives: select polling; POE-style break-everything-into-little-pieces-with-global-state state machines; or asynchronous IO coding, where you start an AIO request, do a bit, see if it's ready, do a bit more, see if it's ready again. By comparison, threading is a doddle.

        The only complexity is the synchronisation of dataflow between cooperating threads. This is hard, but only because the tools available for doing it are still so low-level. If every program that needed to access files on disk had to chase inodes and sector maps, disk IO would still be hard too. Likewise if you had to load and unload FPU registers yourself, or bit-blit your own graphics, or do TCP by writing IP packets manually.
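To make the "low-level tools" point concrete, here is a Python comparison: hand-rolling the transfer of one value between threads needs a condition variable, a one-slot buffer, a flag, and a wait/notify dance the programmer must get exactly right, while a higher-level encapsulation of the same dataflow (`queue.Queue`) is one call on each side:

```python
import queue
import threading

# Low-level: condition variable + slot + flag, plus a wait/notify dance.
cond = threading.Condition()
slot = [None]
filled = [False]

def produce_lowlevel(value):
    with cond:
        slot[0] = value
        filled[0] = True
        cond.notify()

def consume_lowlevel():
    with cond:
        while not filled[0]:       # guard against spurious wakeups
            cond.wait()
        filled[0] = False
        return slot[0]

t = threading.Thread(target=produce_lowlevel, args=(42,))
t.start()
value = consume_lowlevel()
t.join()

# High-level: the same dataflow via queue.Queue is a put and a get.
chan = queue.Queue()
t2 = threading.Thread(target=chan.put, args=(42,))
t2.start()
value2 = chan.get()
t2.join()

print(value, value2)               # 42 42
```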

        What's required is a higher-level encapsulation of the hard, but well understood requirements of inter-thread communications.

        I think that the successful 21st century language will be the one that allows the programmer to write complex apps in discrete chunks that do one thing very well and efficiently and then chain those chunks together and allow the compiler or interpreter to take care of ensuring they communicate in a correct and timely fashion.

        Several languages, mostly FP or FP-inspired, have made good headway in tackling these problems with some innovative and even intuitive constructs. But from my brief tour of them, they mostly provide user-space threading only, which is great for simulations, but less good when you realise that every time you perform IO on any of your user threads, you throw away the rest of your timeslice for all of them. And you cannot make concurrent use of multiple processors.

        I also do not see FP-style syntax making it into the mainstream. PHBs and codemonkeys alike are too enamoured with OO-style procedural code.

        I think that we need an OO/procedural language that provides the programmer with access to transparent threading coordinated through declarative mechanisms.

        I believe that to be effective, it must support both kernel threads *and* user threads.

      • Kernel threads, to allow it to utilise hardware-level concurrency to the maximum by having a few long-lived threads--say one per CPU/core/hyperthread (plus one or two for IO waits).
      • User threading, to allow threads to be used lightly (in both senses of that phrase) whenever it makes sense to.
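A rough sketch of that hybrid shape, using Python's stdlib thread pool purely as an illustration: the pool's few long-lived worker threads play the part of the kernel threads, and the hundred small jobs play the cheap, lightweight tasks that are created freely without spawning a kernel thread each:

```python
# M:N flavour in miniature: many small tasks serviced by a few
# long-lived worker threads (roughly one per CPU).
import os
from concurrent.futures import ThreadPoolExecutor

def small_task(n):
    return n * n

workers = os.cpu_count() or 2      # "a few, long-lived threads"
with ThreadPoolExecutor(max_workers=workers) as pool:
    results = list(pool.map(small_task, range(100)))   # 100 cheap tasks

print(results[:5])                 # [0, 1, 4, 9, 16]
```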

        There was some discussion elsewhere recently about coroutines. I whole-heartedly agree with the arguments made for coroutines in the best explanation of them I have ever seen. My only argument is over how you achieve them.

        As I understand it, Parrot intends to provide for coroutines by using CPS (continuation-passing style). I think that this is the wrong approach, as it imposes the not inconsiderable costs of continuation-passing on the whole program (I think). You effectively have to take snapshots of the interpreter's state when entering (and leaving?) every called routine, throughout the entire program, because you can never know where in the code a routine may decide to backtrack to an earlier state, or which routines (at the Parrot level) may participate as coroutines (at the HLL level).

        A user thread is effectively a snapshot of the interpreter's state too. But only of that part of the program--the threaded routine--that starts the thread. It doesn't need to remember anything prior to its coming into being, and no other thread needs to remember anything about it. Each thread is a self-contained continuation--except when they communicate. And that is where the right HLL construct could make the breakthrough that would allow threading to become Just Another Tool in the programmer's toolbox, rather than something nasty, hard and scary, as it currently tends to be viewed.
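Python generators give a concise model of that "self-contained snapshot" idea (an illustrative analogy, not Parrot's mechanism): the coroutine's locals and its resume point live entirely inside the generator object, and nothing outside it has to be saved or restored in order to suspend and resume it.

```python
# A generator's state (its locals and resume point) is entirely
# self-contained; the caller keeps no snapshot of it.
def counter(start):
    n = start                      # local state lives inside the coroutine
    while True:
        n = n + (yield n)          # suspend here; resume with a sent value

c = counter(10)
print(next(c))        # 10 -- runs to the first yield
print(c.send(5))      # 15 -- resumes exactly where it left off
print(c.send(5))      # 20 -- no caller-side state was needed
```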

        Declarative user-threads with declarative dataflow would make it easy to use coroutines that migrate across hyperthreads, multi-cores & multi-cpus as demand requires, without the programmer even having to consider it.

        I'd like to see Perl 6 become that successful 21st century language that makes best use of the next generation of hardware. But for that to happen, I think it will need rather more than a thin veil wrapped over the abysmal threading tools that pthreads provide.

        I would also like to think that I have gathered enough bits and pieces from enough places, to put together the bones of a design that would provide all of the above in a programmer-friendly & multi-cored/multi-processored future-machine-friendly manner, but the vagaries of NIH and WTFDHTHI mean it will probably never get anywhere near Perl 6.


        "Science is about questioning the status quo. Questioning authority."
        The "good enough" may be good enough for now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
