PerlMonks  

Re^4: Why Coro?

by binary (Novice)
on Oct 17, 2010 at 23:36 UTC [id://865853]


in reply to Re^3: Why Coro?
in thread Why Coro?

Marc Lehmann is in fact spot on and Chip is incorrect. There's no "politics" involved here. At least not on Marc's side.

Perl's "threads" are not threads in the generally accepted sense of the term. Yes, Wikipedia does say "In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system."

But it also goes on to say:

", but in most cases, a thread is contained inside a process."

Perl's implementation of "threads" is actually done using forking, and there is *no* shared memory space. You're getting confused: what actually happens is that *ALL* the data from the main process is copied across to the memory space of the "threads". They are processes and have their own memory space. Real threads execute inside one process and share memory.

For interpreted languages there is what's called a GIL (Global Interpreter Lock), which is there to prevent non-thread-safe interpreter code from being executed by more than one kernel thread at a time. Ruby and Python have a GIL, and their threads are known as 'green threads', which are not kernel-level but sit behind the GIL, though they still share memory. They of course will not take advantage of multiple cores. On a side note, you can get real kernel-level threads with both Ruby and Python through JRuby and Jython (formerly JPython), but nothing like that exists for Perl.

Perl's pseudo-threads will take advantage of multiple cores because they are processes and they do NOT share memory. Marc knows his stuff. He wouldn't have been able to write EV, AnyEvent and Coro to be as stable and fast as they are if he lacked such basic knowledge, which even *I* have, and I consider myself intermediate at best.

But, please don't take my word for it. Check perlthrtut. First paragraph.

"This tutorial describes the use of Perl interpreter threads (sometimes referred to as ithreads) that was first introduced in Perl 5.6.0. In this model, each thread runs in its own Perl interpreter, and any data sharing between threads must be explicit. The user-level interface for ithreads uses the threads class."

Notice the word 'explicit'. That means you have to share any data between 'threads' yourself. This is the very purpose of threads::shared. Why would that module exist if Perl 'threads' shared memory, as Chip Salzenberg and you claim?

Re^5: Why Coro?
by BrowserUk (Patriarch) on Oct 18, 2010 at 00:16 UTC

    Oh dear. If misreading perlthrtut is the extent of your knowledge, you really shouldn't even reach mental conclusions on the subject, never mind offer them up in writing as proof of your own ignorance.

    How could what you are saying possibly be true, when I and hundreds of others run ithreads on Windows every day? Windows doesn't have a fork API.

    Here, check for yourself. You just might learn something useful. (Like the fact that on Windows Perls, ithreads are used to emulate fork--not the other way around.)

    (Hint: that's not a question, the answer is obviously that it isn't, and couldn't possibly be, true.)

    You (and Marc Lehmann) need to understand the difference between a) a program model; b) the actual implementation that underlies that model.

    Coro isn't "threading". It's good, old-fashioned, cooperative coroutines--just like Windows 3.0; with all the same problems--and nothing more.
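    To make the "cooperative" part concrete, here is a minimal sketch in Python (not Perl, and not Coro's actual API). Each task is a generator, and the hypothetical `run` scheduler regains control only when a task reaches a `yield`. That is roughly how a Coro thread gives up the CPU only when it calls `cede` or waits on a Coro-aware operation.

```python
from collections import deque

def task(name, steps, trace):
    """A cooperative task: it runs until it voluntarily yields."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield  # the only point where the scheduler can switch tasks

def run(tasks):
    """Round-robin scheduler: resume each ready task until its next yield."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # finished tasks are dropped
        ready.append(current)

trace = []
run([task("a", 3, trace), task("b", 2, trace)])
print(trace)  # ['a0', 'b0', 'a1', 'b1', 'a2']
```

    A task that loops without ever reaching `yield` would never hand control back, and every other task would stall. That is the Windows 3.x problem referred to above.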


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: Why Coro?
by juster (Friar) on Oct 19, 2010 at 00:52 UTC

    Threading is an interesting subject. This is what I have learned from peeking at the source code (I think it was perl version 5.12). I also think BrowserUk's treatment is a little harsh, but he most likely has good reason to be if he had to program Windows before it was pre-emptive :).

    Perl does use kernel threads (i.e. pthreads, Win32 threads). The code is located at dist/threads.xs inside the perl source tarball. The confusing part is that it uses threads to model processes. This is less confusing when you consider that the new perl threads originated as fork() emulation on Windows. Interestingly, Python does use actual lightweight kernel threads for its "green threads". Yet only one thread runs at a time, as you say, because of the GIL.
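    The Python half of that claim can be checked with a small sketch, assuming CPython 3.8 or later, where `threading.get_native_id()` exists. Each `threading.Thread` reports its own OS-level thread ID, even though the GIL in the default CPython build lets only one of them execute Python bytecode at a time. The `native_ids` helper is hypothetical, written only for this illustration.

```python
import threading

def native_ids(n):
    """Start n threads and return the OS-level thread IDs they report."""
    ids = []
    lock = threading.Lock()
    # The barrier keeps every worker alive until all of them have recorded
    # their ID, so the OS cannot recycle an ID from one worker to the next.
    barrier = threading.Barrier(n)

    def worker():
        with lock:
            ids.append(threading.get_native_id())
        barrier.wait()

    workers = [threading.Thread(target=worker) for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return ids

main_id = threading.get_native_id()
ids = native_ids(4)
print(main_id, ids)  # four distinct kernel thread IDs, none equal to main_id
```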

    So Marc Lehmann says his Coro module provides the "only real threads" and that perl's threads aren't real threads. Well... defining what a real thread is kind of confusing. Perl threads are "real" kernel threads. In my limited experience they behave like processes and give about the same performance as perl's fork() or Python's "multiprocessing" module (which uses Python's internal fork()).

    The Wikipedia entry mentions the user-level model of threading ("N:1" under "Models"), so does this mean that Coro's coroutines are indeed threads? They just happen to be user-level threads. Coro's "threads" perform much like Python's or Ruby's "threads" (which are also coroutines, i.e. user-level threads).

    I think Coro is really neat and think it's mainly useful when you need to model your program asynchronously with many little workers who share a large amount of data. Perl threads are also really neat when you don't have to share a great deal of data. I think most of the frustration is in poor use of terminology. They are both threads... just different types.

      Well... defining what a real thread is kind of confusing.

      Actually, it's not. A "thread" is a schedulable unit of execution context. Thereby making kernel threads like Windows threads and pthreads--as used by ithreads--real threads. (The 'real' is redundant.)

      It also makes some user-space implementations--such as found in Java 1.1, Erlang, and others--that implement their own internal scheduler, also threads.

      But coroutines are not threads. They are coroutines.

      I think Coro is really neat

      I also think Coro is extremely clever code, and its author an extremely clever coder. There have even been a few occasions when I have sorely wished that Coro ran on my platform. There is no reason it shouldn't. The basic, underlying longjmp mechanism works natively just fine; it is used for exception handling. It's just the implementation that prevents it.

      And my recognition of the author's skills and knowledge is what makes me think that his diatribe in the Coro POD is neither ignorance nor confusion, but simple politicking of the worst kind: done in full knowledge that it is both factually incorrect and likely to lead some--like binary perhaps--into confusion, and with malice aforethought.

      I think that if there is any real confusion, it comes because Linux treats threads and processes very similarly. To the extent that some versions of top actually list the threads of a single process as if they were separate processes.

      To quote

      Threads of execution, often shortened to threads, are the objects of activity within the process. Each thread includes a unique program counter, process stack, and set of processor registers. The kernel schedules individual threads, not processes. In traditional Unix systems, each process consists of one thread. In modern systems, however, multithreaded programs—those that consist of more than one thread—are common. As you will see later, Linux has a unique implementation of threads: It does not differentiate between threads and processes. To Linux, a thread is just a special kind of process.

      The thing that makes them "special", is that they share address space. Perl's threads also share address space at the C level.

      It is the programming model that ithreads layers on top of those underlying kernel threads that restricts individual threads within the process to subsets of the full memory allocated to that process.

      It does this by segregating memory allocations made by different threads into different segments ("arenas") of the memory allocated to the process. But it is only Perl, and the threading model chosen, that enforces this segregation, not the OS. Indeed, the segregation is quite easily defeated.
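      The split between the kernel threads and the programming model layered on them can be imitated in Python, purely as an analogy. This is not how ithreads is implemented in C. The hypothetical `spawn` helper below starts a real OS thread in the same process but hands it a deep copy of the data. Only values wrapped in the hypothetical `Shared` holder, loosely playing the role of a `threads::shared` variable, are visible across threads. The isolation comes from the copy, not from the OS.

```python
import copy
import threading

class Shared:
    """Hypothetical marker for explicitly shared data (loosely like :shared)."""
    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()

    def __deepcopy__(self, memo):
        # Shared holders are never cloned: every thread gets this same object.
        return self

def spawn(func, data):
    """Run func in a real OS thread, giving it a private deep copy of data."""
    private = copy.deepcopy(data)  # everything not marked Shared is duplicated
    t = threading.Thread(target=func, args=(private,))
    t.start()
    return t

def worker(mine):
    mine["counter"] += 100          # changes this thread's private copy only
    with mine["total"].lock:
        mine["total"].value += 100  # changes the explicitly shared holder

state = {"counter": 0, "total": Shared(0)}
threads = [spawn(worker, state) for _ in range(4)]
for t in threads:
    t.join()
print(state["counter"], state["total"].value)  # 0 400
```

      Because every thread still lives in one address space, nothing stops a worker from reaching the original `state` through a global or a closure. The segregation holds only as long as the code plays by the model's rules.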

      The choice of an 'explicitly-shared only' model was a) a conscious choice; b) made for very good reasons.

      And IMO c) will in the longer term be seen as both inspired, and "the way to go".

      The current implementation lets it down somewhat because of its memory-hungriness and (lack of) speed. But this could be (and hopefully soon will be) addressed. The main problem with the current implementation is that it uses a 'double-tying' mechanism for the scalars held in shared aggregate structures.

      That is to say, both the AV or HV of a shared structure and the individual scalars it contains have attached magic. This means not only that every aggregate-held scalar is inflated in size by the attached magic, but also that each thread that has visibility of the shared structure requires a--relatively lightweight, but still significant--place-holder or alias object for every scalar held in the shared structure. This is both quite costly and unnecessary.

      The scalars that live within a tied aggregate don't need individually attached magic. (Nor even any physical storage allocation, but that's a twist we can skip for now.) When a FETCH or STORE is invoked upon a tied array or hash, the magic attached to the AV or HV has enough information to read or write the actual element without requiring further magic to be attached to each individual scalar.

      Not only would removing (or rather, never attaching) magic to the individual scalars considerably lessen the size of the shared aggregates, it would also remove the need for per-thread place-holders for them. So each thread would retain a single, lightweight reference to the shared AV or HV, and access its contents through that reference via its attached magic, further reducing the memory cost of the shared aggregate.

      The final icing on the cake is that indirecting through only one level of magic instead of two would considerably speed up accesses.

      In a nutshell, you can wrap a class around an aggregate without having to make the individual elements of that aggregate objects in their own right. And the memory and performance savings of that are legion. And this could be implemented now (and will be, if I ever master the intricacies of XS).
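      Python has no tie magic, but the shape of the proposal can be sketched as an analogy. The hypothetical `SharedHash` class below puts its FETCH/STORE-like hooks (`__getitem__`/`__setitem__`) and a single lock on the container only. The elements themselves are stored as plain values, with no per-element wrapper objects. It shows the one-level-of-indirection idea; it is not a model of threads::shared internals.

```python
import threading

class SharedHash:
    """Hypothetical container whose access hooks live on the aggregate only."""
    def __init__(self):
        self._data = {}                # plain values, no per-element wrappers
        self._lock = threading.Lock()  # one lock guards the whole aggregate

    def __getitem__(self, key):        # roughly the role of FETCH
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value): # roughly the role of STORE
        with self._lock:
            self._data[key] = value

    def update(self, key, func, default=0):
        """Atomically replace the value at key with func(old value)."""
        with self._lock:
            self._data[key] = func(self._data.get(key, default))

hits = SharedHash()

def bump():
    for _ in range(1000):
        hits.update("n", lambda v: v + 1)

workers = [threading.Thread(target=bump) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(hits["n"])  # 4000
```

      Every access goes through exactly one hook on the container, and the stored element is an ordinary `int`, so no per-element wrapper objects are allocated for any thread.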

      But none of this detracts from either the desirability of preventing the unintentional, accidental sharing of thread-specific data; nor the usability of the current implementation. Just as with regexes (and every other aspect of Perl, and other languages), implementations can be improved, incrementally over time. Provided that the basic programming model is right.

      And (IMO) the ithreads model is.


        Actually, it's not. A "thread" is a schedulable unit of execution context. Thereby making kernel threads like Windows threads and pthreads--as used by ithreads--real threads. (The 'real' is redundant.) It also makes some user-space implementations--such as found in Java 1.1, Erlang, and others--that implement their own internal scheduler, also threads. But coroutines are not threads. They are coroutines.

        Coro implements its own scheduler. By this definition a "coro" created with Coro's "async" call is a "thread". It contains its own execution context which can be suspended and resumed. I still do not understand why coros created by Coro are not user-space threads in your terminology?

        Coro has much more functionality than the classical definition of the coroutine. To implement a coroutine you allow multiple exit points of the coroutine for yielding back to the caller. This is very simple compared to the full functionality of Coro which may be better described as a system of fibers.

        The only distinction between coroutines and fibers seems to be that coroutines are built into the language itself and are much simpler. All it takes to implement coroutines is a yield keyword that stops execution, returns a value, and resumes execution after the yield. Coro now seems less like a coroutine because it cannot cede (Coro's yield) any values to its caller. Coro also implements more sophisticated operations than the traditional coroutine. It seems to me more like fibers. More hair splitting... sigh.
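        The "classical" coroutine described here is exactly what a Python generator provides: a `yield` that suspends execution, hands a value to the caller, and later resumes right after the `yield`. `send()` additionally passes a value back in. A minimal sketch; the `counter` name is just for illustration:

```python
def counter(start):
    """A generator-based coroutine that yields a running total."""
    n = start
    while True:
        step = yield n  # suspend, hand n to the caller, resume on next()/send()
        n += step if step is not None else 1

c = counter(10)
print(next(c))    # 10  (runs up to the first yield)
print(c.send(5))  # 15  (resumes with step = 5)
print(next(c))    # 16  (resumes with step = None, so it adds 1)
```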

        Fibers multitask cooperatively. User-level threads use user-level scheduling, which may very well be cooperative multi-tasking... confusing! GNU Portable Threads (ironic misnomer?) is an example of a user-level thread library that uses cooperative multi-tasking and switches context instead of blocking on I/O. Yes, it has threads in its name... argh. This implementation also seems similar to Coro, which uses cooperative multi-tasking and yields control instead of blocking on I/O. One difference is that Portable Threads can also schedule its "threads" based on priority.
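        A cooperative user-level scheduler with priorities, in the spirit of what is attributed to GNU Portable Threads above, can be sketched in a few lines of Python. This is a hypothetical toy, not Pth's or Coro's API. Tasks are generators, a lower number means higher priority, and tasks of equal priority take turns round-robin. It is still cooperative: a high-priority task is only interrupted at its own `yield`.

```python
import heapq
import itertools

def task(name, steps, trace):
    """A cooperative task that records each step and then yields."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield

def run_by_priority(tasks):
    """tasks: list of (priority, generator); lower number runs first."""
    seq = itertools.count()  # tie-breaker: equal priorities run round-robin
    ready = [(prio, next(seq), gen) for prio, gen in tasks]
    heapq.heapify(ready)
    while ready:
        prio, _, gen = heapq.heappop(ready)
        try:
            next(gen)
        except StopIteration:
            continue  # finished tasks leave the run queue
        heapq.heappush(ready, (prio, next(seq), gen))

trace = []
run_by_priority([(0, task("a", 2, trace)),
                 (0, task("b", 2, trace)),
                 (1, task("c", 1, trace))])
print(trace)  # ['a0', 'b0', 'a1', 'b1', 'c0']
```

        The sequence counter also keeps the heap from ever comparing two generator objects, which Python cannot order.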

        Are fibers threads? The wikipedia entry starts off with:

        In computer science, a fiber is a particularly lightweight thread of execution.

        Sheesh... I can see you are adamant about your terminology, but my task of translating between everyone's disparate definitions of what a "thread" really is, is confusing.

        After a quick skim of the abstract confusions, enter the tangible! The fact that perl's ithreads have created a bunch of little pseudo-processes out of the threads! I'm not convinced it is forward-thinking to model processes using threads. "Real" (OS-level) processes already must explicitly share data. Processes have private data. Processes run their own interpreter. Processes run in parallel. Why reinvent the wheel?

        Add in the Escher-esque quality of starting an OS process to run perl, which creates a main OS/kernel thread... now perl's ithreads create more OS/kernel threads, which in turn act like separate perl processes. ithreads have their own interpreter, and they share data through complex operations akin to IPC, which means they are about as fast as processes sharing data. This seems more like going backwards than forwards.

        This is what Coro's author is apparently talking about in his short description: "Coro - the only real threads in perl". So what is the "real" thread? In my own mind they are both "real" threads with Coro leaning towards a fiber and perl ithreads leaning towards a process.

        Your quote on linux threads is a good example of threads in a specific context. Namely the linux operating system and its process model. I was trying to nail down the definition of thread in a wider, abstract context. Namely all of computer science. Just about the only decent source I could find was wikipedia.

        The rest of your post about aggregator magic was very interesting!
