http://www.perlmonks.org?node_id=868162


in reply to Re^13: Utter FUD!
in thread is ||= threadsafe?

Erlang "threads" are also called Erlang "processes" and they neither have shared data/state nor wholesale copies of data. I don't often see them referred to as just "threads", probably because they aren't much like C, nor C++, nor Java, nor Modula-2, nor Visual BASIC, nor Unix kernel threads nor Win32 native threads (nor are they much like iThreads nor fork).

You can create and destroy Erlang "processes" similar to what one might be used to with threads in a procedural language (since Erlang is functional but also is all about interfacing with things in the Real World).

Haskell has more emphasis on the "functional" (the meaning from computing theory, not plain English) and less on the "interfacing" and so "threads" there tend to be more just purely useful for using more CPUs at once in hopes of finishing sooner and you don't so much control the "threads" as you declare where the compiler is allowed to make use of threads for you.

Purely functional programs don't have a traditional flow, traditional "blocking" operations, nor variables much less shared variables, so traditional threads just don't really apply much.

A lot of times you'll see things called "threads" with some qualifier(s) and then described as "not the same as 'vanilla' threads" and then called "light-weight processes". Unfortunately, you can't call iThreads "light-weight processes" since in some significant ways they weigh more than vanilla processes.

So, iThreads are actually more like fork than like any of these things that are sometimes called "threads" in other languages. And the things that aren't pretty much exactly like C threads and Unix kernel threads don't tend to get called just plain "threads" much, IME.

Though, their emulation of the lion's share of work done by fork() (copy of the majority of the process, not the myriad bookkeeping bits like setting the program counter or assigning PIDs, etc.) is significantly less efficient than fork().

iThreads' copying used 10x more CPU in the trivial case. It was trivial to create some data and make iThreads' copying use 100x the CPU of fork(). Even if I pessimize for fork() by modifying all of the initial data so it all gets unshared, iThreads still used just over 70x more CPU.

Comparing memory usage is not trivial so I didn't try to come up with any numbers to compare that.

But that only applies to (part of) why I don't use iThreads in Unix. I look forward to trying to use iThreads again under Windows.

The name iThreads has probably discouraged use of the technology. I find many eschew threads, often in a rather stark "threads vs fork" mindset. Well, iThreads have more in common with multi-tasking via fork than with traditional multi-tasking via threads, so an ardent "fork not threads!" stancer should well consider iThreads, certainly before threads.

I tend to focus more on the details of communication between the parts (solid interfaces lead to solid systems) and so don't tend to reach for the convenient "share a few variables willy-nilly" framework. But iThreads have advantages and can be used effectively even in Unix (yes, you usually need to be aware of their disadvantages; for example, don't spawn a new thread for each little task).
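
To make "don't spawn a new thread for each little task" concrete, here is a minimal sketch of the usual alternative: a fixed pool of workers fed from a queue. It is written in Python rather than Perl purely for illustration; `run_pool` and its parameters are hypothetical names, and the tasks are assumed not to raise.

```python
import queue
import threading


def run_pool(tasks, worker_count=4):
    """Run each callable in `tasks` on a fixed pool of worker threads.

    Workers are created once and reused, so the thread start-up cost is
    paid `worker_count` times rather than once per task.
    """
    work = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                break
            index, task = item
            results.put((index, task()))

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for w in workers:
        w.start()
    for index, task in enumerate(tasks):
        work.put((index, task))
    for _ in workers:
        work.put(None)  # one sentinel per worker
    for w in workers:
        w.join()

    # Results arrive in completion order; restore submission order by index.
    collected = [results.get() for _ in range(len(tasks))]
    return [value for _, value in sorted(collected, key=lambda pair: pair[0])]
```

The only communication between the parts is the two queues, which is the "solid interfaces" point in a small form.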

- tye        

Re^15: Erlang--the facts!
by BrowserUk (Patriarch) on Oct 28, 2010 at 23:59 UTC
    Erlang "threads" are also called Erlang "processes" and they neither have shared data/state nor wholesale copies of data.

    Scant, simplistic and largely inaccurate since the release of the R11B in 2006. Rather more inaccurate since the release of R13B.

    Erlang "processes" and Erlang "threads" are entirely different beasts. Indeed, there is no such concept as an Erlang thread as such. And, like Java green threads, (and Coros) Erlang processes were (and still are, but I'll get back to that), entirely user-space entities and as such are neither processes nor threads in the conventional (OS) sense.

    However, circa 2004, the lack of SMP scalability was recognised as a significant limitation, and development was started to address that, which culminated in the R11* releases of the VM. The approach taken was to start one (kernel) thread per core, feeding off a single shared (note that word) queue (and that one also). Each thread is a separate interpreter that takes messages off the shared queue and executes them until they either a) finish; b) block; or c) error.

    Now it was quickly realised that the shared queue (and the associated locking) was a significant drag on performance, so having got it working, they set about improving the performance. To this end, they developed the R13B VM which uses separate queues for each interpreter, thus avoiding some of the lock contention. To achieve this, they had to add "process migration logic". That is Erlang "processes" not OS processes. And "migration logic" means moving "processes" to other queues if the current queue has more than some pre-configured maximum number of "runnable processes" (Again; Erlang "processes", not OS processes!).
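
The migration rule described above can be sketched roughly as follows. This is an illustration of the idea only, not the Erlang VM's actual algorithm; `rebalance`, `run_queues` and `max_runnable` are hypothetical names, and the sketch is in Python for brevity.

```python
from collections import deque


def rebalance(run_queues, max_runnable):
    """Move runnable "processes" off overloaded queues onto the shortest one.

    `run_queues` is a list of deques, one per scheduler thread. Returns the
    number of "processes" that were migrated.
    """
    moved = 0
    for source in run_queues:
        while len(source) > max_runnable:
            target = min(run_queues, key=len)
            if target is source or len(target) >= max_runnable:
                break  # every queue is full; nowhere better to put the work
            target.append(source.pop())
            moved += 1
    return moved


queues = [deque(range(10)), deque(), deque()]
migrated = rebalance(queues, max_runnable=4)
```

A real scheduler has to do this while other threads are pushing and popping, which is where the remaining lock contention comes from; the sketch ignores that entirely.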

    Now back to your "no wholesale copying of data". As Erlang is a functional language--with immutable variables--every time you send a message to a "process" that causes it to (for example) append a character to a string; or push to an array; or add, change or remove a key/value pair in a hash; or add, remove or (say) reverse the order of elements within a list; it (at least notionally) copies the entire data structure.
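
The "notional" copy can be seen with any immutable value. A minimal sketch, using a Python tuple as a stand-in for an immutable Erlang term (`append_immutable` is a hypothetical helper):

```python
def append_immutable(items, value):
    """Return a new tuple with `value` appended; `items` itself is untouched."""
    return items + (value,)


original = ("a", "b")
extended = append_immutable(original, "c")
# `original` still holds two elements; `extended` is a separate three-element
# tuple, so the "append" produced a new value instead of changing the old one.
```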

    Of course, we know that in reality such copying is impractical in the real world, and like (for example) Haskell, that notional immutability is enforced at the language level, but is done by "smoke&mirrors" at the implementation level. So, Erlang's "message queues" are basically, simply linked-lists of heap-allocated memory structures (as might be used in C (I wonder what language Erlang is implemented in?)). In other words--shared state at the OS level.

    And, should you doubt any of this, please download and read: this pdf

    Now, does any of that sound familiar?

    One thread per core. Queue(s) to facilitate communications. The absence of direct access to shared state. Internal locking.

    Does that sound anything like the iThreads model I've been talking about?

    I chose Erlang as one of my examples, because I happen to have made something of a study of it.

    So, iThreads are actually more like fork than like any of these things that are sometimes called "threads" in other languages.

    Congratulations on dropping the phrase "fork emulation". Threading in Erlang is quite different from threading in C. Why should threading in Perl have to be the same?

    And doesn't the above (or the pdf, if you bothered) sound a lot like the very type of thread-pool + queues mechanism I (amongst others) have been advocating here for years?

    I tend to focus more on the details of communication between the parts (solid interfaces lead to solid systems) and so don't tend to reach for the convenient "share a few variables willy-nilly" framework. But iThreads have advantages and can be used effectively even in Unix

    If I didn't know better, I'd suggest that we might be singing from the same song sheet--though perhaps with different accents.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Erlang "processes" and Erlang "threads" are entirely different beasts. Indeed, there is no such concept as an Erlang thread as such.

      Heh. That's funny.

      Thanks for the history "lesson". No surprises there (nor anything that contradicts what I said).

      Yeah, passing messages often copies data. If a data structure were a process, then passing a message might be considered like fork()ing. But a data structure isn't a process and Erlang's copying of data is very much not like fork()ing.

      One thread per core. Queue(s) to facilitate communications. The absence of direct access to shared state. Internal locking.

      Does that sound anything like the iThreads model I've been talking about?

      "One thread per core" is certainly possible with iThreads. Nothing in iThreads says "use one per core", except, of course, the significant overhead (indeed, if doing I/O-bound operations using iThreads instead of non-blocking I/O, you probably need to "fiddle" with the number of threads to use in order to not be sitting idle too much while also not wasting way too many resources creating extra threads).

      I don't believe "Queue(s) to facilitate communications" has even come up so far in this conversation. I don't believe iThreads uses queues. You can certainly use Thread::Queue with iThreads, of course.

      "The absence of direct access to shared state" is done via quite different means, of course, between Erlang and iThreads.

      "Internal locking" is done by threads::shared and not much by iThreads itself.

      Threading in Erlang is quite different from threading in C.

      No, somebody told me Erlang doesn't have threads.

      Why should threading in Perl have to be the same?

      What strawman made that demand? But then, when I mentioned 'Erlang "threads"' even using scare quotes, you chided me that they aren't "threads" so I guess you are making the demand that they can't be called "threads", presumably because they aren't like C threads.

      In sum, Erlang threading (as I noted) isn't much like C threads, to the point that people don't usually call them "threads" (without qualifiers), and isn't much like fork().

      Meanwhile, iThreads are also not much like C threads and calling them just "threads" has led to them being misunderstood, resulting in some significant resistance to their use.

      And iThreads don't use internal locking nor queues. Threading was bolted onto the side of Perl after Perl had existed for many years. Attempts to use internal locking proved hopelessly unstable.

      So they went a different route and avoided the whole "shared" route and instead emulated fork() by wholesale copying all state information and user data at "thread" creation time so each instance has a separate copy and no locking is required. Of course, they couldn't emulate fork() even close to as efficiently as a real fork().
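
The "copy everything at creation time" idea, reduced to a toy: each new thread gets its own deep copy of the state, so nothing it does needs a lock. This is a Python sketch of the concept as described here, not of Perl's internals; `spawn_with_private_copy` is a hypothetical name.

```python
import copy
import threading


def spawn_with_private_copy(state, target):
    """Start a thread that works on its own deep copy of `state`."""
    private = copy.deepcopy(state)  # wholesale copy, paid once per thread
    thread = threading.Thread(target=target, args=(private,))
    thread.start()
    return thread


def mutate(private_state):
    # Changes land in the thread's private copy only.
    private_state["items"].append("changed in thread")


parent_state = {"items": ["original"]}
worker = spawn_with_private_copy(parent_state, mutate)
worker.join()
```

The cost of that `deepcopy` grows with the amount of data alive at creation time, which is the overhead being complained about.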

      Then they provided an add-on that ties variables so that each time you read a threads::shared variable a lock is held while the shared value is copied over to the particular instance. And they provided an add-on that implements a queue that can be used across instances.
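
The read-under-lock-then-copy behaviour described above can be modelled like this. It is a hedged Python sketch of the behaviour as stated in this post, not the actual threads::shared implementation; `SharedValue` is a hypothetical class.

```python
import copy
import threading


class SharedValue:
    """Hypothetical stand-in for a shared variable.

    Every read and write takes a lock and copies the value, so callers only
    ever hold private copies and never touch the shared storage directly.
    """

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = copy.deepcopy(value)

    def get(self):
        with self._lock:
            return copy.deepcopy(self._value)

    def set(self, value):
        with self._lock:
            self._value = copy.deepcopy(value)


shared = SharedValue([1, 2])
mine = shared.get()
mine.append(3)  # modifies the private copy only
```

Every access pays for both a lock and a copy, so this suits a handful of small shared values far better than large structures.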

      And doesn't the above (or the pdf, if you bothered) sound a lot like the very type of thread-pool + queues mechanism I (amongst others) have been advocating here for years?

      I don't know. I guess I haven't really been paying that much attention to you. I've noticed you post a lot of code that uses threads over the years. I don't recall having read a description of what you were doing. Code tends to have a ton of necessary details and, lacking a guiding description, I never noticed a particular pattern to your many code samples.

      In any case, I was not aware of you having advocated some particular "mechanism". Did you think I was criticizing your mechanism? Is that why you've been acting so?

      I'd suggest that we might be singing from the same song sheet

      No, I'm clearly just playing political games, solely trying to scare people. You can tell that my arguments are simply FUD because, being political arguments, I'm prone to posting them as root nodes in non-SoPW sections so that the impact of the propaganda isn't diluted and because I use scary words like "monster".

      - tye        

        Sir. You dissemble.