Re: what the history behind perl not having "real" threads

by dave_the_m (Parson)
on Feb 25, 2013 at 12:24 UTC (#1020507)


in reply to what the history behind perl not having "real" threads

Perl has had two threading implementations: "5.005 threads" and "interpreter threads". They both use the OS's underlying threading facilities; they differ in whether perl data structures are shared by default.

5.005 threads (introduced with perl 5.005) shared all data and data structures by default. This turned out to be almost impossible to make thread-safe, since almost any perl-level "read" operation can actually end up modifying an SV (scalar value). For example:

    my $x = 1;
    print $x;   # whoops, $x has been modified: converted from int to string
    $y = \$x;   # whoops, $x has been modified: its ref count has increased

To get this to work right would involve locking before just about any operation. So that threads model was abandoned.
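
One way to watch these hidden writes happen is to inspect the SV with the core B module. This is only a sketch: it uses string interpolation as the "read" (an assumption made here because current perls may take shortcuts for a plain print of an integer), and the helper names sv_class and sv_refcnt are made up for illustration.

```perl
use strict;
use warnings;
use B qw(svref_2object);

# Report the internal SV class (for example B::IV or B::PVIV) behind a reference.
sub sv_class { return ref(svref_2object($_[0])) }

# REFCNT as seen through B. The temporary reference passed in adds one,
# so only the difference between two readings is meaningful.
sub sv_refcnt { return svref_2object($_[0])->REFCNT }

my $x = 1;
my $class_before = sv_class(\$x);
my $copy = "$x";                   # a perl-level read: stringifies $x
my $class_after = sv_class(\$x);   # the SV was upgraded to cache the string

my $refs_before = sv_refcnt(\$x);
my $y = \$x;                       # taking a reference bumps the ref count
my $refs_after = sv_refcnt(\$x);

print "class before: $class_before, after: $class_after\n";
print "refcnt delta: ", $refs_after - $refs_before, "\n";
```

With all data shared between threads, each of these internal updates would be a potential race, which is why locking would be needed almost everywhere.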

There was a separate effort to allow fork emulation under Windows (which doesn't support fork()). This worked by collecting all perl's state into a single interpreter struct and allocating all SVs from per-interpreter pools. When fork() was called, the interpreter and all its SVs etc. would be copied, and a new thread created which ran using that new data. So each "process" (actually just a thread) had its own complete copy of everything and could run independently without affecting any other threads; no (or very little) locking required. This first appeared with 5.6.
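
The perl-level behaviour that the emulation has to reproduce can be sketched as follows. On Unix this uses the real fork(); the assumption is that the same script run under the Windows emulation would behave the same way, with a cloned interpreter thread standing in for the child process.

```perl
use strict;
use warnings;

my $value = 'parent';

# A pipe lets the child report what it saw back to the parent.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: it has its own copy of $value, so this write is private.
    close $reader;
    $value = 'child';
    print {$writer} $value;
    close $writer;
    exit 0;
}

# Parent: read the child's report, then reap the child.
close $writer;
my $from_child = <$reader>;
close $reader;
waitpid($pid, 0);

print "parent still sees '$value', child saw '$from_child'\n";
```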

Then someone had the idea of exposing this interface at the perl level (rather than just via fork() under Windows). Thus was born the threads.pm module, which did a similar thing to the fork (cloned the current state), but started the new thread with fresh code rather than running from the same point as the caller (a la fork()). Someone also added threads::shared, which, via a mechanism similar to tying, allowed data structures to be shared across threads. These came out with 5.8.
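
A minimal sketch of the two modules together, assuming a perl built with ithreads (the script exits early otherwise). The variable names are illustrative only.

```perl
use strict;
use warnings;
use Config;

BEGIN {
    # ithreads are a build-time option; bail out politely if absent.
    unless ($Config{useithreads}) {
        print "this perl was built without ithreads\n";
        exit 0;
    }
}

use threads;
use threads::shared;

my $private = 0;           # cloned into each new thread
my $shared :shared = 0;    # a single value visible to every thread

my @workers = map {
    threads->create(sub {
        $private++;                      # touches this thread's copy only
        { lock($shared); $shared++; }    # serialised update of the shared value
        return $private;
    });
} 1 .. 4;

# Each thread returns the value of its own private copy.
my @seen = map { $_->join } @workers;

print "parent private: $private, shared: $shared, threads saw: @seen\n";
```

The unshared variable stays untouched in the parent because every thread works on its own clone; only the variable marked :shared is updated in common, and it needs an explicit lock.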

Dave.


Re^2: what the history behind perl not having "real" threads
by perl-diddler (Hermit) on Feb 26, 2013 at 06:59 UTC
    dave_the_m wrote:
    5.005 threads (introduced with perl 5.005) by default shared all data and data structures. This turned out to be almost impossible to make thread-safe, since almost any perl-level "read" operation can actually end up modifying an SV (scalar value).
    Is this required by the language, or, evolving from your examples:
    my $x = 1; print $x;                 # don't care if modified!
    our $package_x; print $package_x;    # now I care!
    I wouldn't see a simple 'my' var as needing sharing, unless you take a reference to it..., package vars might be ideal for something like a Fortran COMMON section, if I remember what I'm talking about... i.e. GLOBAL vars/package (that would be shared).

    But if I print $x, does it have to modify "$x", or -- rather, why not leave it alone and have print modify a copy -- it's not like it is being stored somewhere that a shared implementation might expect to be able to access its 'mutated form' ;-).

    As for your 3rd line, referring to the ref count, that's definitely something the interpreter would need to track, but it wouldn't be hard to implement on x86: as long as the counter is aligned to the architecture word size (32/64-bit), an inc/dec operation is atomic.

    The thing that is annoying about the current model is, from my understanding, the limitation of having to pre-declare something as shared or not -- which would, it seems, preclude using it with object-oriented programming, where specific objects could have global state (and need locking in the presence of multiple writers -- but not multiple readers).

    But the good news, as I understand you saying, is that the current code uses native OS threads -- it's just that they don't share much [if any] data... that's slightly better than I thought it might be, given that under Linux today, with a fork-exec, you can choose multiple levels of sharing, and code segments of compiled programs can automatically share the same code memory (presuming they weren't built statically).

    Thanks for the info....

      Is this required by the language
      There's nothing in the language that precludes a 5.005-style threading implementation. The difficulty was in retrospectively trying to make the existing implementation thread-safe, when it had never been designed for that possibility. This is one of the (many) reasons why it was concluded that a complete from-the-ground-up rewrite of the perl interpreter was required, i.e. perl6.

      The main drawbacks of the ithreads model are: that cloning the existing interpreter when creating a new thread is slow; that it uses lots of memory, since the new interpreter doesn't make any use of the OS facilities that a fork() would, of sharing memory by default with copy-on-write pages; and that having shared variables is slow, clunky and memory-heavy.

      Dave.

        dave_the_m wrote:
        There's nothing in the language that precludes a 5.005-style threading implementation.

        So the fact that print "$x" currently modifies $x to be a string might safely be called an implementation detail that wouldn't need to be kept for compatibility reasons.

        I think it's a shame that they gave up on the 5.005-style threading. The Linux kernel didn't become SMP-safe or SMP-capable overnight. It started out with no SMP and, for a long time, lived with the 'big lock' model, where user-land code could be multi-threaded but, by and large, the kernel was not. Going from 2.0 to 2.2 to 2.4 were long steps, and it took a lot of developer education to go from one big lock to many smaller locks and, in many cases, to non-locking models that reduce bus contention, and to move from higher-order algorithms to ones that approach O(1).

        As near as I can tell, it's an ongoing process. Certainly a redesign of the language could make the process easier, but I have no idea if there was a brick wall, or if some people were too risk-averse to live with something that was in constant evolution.

        From an end-user perspective, I never heard about user-level programs needing complete rewrites due to language changes, but at the driver level things were less stable -- not exactly chaotic, but certainly requiring ongoing work.
