http://www.perlmonks.org?node_id=1018835

mulli has asked for the wisdom of the Perl Monks concerning the following question:

I am reading up on using threads and in the past I know there have been issues with stability, speed, etc. So I need to get a few things cleared up.

1. The most recent documentation for threads.pm states that variables are by default thread local? I have also read that everything gets copied over to a new thread. Which is it?
2. I am reading that sharing data between threads is slow. Is this true? One of the main purposes for my application will be to share data between worker threads.

Re: Perl thread confustion
by BrowserUk (Patriarch) on Feb 15, 2013 at 03:54 UTC

    1. The most recent documentation for threads.pm states that variables are by default thread local? I have also read that everything gets copied over to a new thread. Which is it?

      Both!

      All lexical (my) variables are local to the thread they are declared in. Unless they are:

      • Explicitly shared by marking them with the :shared attribute at declaration time.

        This is the normal and best way of sharing data.

      • Explicitly shared using threads::shared::share() or threads::shared::shared_clone() functions.

        This can be useful, but is overused.

      • Implicitly cloned by being made closures. That is, declared in one thread, and referenced within another thread subroutine.

        A necessary part of threading in a language that supports closures.

      • Implicitly cloned because they exist in the spawning thread prior to a 'child' thread being spawned.

        I have no idea why this happens. In my opinion it should not. The good news is that it is easy to avoid.

      The simple rule is declare your thread subs and spawn them before declaring and populating any data or code you don't need them to have access to.
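
      A minimal sketch of the explicit sharing routes listed above, assuming a perl built with thread support. The names $counter, @results and $config are made up for illustration and are not from the original question:

      ```perl
      use strict;
      use warnings;
      use threads;
      use threads::shared;

      # 1. Shared at declaration time with the :shared attribute.
      my $counter :shared = 0;

      # 2. An existing (still empty) variable shared afterwards with share().
      my @results;
      share(@results);

      # 3. A nested structure turned into a shared copy with shared_clone().
      my $config = shared_clone({ retries => 3, hosts => ['a', 'b'] });

      # The anonymous sub closes over the three shared variables.
      my $thr = threads->create(sub {
          lock($counter);
          $counter++;
          push @results, "worker saw retries=$config->{retries}";
          $config->{retries} = 5;
      });
      $thr->join();

      # Changes made in the worker are visible in the main thread.
      print "counter = $counter\n";
      print "results = @results\n";
      print "retries = $config->{retries}\n";
      ```

      Because all three variables are shared, the main thread sees the worker's updates after join(). An unshared lexical in the same position would be a private copy in each thread.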

    2. I am reading that sharing data between threads is slow. Is this true?

      Explicitly shared data is relatively expensive to access -- less so for read-only than read-write.

      This is a necessary part of using threading with complex data types -- even Perl's scalar type is a complex data type capable of being a string, an integer, a floating point number, or a reference to anything. Perl has to protect its internals with locking.

      However, if you were using threads and shared data in any other language, you would incur much the same costs in order to ensure the internal integrity of your complex data types. The good news is that with Perl, this is taken care of for you.
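
      One caveat worth sketching: the internal locking keeps each shared scalar intact, but a read-modify-write such as ++ is still several steps, so application code normally wraps it in lock(). The following is an illustrative sketch only; $total and add_many are hypothetical names:

      ```perl
      use strict;
      use warnings;
      use threads;
      use threads::shared;

      my $total :shared = 0;

      sub add_many {
          my ($n) = @_;
          for (1 .. $n) {
              lock($total);   # released when this loop-body block exits
              $total++;
          }
      }

      # Four workers each add 1000 to the same shared counter.
      my @workers = map { threads->create(\&add_many, 1000) } 1 .. 4;
      $_->join() for @workers;

      print "total = $total\n";   # 4000
      ```

      Every increment here takes and releases a lock, which is the kind of per-access cost described above. Accumulating locally in each thread and updating the shared total once would be cheaper.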

    3. One of the main purposes for my application will be to share data between worker threads.

      Perl is perfectly adept at sharing data. And very good at doing so simply -- from the application programmer's perspective -- and safely, whilst preventing the whole class of mysterious, hard to track down bugs that arise through accidental sharing and/or user implemented locking (or lack thereof). But that simplicity and safety has its costs.

      Whether those costs are a barrier to your application depends very much on how much and what access patterns are required for that shared data.

    If you would describe your application, the shared data and usage patterns, we may be able to offer tips on the best way to program it.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      All lexical (my) variables are local to the thread they are declared in. Unless they are: Implicitly cloned by being made closures.

      Okay, here is a closure:

      {
          my $x = 20;
          sub do_stuff { print "$x \n"; }
      }
      do_stuff();   #$x has gone out of scope here

      --output:--
      20            #…yet the sub can still see $x

      But in the following thread can the sub see $x because it closes over $x, or can the sub see $x because:

      All lexical (my) variables are local to the thread they are declared in. Unless they are: Implicitly cloned because they exist in the spawning thread prior to a 'child' thread being spawned.

      use threads;
      use threads::shared;

      my $x = 20;

      sub do_stuff {
          print "$x \n";   #closes over $x (as above)
      }

      threads->create(\&do_stuff)->join();

      --output:--
      20

      If perl copies all the data to a thread, why doesn't the following code also output 20:

      use threads;
      use threads::shared;

      sub do_stuff {
          print "$x \n";   #doesn't close over $x
      }

      my $x = 20;
      threads->create(\&do_stuff)->join();   #perl copies $x to the thread

      --output:--
      <blank line>   #but the thread can't see $x

        But in the following thread can the sub see $x because it closes over $x, or can the sub see $x because:

        It can see it, because the sub closes over it.

        But it would have been copied to the new thread anyway even if the sub didn't close over it -- because it existed when the thread was spawned -- but it isn't useful within the thread because if it isn't closed over, nothing can see it (nor therefore use it).

        Hence my comment "I have no idea why this happens. In my opinion it should not.".

        If perl copies all the data to a thread, why doesn't the following code also output 20:

        Enable strict or warnings and perl will tell you why.

        (And note: I didn't say "all the data"; I said "exist in the spawning thread prior to a 'child' thread being spawned." It is a subtle, but very important difference.)

        But, if you doubt my assertion that non-closed-over variables created after the thread sub is declared but before it is spawned are also cloned, run this and monitor the memory usage using the task manager or your OS equivalent:

        perl -Mthreads -wE"sub x{ sleep 100 }; my @x=1..1e7; sleep 10; async(\&x)->detach; sleep 100"

        What you'll see is something like this: the array is created and memory usage jumps to ~900MB and levels out for 10 seconds before the thread is spawned. It then jumps to ~1.9GB, despite the fact that the thread can never make any use of the copy that is made, because it is not lexically visible to it.

        It makes no sense whatsoever, but try getting anyone to change it.

        But, as I said above, the good news is that it is easy to avoid, by spawning your threads before you populate data structures used by your main thread code.
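
        A rough sketch of that ordering, using a hypothetical worker sub and a deliberately small @big array so it runs quickly:

        ```perl
        use strict;
        use warnings;
        use threads;

        sub worker {
            my ($id) = @_;
            return "worker $id done";
        }

        # Spawn first: no large data exists yet, so there is nothing big to clone.
        my @workers = map { threads->create(\&worker, $_) } 1 .. 2;

        # Populate afterwards: this array exists only in the main thread.
        my @big = (1 .. 100_000);

        my @replies = map { $_->join() } @workers;
        print "$_\n" for @replies;
        print scalar(@big), " elements held by main only\n";
        ```

        Swapping the two steps, so that @big is filled before the create() calls, would give each worker its own unused copy of the array.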


Re: Perl thread confustion
by 7stud (Deacon) on Feb 15, 2013 at 06:40 UTC

    1. The most recent documentation for threads.pm states that variables are by default thread local? I have also read that everything gets copied over to a new thread. Which is it?

    Well, first everything is copied over to the new thread, then because everything is a copy, any changes to the copied variables don't affect the values of those variables in other threads, i.e. everything is thread local.

      any changes to the copied variables don't affect the values of those variables in other threads, i.e. everything is thread local.

      Unless the cloned variables are closed over or globals, they cannot be changed.



        any changes to the copied variables don't affect the values of those variables in other threads, i.e. everything is thread local.

        Unless the cloned variables are closed over ....

        You seem to be carving out an exception for something like this:

        use strict;
        use warnings;
        use 5.012;
        use threads;

        my $x = 20;

        sub do_stuff {
            my $thread_id = shift;
            say "In thread $thread_id: ", ++$x;
        }

        threads->create(\&do_stuff, 1)->join();
        say "In thread 'main': $x";

        --output:--
        In thread 1: 21
        In thread 'main': 20

        But even though the thread closes over $x, it cannot change the $x in main. So, it appears to me that the closed over variable is also thread local.
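
        For contrast, here is a sketch of the same program with $x declared :shared. The only changes are the threads::shared import, the attribute and the lock() call, and this variant is an illustration rather than something posted in the thread:

        ```perl
        use strict;
        use warnings;
        use 5.012;
        use threads;
        use threads::shared;

        my $x :shared = 20;

        sub do_stuff {
            my $thread_id = shift;
            lock($x);                            # guard the read-modify-write
            say "In thread $thread_id: ", ++$x;
        }

        threads->create(\&do_stuff, 1)->join();
        say "In thread 'main': $x";
        ```

        With the attribute in place, both lines should report 21, because the thread and main now refer to the same shared variable rather than to separate copies.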

Re: Perl thread confustion
by sundialsvc4 (Abbot) on Feb 15, 2013 at 13:21 UTC

    Data sharing is, as BrowserUK said, easy, and that’s most important in a good threaded design. Your programs won’t bugger up their storage-pool and crash. If you need to share data between threads, simply try to minimize the amount of code that actually contends for shared variables ... within sensible reason. It’s often the case that threads communicate with one another by means of thread-safe queues ... work-to-do lists and work-completed lists. This creates a simple way, not only to reduce contention, but to allow the various threads to work at their own naturally varying speeds. If a particular set of shared variables is frequently and contentiously shared by everyone, they would represent a “hot spot” in any design regardless of language used ... they would tend to cause the threads to be synchronous with one another and to spend too much time waiting on locks, which is not what you want to see. (Maybe the threads could instead include updated values in the messages they return to the work-completed queue.) Obviously, design is a nest of competing trade-offs.
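
    A small sketch of the work-to-do / work-completed pattern described above, using the core Thread::Queue module. The queue names and the squaring "work" are placeholders, and end() assumes Thread::Queue 3.01 or later:

    ```perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $todo = Thread::Queue->new();   # work-to-do list
    my $done = Thread::Queue->new();   # work-completed list

    sub worker {
        # dequeue() blocks until an item arrives; it returns undef once
        # the queue has been ended and drained.
        while (defined(my $n = $todo->dequeue())) {
            $done->enqueue($n * $n);
        }
    }

    my @workers = map { threads->create(\&worker) } 1 .. 3;

    $todo->enqueue(1 .. 10);
    $todo->end();                      # no more work will be added
    $_->join() for @workers;

    # All workers have finished, so a non-blocking drain is enough.
    my $sum = 0;
    while (defined(my $sq = $done->dequeue_nb())) {
        $sum += $sq;
    }
    print "sum of squares = $sum\n";   # 385
    ```

    The workers never touch a shared variable directly. All contention is confined to the two queues, and each thread takes the next item whenever it is ready.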