http://www.perlmonks.org?node_id=1018836


in reply to Perl thread confustion

  1. The most recent documentation for threads.pm states that variables are by default thread local? I have also read that everything gets copied over to a new thread. Which is it?

    Both!

    All lexical (my) variables are local to the thread they are declared in. Unless they are:

    • Explicitly shared by marking them with the :shared attribute at declaration time.

      This is the normal and best way of sharing data.

    • Explicitly shared using threads::shared::share() or threads::shared::shared_clone() functions.

      This can be useful, but is overused.

    • Implicitly cloned by being closed over. That is, declared in one thread, and referenced within another thread's subroutine.

      A necessary part of threading in a language that supports closures.

    • Implicitly cloned because they exist in the spawning thread prior to a 'child' thread being spawned.

      I have no idea why this happens. In my opinion it should not. The good news is that it is easy to avoid.

    The simple rule is declare your thread subs and spawn them before declaring and populating any data or code you don't need them to have access to.
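
    A minimal sketch of the first and third cases above, assuming a perl built with ithreads. The names $counter, $private and worker are made up for illustration:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $counter :shared = 0;    # explicitly shared: one copy seen by all threads
my $private = 10;           # plain lexical: closed over below, so cloned per thread

sub worker {
    for ( 1 .. 1000 ) {
        lock($counter);     # serialise the read-modify-write on the shared scalar
        $counter++;
    }
    $private++;             # changes this thread's clone only
    return $private;
}

my @threads = map { threads->create( \&worker ) } 1 .. 4;
my @seen    = map { $_->join } @threads;

print "counter = $counter\n";    # 4000: every increment hit the same variable
print "private = $private\n";    # 10: the parent's copy was never touched
print "clones  = @seen\n";       # 11 11 11 11: each thread bumped its own copy
```

    The shared scalar accumulates updates from all four threads, while each thread's increment of the closed-over lexical stays inside that thread.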

  2. I am reading that sharing data between threads is slow. Is this true?

    Explicitly shared data is relatively expensive to access -- less so for read-only than read-write.

    This is a necessary part of using threading with complex data types -- even Perl's scalar type is a complex data type capable of being a string, an integer, a floating point number, or a reference to anything. Perl has to protect its internals with locking.

    However, if you were using threads and shared data in any other language, you would incur much the same costs in order to ensure the internal integrity of your complex data types. The good news is that with Perl, this is taken care of for you.
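
    One rough way to see the cost on your own machine is to time the same loop against a plain and a shared scalar. This is only a sketch: it runs in a single thread, so it measures the access overhead alone, without any contention, and the numbers will vary by perl version and platform.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes qw(time);

my $plain = 0;              # ordinary thread-local scalar
my $shared :shared = 0;     # explicitly shared scalar
my $n = 1_000_000;          # arbitrary iteration count for the comparison

my $t0 = time;
$plain++ for 1 .. $n;
my $plain_secs = time - $t0;

$t0 = time;
$shared++ for 1 .. $n;
my $shared_secs = time - $t0;

printf "plain:  %.3fs\nshared: %.3fs\n", $plain_secs, $shared_secs;
```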

  3. One of the main purposes for my application will be to share data between worker threads.

    Perl is perfectly adept at sharing data. And very good at doing so simply -- from the application programmer's perspective -- and safely, whilst preventing the whole class of mysterious, hard to track down bugs that arise through accidental sharing and/or user implemented locking (or lack thereof). But that simplicity and safety has its costs.

    Whether those costs are a barrier to your application depends very much on how much data is shared and what access patterns that shared data requires.

If you would describe your application, the shared data and usage patterns, we may be able to offer tips on the best way to program it.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Perl thread confustion
by 7stud (Deacon) on Feb 15, 2013 at 07:00 UTC

    All lexical (my) variables are local to the thread they are declared in. Unless they are: Implicitly cloned by being made closures.

    Okay, here is a closure:

    {
        my $x = 20;

        sub do_stuff {
            print "$x \n";
        }
    }

    do_stuff();    # $x has gone out of scope here

    --output:--
    20    # …yet the sub can still see $x

    But in the following thread can the sub see $x because it closes over $x, or can the sub see $x because:

    All lexical (my) variables are local to the thread they are declared in. Unless they are: Implicitly cloned because they exist in the spawning thread prior to a 'child' thread being spawned.

    use threads;
    use threads::shared;

    my $x = 20;

    sub do_stuff {
        print "$x \n";    # closes over $x (as above)
    }

    threads->create(\&do_stuff)->join();

    --output:--
    20

    If perl copies all the data to a thread, why doesn't the following code also output 20:

    use threads;
    use threads::shared;

    sub do_stuff {
        print "$x \n";    # doesn't close over $x
    }

    my $x = 20;

    threads->create(\&do_stuff)->join();    # perl copies $x to the thread

    --output:--
    <blank line>    # but the thread can't see $x

      But in the following thread can the sub see $x because it closes over $x, or can the sub see $x because:

      It can see it, because the sub closes over it.

      But it would have been copied to the new thread anyway, even if the sub didn't close over it, because it existed when the thread was spawned. That copy just isn't useful within the thread: if it isn't closed over, nothing can see it (nor, therefore, use it).

      Hence my comment "I have no idea why this happens. In my opinion it should not.".

      If perl copies all the data to a thread, why doesn't the following code also output 20:

      Enable strict or warnings and perl will tell you why.

      (And note: I didn't say "all the data"; I said "exist in the spawning thread prior to a 'child' thread being spawned." It is a subtle, but very important difference.)
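
      To see the message without editing the script, a sketch like the following compiles the same shape of code under strict via string eval and prints the captured error. The sub is compiled before `my $x` is seen, so inside it `$x` names the package variable `$main::x`, which strict rejects. The exact wording of the error varies between perl versions.

```perl
use strict;
use warnings;

# Same shape as the failing script, held as text so the compile error can be
# captured in $@ instead of aborting this example.
my $source = q{
    use strict;
    sub do_stuff { print "$x \n" }    # $x is not declared yet at this point
    my $x = 20;
    1;
};

my $ok = eval $source;
print "compile failed:\n$@" unless $ok;
```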

      But, if you doubt my assertion that non-closed-over variables created after the thread sub is declared but before it is spawned are also cloned, run this and monitor the memory usage using the task manager or your OS equivalent:

      perl -Mthreads -wE"sub x{ sleep 100 }; my @x=1..1e7; sleep 10; async(\&x)->detach; sleep 100"

      What you'll see is something like this. The array is created and memory usage jumps to ~900MB and levels out for 10 seconds before the thread is spawned. It then jumps to ~1.9GB. This happens even though the thread can never make any use of the copy that is made, because the array is not lexically visible to it.

      It makes no sense whatsoever, but try getting anyone to change it.

      But, as I said above, the good news is that it is easy to avoid, by spawning your threads before you populate data structures used by your main thread code.
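
      A minimal sketch of that ordering, assuming the core Thread::Queue module is used to hand work to the thread. The names $queue, worker and @big are made up for illustration:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;    # hypothetical work queue, created before spawning

sub worker {
    my $sum = 0;
    while ( defined( my $item = $queue->dequeue ) ) {
        $sum += $item;
    }
    return $sum;
}

# Spawn first, while the main thread still holds almost no data to clone.
my $thread = threads->create( \&worker );

# Only now build the large structure; it exists solely in the main thread.
my @big = 1 .. 1_000_000;

$queue->enqueue( @big[ 0 .. 9 ] );    # hand over just what the worker needs
$queue->enqueue(undef);               # undef tells the worker to stop

my $sum = $thread->join;
print "worker summed: $sum\n";        # 55, the sum of 1 .. 10
```

      Because @big does not exist when the thread is created, there is nothing for perl to clone into the worker.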



        - but it isn't useful within the thread because if it isn't closed over, nothing can see it (nor therefore use it). Hence my comment "I have no idea why this happens. In my opinion it should not.".

        I see. So data is copied, but it is not accessible. Seems like a great feature!