PerlMonks  

threads::shared variables not really shared... or are they?

by Anonymous Monk
on Mar 06, 2008 at 23:04 UTC (#672622)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I read that threads::shared variables are not really shared between threads; but rather, each thread has its own copy of the data, and the variables are tied to some "black magic" which propagates changes between threads. If that's true, then I'm very confused by my empirical results. My results seem to indicate the data is actually shared (i.e. not duplicated).

This simple program stores 100 1MB strings in an array shared between 10 threads. If the data is duplicated between threads, that should come out to 1GB total. Instead, the program reports a VmSize of only 214MB:
    use threads;
    use threads::shared;

    my @a : shared;

    sub foo {
        sleep 5;
        print @a . " elements in \@a\n";
    }

    for (1..10) {
        threads->new(\&foo);
    }

    my $s = " " x 1e6;
    for (1..100) {
        push @a, $s;
    }

    system "grep VmSize /proc/$$/status";
So it looks like that 100MB of shared data is really shared! There seems to be an 11MB fixed overhead per thread, which is probably why it reported 214MB instead of 100MB.
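The arithmetic behind the two hypotheses can be written out as a quick check. This is only a sketch using the figures quoted in the post; the sub name `estimate_mb` is made up for illustration, and the 11MB per-thread overhead is the poster's own estimate rather than a measured constant.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the expected footprint in MB under each
# hypothesis, given the number of strings, threads, and per-thread overhead.
sub estimate_mb {
    my ($strings, $mb_per_string, $threads, $overhead_mb) = @_;
    my $data_mb = $strings * $mb_per_string;

    # Hypothesis A: every thread holds its own full copy of the data.
    my $duplicated_mb = $data_mb * $threads;

    # Hypothesis B: one copy of the data plus a fixed cost per thread.
    my $shared_mb = $data_mb + $threads * $overhead_mb;

    return ($duplicated_mb, $shared_mb);
}

# Figures from the post: 100 strings of 1MB, 10 threads, ~11MB per thread.
my ($duplicated_mb, $shared_mb) = estimate_mb(100, 1, 10, 11);

printf "if duplicated: %d MB\n", $duplicated_mb;
printf "if shared:     %d MB\n", $shared_mb;
```

The shared estimate lands within a few MB of the reported 214MB, while the duplicated estimate is nearly five times larger.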

I also tried adjusting the size of the array and the strings in the array, tried using ints instead of strings, changed the number of threads, and populated the array collaboratively from within the threads. None of that made any difference -- the VmSize was always perfectly consistent with truly shared data (and 11MB fixed overhead per thread.)
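For repeated measurements like these, VmSize can also be read from inside the script instead of shelling out to grep. The following is a minimal sketch, assuming a Linux-style /proc filesystem; the helper name `vm_size_kb` is hypothetical and not part of the original program.

```perl
use strict;
use warnings;

# Return this process's VmSize in kB, or undef when /proc is unavailable
# (for example, on a non-Linux system).
sub vm_size_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmSize:\s+(\d+)\s+kB/;
    }
    return;
}

my $before = vm_size_kb();
my @big    = map { " " x 1e6 } 1 .. 20;   # roughly 20MB of string data
my $after  = vm_size_kb();

if (defined $before && defined $after) {
    printf "VmSize grew by about %d kB\n", $after - $before;
}
else {
    print "VmSize is not available on this platform\n";
}
```

Taking a reading before and after each step makes it easier to separate the cost of the data from the cost of creating threads.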

Has Perl recently made its threads::shared data truly shared, or am I doing something wrong here? Thanks for any insight!

Highly confused,

Damon Hastings

Re: threads::shared variables not really shared... or are they?
by Joost (Canon) on Mar 06, 2008 at 23:49 UTC
    The answer (probably) is that the data is indeed shared, and the black magic part is where the synchronization between threads is handled. But there are people here who know much more about the perl threading system than me.

    On my preferred system (Linux), Perl threads are just system threads (pthreads). That means that anything "below" the Perl language view is shared by default, and the only thing keeping non-shared variables separate is the Perl interpreter. That also means that it's much easier to have shared XS objects than shared Perl objects (provided you handle your semaphores correctly), and conversely, it's pretty damn hard to handle non-shared XS objects correctly.

Re: threads::shared variables not really shared... or are they?
by renodino (Curate) on Mar 06, 2008 at 23:51 UTC
    Shared scalars, arrays, and hashes are indeed shared, but via a tied proxy mechanism. The actual data resides in a shared Perl interpreter context; the usual tie() operations then route all reads/writes (and various metadata operations) to execute against the shared interpreter's version of the variables. Note that, for scalars, the shared value does get copied into the private proxy on read.

    I can't say precisely why your example doesn't grow significantly, except that the array tie() operations won't populate the private proxy copy in each thread, instead routing directly to the shared interpreter's version of the array. So in your example, the private @a never grows, only the shared version.
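The visible effect of that routing can be seen in a small experiment. This is a sketch only; the variable names are made up for illustration, and it shows the observable behavior rather than the internals of the proxy mechanism. It needs a Perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @shared_list : shared;   # operations are routed to the shared copy
my @private_list;           # cloned into each new thread

threads->create(sub {
    push @shared_list,  "from child";
    push @private_list, "from child";   # only changes the child's clone
})->join;

printf "shared:  %d element(s)\n", scalar @shared_list;
printf "private: %d element(s)\n", scalar @private_list;
```

The parent sees the element pushed onto the shared array, while its own copy of the unshared array stays empty.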

    Note that if you were vivifying individual shared scalars, you likely would see some significant memory growth, since there would be a copy both in the originating thread and in the shared interpreter.

    Also note that the shared interpreter is one of the major bottlenecks in ithreads: as you might imagine, a Perl interpreter context holds a lot of state, thus concurrent access requires a major lock on the whole thing... which unfortunately can create a lot of thread contention.


    Perl Contrarian & SQL fanboy
Re: threads::shared variables not really shared... or are they?
by zentara (Archbishop) on Mar 07, 2008 at 15:02 UTC
    I'm not an expert on thread memory fundamentals, but I did find something interesting. If you add the 1M scalars to the array from within the threads, IT WILL gain the memory you seek. I had to reduce the number from 100 to 50, because at 100, it was killed by the kernel.
        #!/usr/bin/perl
        use threads;
        use threads::shared;

        my @a : shared;

        for (1..10) {
            sleep 1;
            threads->new(\&foo);
        }

        sleep 1;
        system "grep VmSize /proc/$$/status";
        <>;

        sub foo {
            my $s = " " x 1e6;
            for (1..50) {
                push @a, $s;
            }
            print @a . " elements in \@a\n";
        }

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
      The original added 100 1Mbyte strings to @a. Your example adds 50 * 10 = 500 1Mbyte strings to @a. Also, each thread will create a 1Mbyte private scalar; the original only created a single big string.

      (The lack of locks is a bit troubling, but the aforementioned global interpreter lock is probably keeping your example out of trouble.)
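For comparison, here is a sketch of the same kind of collaborative push with an explicit lock() from threads::shared. The sub name `worker` and the element format are made up for illustration, and whether the unlocked version is actually unsafe is not something this thread settles. It needs a Perl built with ithreads.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @a : shared;

sub worker {
    my ($id) = @_;
    for my $n (1 .. 100) {
        lock(@a);              # released at the end of each loop iteration
        push @a, "$id:$n";
    }
}

# Four threads each push 100 elements, then the parent waits for all of them.
my @workers = map { threads->create(\&worker, $_) } 1 .. 4;
$_->join for @workers;

print scalar(@a), " elements in \@a\n";
```

Joining the threads before reading @a also removes the need for the sleep calls as a way of waiting for the workers to finish.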


      Perl Contrarian & SQL fanboy
        Oops, you are right. This one gives memory use that agrees with the OP's original script. I think the sleep calls prevent the locking problem. At least on a faster machine. :-)
            #!/usr/bin/perl
            use threads;
            use threads::shared;

            my @a : shared;

            for (1..10) {
                sleep 1;
                threads->new(\&foo);
            }

            sleep 1;
            system "grep VmSize /proc/$$/status";
            <>;

            sub foo {
                for (1..10) {
                    push @a, " " x 1e6;
                }
                print @a . " elements in \@a\n";
            }

        I'm not really a human, but I play one on earth. Cogito ergo sum a bum
