
Is a called package in thread storage?

by Wiggins (Hermit)
on Jul 15, 2008 at 17:10 UTC (#697762=perlquestion)
Wiggins has asked for the wisdom of the Perl Monks concerning the following question:

Another project, another quandary. This one takes me back to 1971, undergraduate CS. "Recursion - Re-entrant, Reusable, Refreshable".

thread 1; runs File::Monitor discovering and enqueueing filenames to
thread 2; Thread::Queue feeding
thread 3-6; my code to do MIME Email decomposition. All feeding off of the QUEUE.

I have put the MIME decomp code (developed earlier) into a simple non-OO package, simply for scoping and to modularize the files. If I leave this MIME code in that package and call it from threads 3, 4, 5 and 6, are its local (my) variables in global storage or thread storage? If I cut and paste the MIME code into the processing subroutine running on those threads (3-6), it is definitely in thread-specific storage.

I probably need to pass the MIME decomp an array reference to push its results into anyway.

I am finding that looking for answers on the web for topics like this (Perl threads) can be dangerous, since many go back to 5.003 or other implementations.

Any suggestions on books on current Perl threading would be much appreciated...


Replies are listed 'Best First'.
Re: Is a called package in thread storage?
by BrowserUk (Pope) on Jul 15, 2008 at 18:05 UTC
    are its local (my) variables in global storage or thread storage?

    My first reaction to this is: Why do you want to know?

    But since such meta-questions can be really annoying, I'll expand a little :)

    Pretty much everything user accessible in Perl is allocated from the process heap. That said, there is a complication in that the heap is broken up into buckets and some buckets can be reserved for use by particular threads.

    The important thing about variables in threaded programs is not where the storage is allocated. It is their visibility.

    A secondary consideration is whether they are cloned, but that's a complex issue that would require a lot of words to describe, and someone intimately familiar with Perl's internals both to write the description and, subsequently, to understand it.

    Which brings us back to my initial reaction above. If you explain why you want (or think you need?) to know, then you're more likely to get a reply that will be useful to you.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      What got me started was following multiple threads calling the same MIME decomp code, and preempting each other. I flashed back to the techniques to do recursion, and the use of the activation record to store all working variables going through the common code on each cycle through. In OO, a new object encapsulates all of the object-specific working space (or pointers to heap space).

      Would my package allocate a hash during thread 1 (T1), then be preempted by T2, which, finding the hash already there, just starts stuffing more into it?

      My training goes back to a linear shared physical memory space for all processes, not these managed buckets of private pointers to allocated heap space.

      I feel better now... thanks ... I guess I am a hands-on sort of person.

        At the simplest level, all you need to know is that unless you explicitly share a variable, only the thread that declared it can access it. They are invisible to other threads.

        However, there is a caveat: all variables instantiated before you spawn a thread are cloned into that thread. Both the spawned thread and the spawning thread will have entirely independent copies of those variables that existed at the point of thread creation. The effect is very similar to the fork mechanism, whereby the state of the process is cloned and each process (parent and child) has its own copy of everything that existed at the fork point.
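A minimal sketch of the cloning described above, assuming a Perl built with ithreads. The hash %seen and its keys are made up for illustration; the point is only that the thread modifies its own clone and the parent's copy is untouched.

```perl
use strict;
use warnings;
use threads;

# Hypothetical data, populated before any thread exists.
my %seen = ( main => 1 );

# The new thread starts life with its own clone of %seen.
my $thr = threads->create( sub {
    $seen{worker} = 1;                  # changes only the thread's private copy
    return join ',', sort keys %seen;
} );

my $in_thread = $thr->join;
my $in_main   = join ',', sort keys %seen;

print "thread saw: $in_thread\n";       # main,worker
print "main sees:  $in_main\n";         # main
```

Anything the thread needs to hand back has to travel through the return value of join, a queue, or an explicitly shared variable.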

        However, there is a caveat with that description also, in as much as process global resources, like the standard file handles, pre-existing sockets etc. are not entirely independent of each other. This has consequences that you need to be aware of. For example:

        use threads;;
        open O, '>', 'junk.dat';;       ## Open a file
        print O $_ for 1 .. 100;        ## print some stuff to it
        print tell O;;                  ## print the file pointer position
        392
        async{                          ## spawn a thread
            print tell O;               ## inherits a (cloned) copy of the file handle
            392                         ## complete with pre-existing state
            print O $_ for 1 .. 100;    ## print some more stuff from within the thread
            print tell O;               ## And its copy of the file handle reflects the change
            784
        };;
        print tell O;;                  ## but back in the main thread the original doesn't see it
        392
        print O $_ for 1 .. 100;;       ## even after local modifications
        print tell O;;                  ## 784
        close O;
        exit;;

        c:\test>wc -l junk.dat          ## All the output ended up in the file,
        300 junk.dat                    ## But neither thread's file handle
                                        ## reflects the true pointer position.

        That's probably the worst anomalous behaviour. It's 'safe' to use file handles from more than one thread, provided you use locking to prevent interspersion of output, but do not rely upon the output from tell.
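One way the locking mentioned above might look, as a sketch rather than a recipe. The lock variable $out_lock, the file name locked.dat and the sub write_lines are all hypothetical; the explicit flush is an assumption on my part, made because each thread has its own buffer for its clone of the handle, so the line should reach the file before the lock is released.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Handle;

my $out_lock :shared;    # hypothetical lock variable guarding writes to the handle

open my $out, '>', 'locked.dat' or die "Cannot open locked.dat: $!";

sub write_lines {
    my ($id) = @_;
    for my $n ( 1 .. 100 ) {
        lock $out_lock;                        # held until the end of this iteration
        print {$out} "thread $id line $n\n";
        $out->flush;                           # write the line out while still locked
    }
}

my @workers = map { threads->create( \&write_lines, $_ ) } 1 .. 3;
$_->join for @workers;
close $out;

# Read the file back and count lines that are not one whole record.
open my $in, '<', 'locked.dat' or die "Cannot reopen locked.dat: $!";
my @lines = <$in>;
close $in;
my $malformed = grep { !/^thread \d line \d+$/ } @lines;
printf "%d lines, %d malformed\n", scalar @lines, $malformed;
```

The check at the end counts lines rather than calling tell, in keeping with the advice not to trust the reported file position.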

        Another consequence of the cloning is that if you allocate a large data structure before spawning threads, each thread spawned afterward will get a copy of that data structure regardless of whether it needs it or not.

        The trick here is to spawn early, and to require packages within the threads that need them rather than use them at compile time. With care, you can control (to some degree) what packages are loaded into what threads.
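A rough sketch of the spawn-early, require-inside idea. Digest::MD5 merely stands in for whatever heavy module the worker really needs (a MIME parser, say); that choice is an assumption for illustration only.

```perl
use strict;
use warnings;
use threads;

# Spawn first, while the interpreter is still small, so little gets cloned.
my $thr = threads->create( sub {
    # Loaded at run time, inside this thread only.
    require Digest::MD5;
    return Digest::MD5::md5_hex('hello');
} );

my $digest = $thr->join;
print "digest from worker: $digest\n";

# %INC is per interpreter, so the main thread can check what it has loaded itself.
print "Digest::MD5 loaded in main: ",
    ( exists $INC{'Digest/MD5.pm'} ? 'yes' : 'no' ), "\n";
```

Had the module been pulled in with use at the top of the file, it would have been loaded in the main interpreter at compile time and then cloned into every thread spawned afterward.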

Re: Is a called package in thread storage?
by pc88mxer (Vicar) on Jul 15, 2008 at 17:45 UTC
    I'll take a shot at this. I believe the only variables which are shared between threads are those which are explicitly shared using the share declaration from use threads::shared. From the threads::shared documentation:
    By default, variables are private to each thread, and each newly created
    thread gets a private copy of each existing variable.  This module
    allows you to share variables across different threads (and pseudoforks
    on Win32).  It is used together with the threads module.
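To make the quoted paragraph concrete, here is a small sketch contrasting an ordinary lexical with one marked :shared. The variable names and the loop counts are invented for the example.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;            # ordinary lexical: every thread gets its own clone
my $shared :shared = 0;     # explicitly shared: one value seen by all threads

my @workers = map {
    threads->create( sub {
        for ( 1 .. 1000 ) {
            $private++;     # touches only this thread's copy
            lock $shared;   # serialise the update; released at end of the iteration
            $shared++;
        }
    } );
} 1 .. 4;
$_->join for @workers;

print "private: $private\n";    # 0
print "shared:  $shared\n";     # 4000
```

The lock is there because an increment on a shared variable is a read-modify-write, and without it updates from different threads could be lost.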
Re: Is a called package in thread storage?
by renodino (Curate) on Jul 15, 2008 at 21:03 UTC
    (Note: The following applies to Perl threads only, not to system-level threads in general)

    As near as I can decipher from your post, the lexicals are thread private.

    Each Perl thread gets its own Perl interpreter. When a new thread is spawned, it gets a cloned version of the parent's Perl interpreter context (with some minor exceptions). This means any lexicals in your package will simply get cloned, not shared, between the threads.

    If you need to share variables between threads, you must explicitly declare them to be shared. See threads::shared.
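A sketch of the layout described in the question, under several assumptions: My::Decomp and its decompose sub are hypothetical stand-ins for the real MIME package, the file names are invented, and the "work" is a placeholder. It is meant to show only that a package-level lexical hash is private to each worker.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

{
    package My::Decomp;     # hypothetical stand-in for the MIME decomp package
    my %parts;              # package-file lexical: cloned per thread, never shared

    sub decompose {
        my ($name) = @_;
        $parts{$name} = length $name;   # placeholder for the real MIME work
        return scalar keys %parts;      # names handled by THIS thread so far
    }
}

my $queue = Thread::Queue->new;

my @workers = map {
    threads->create( sub {
        my $handled = 0;
        while ( defined( my $name = $queue->dequeue ) ) {
            $handled = My::Decomp::decompose($name);
        }
        return $handled;
    } );
} 1 .. 4;

$queue->enqueue("msg$_.eml") for 1 .. 20;
$queue->enqueue(undef) for @workers;    # one undef per worker as a stop signal

my $total = 0;
$total += $_->join for @workers;
my $in_main = My::Decomp::decompose('probe');

print "files handled across workers: $total\n";    # 20
print "entries in main's own hash:   $in_main\n";  # 1
```

Each worker counts only the names it handled itself, and the main thread's hash holds nothing but its own probe entry, so no thread ever finds another thread's data "already there".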

    Alas, the only book I'm aware of that covers the subject is the Camel 3E, and it's (a) quite dated and (b) not a very thorough discussion. There have been some articles on the subject, but I believe most of them are fairly dated at this time as well. I've not checked the latest editions of Learning or Mastering, so I don't know if they discuss ithreads.

    Perl Contrarian & SQL fanboy
      All of these have been great answers, to really understand what is really happening in the background! But it makes me pause and step back for a moment.

      I started in FORTRAN (without a number), then SNOBOL, ALGOL, PL/1, 'C', Pascal, C++, Java (ugh!), and now Perl (last 8 years). And the one constant to be found here is that each language attempted to "abstract away" some complexity from the language it was attempting to usurp. And in "dumbing down" or "abstracting" the basic concepts, the programmer is required to understand less and less of the fundamentals.

      Just look at what the answer to my question has brought out. "Use 'require' rather than 'use' in threads to control bloat"; that is something you would never find in a book, since it would require the reader to know too much about the underlying concepts, implementations and philosophies. And the implementation of threads::shared hasn't even been touched. I really want a "thread-shared hash that is tied to a dbm", but I am settling for my own disk journaling system to recover from any "unplanned termination".

      And would many of the readers have the foundation in 'computer science' to understand memory management or the complexities of parameter passing or 'varargs' ( or how about 'varargs' on a risc processor, with no stack)? Yet, to solve the hard problems, the programmer must understand what lies beneath. Dive below the abstraction!

      I love Perl, and abstraction is great; but we should never forget how to program assembler in a linear address space!

