PerlMonks
Things you need to know before programming Perl ithreads
by liz (Monsignor) on Aug 31, 2003 at 13:21 UTC ( [id://288022]=perlmeditation )
Recently I've received a lot more ithreads-related questions, so I figured some background information might be in order. Some of these issues were already addressed about a year ago in Status and usefulness of ithreads in 5.8.0, but I think a recap is useful.
This is not a tutorial about how to use threads. It's more a tutorial about how to use threads well once you've figured out they may hold a solution to your particular need.
First of all, if you want to do anything for production use with Perl ithreads, you should get Perl 5.8.1 (or until then, one of the recent maintenance snapshots). There were several bugs in 5.8.0, one of which was a serious memory-eating bug when using shift() on a shared array; these are fixed in 5.8.1.
However, there are still a number of caveats that you should be aware of when you want to use Perl ithreads. It's better to realize these limitations before you put in a lot of work, only to find in the end that you don't have a machine big enough or fast enough to run your code in a production environment.
So what are these caveats? Basically it boils down to one statement.
Perl ithreads are not lightweight!
Unlike most other threads implementations that exist in the world, including the older perl 5.005 threads implementation, variables are by default not shared between Perl ithreads. So what does that mean? It means that every time you start a thread, all data structures are copied to the new thread. And when I say all, I mean all. This includes, for example, package stashes, global variables and lexicals in scope. Everything!
An example that prints a reference to a lexical scalar $foo from inside a thread and from the main program gives this on my system:
  thread: coderef = SCALAR(0x1eefb4)
  main: coderef = SCALAR(0x107c90)
This shows that the lexical scalar $foo was copied to the thread. Inside the thread the "same" lexical now lives at another address and can be changed at will inside the thread without affecting the lexical in the main program.
But this copying takes place when a thread is started! Not, as you might expect, at the moment the value of the lexical inside the thread is changed (which is usually referred to as COW, or Copy On Write). So, even if you never use $foo inside the thread, it is copied, taking up both CPU and memory.
But it gets worse: the same applies to all other forms of data, one of them being code references. An example that prints a reference to a subroutine foo() from both places gives this on my system:
  thread: coderef = CODE(0x1deae4)
  main: coderef = CODE(0x107c9c)
The code references are different! So, did it copy the whole subroutine? I've been led to understand that the actual opcodes of subroutines are not copied (but I've been hesitant to check the Perl source code to actually confirm this, so I'll have to take the p5pers' word for it). But all the data around it, in this case the code reference in the package stash, is copied. Even if we never call foo() inside the thread!
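The copying described above can be reproduced with a small script. This is only a sketch, not the original listing: it assumes a thread-enabled perl, the helper name addresses() is made up for illustration, and the explicit 'context' option assumes a reasonably recent threads.pm. The addresses printed will differ from system to system.

```perl
use strict;
use warnings;
use threads;

my $foo = 42;      # an ordinary, unshared lexical
sub foo { 1 }      # never called; only its code reference matters here

# Hypothetical helper: stringified references as seen by the caller's interpreter.
sub addresses { return ( "" . \$foo, "" . \&foo ) }

# Ask for list context explicitly so join() hands back both strings.
my ( $thread_scalar, $thread_code ) =
    threads->create( { 'context' => 'list' }, \&addresses )->join;
my ( $main_scalar, $main_code ) = addresses();

print "thread: scalar = $thread_scalar, coderef = $thread_code\n";
print "main:   scalar = $main_scalar, coderef = $main_code\n";

# Assigning to the copy inside a thread leaves the original alone.
threads->create( sub { $foo = 'changed in thread' } )->join;
print "main still sees \$foo = $foo\n";
```

The thread and the main program should report different addresses for both the scalar and the code reference, and the last line should still show the original value of $foo.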
Shared variables?
Implications
On casual observation, you might think that putting the use statement inside the subroutine with which the thread is started would do the trick. But alas, an example that does this with the Benchmark module prints:
  Benchmark has been loaded!
even though the use statement is inside the subroutine with which the thread is started! That's because use is executed at compile time. And at compile time, Perl doesn't know anything about threads yet.
Of course, there is a run-time equivalent to use: require followed by import. An example that uses it shows that the Benchmark module has indeed been loaded inside the thread only. It prints:
  Benchmark has not been loaded!
Since I don't particularly like the require module; module->import idiom, I actually created the Thread::Use module, which allows you to use the useit module; idiom.
However, the compile-time issue of use also works the other way around. An example in which the use statement is listed after the code that starts the thread prints:
  Benchmark has been loaded!
Again, this is caused by use being executed at compile time, before the thread is started at execution time (even though it is listed later in the code). So even putting the use statements after starting your threads is not going to help.
More drastic measures are needed. If you do not want to have all the copying of data, you need to start your threads before modules are loaded. That is possible, thanks to BEGIN {}. An example that starts the thread inside a BEGIN block prints:
  Benchmark has not been loaded!
  Scalars leaked: 1
Yikes! What is that? "Scalars leaked: 1". Well, yes, that's one of the remaining problems/features/bugs of the Perl ithreads implementation. This particularly seems to happen when you start threads at compile time. From practical experience, I must say it seems to be pretty harmless. And compared to all of the other "leaking" of memory that happens because data structures are copied, a single leaked scalar is presumably not a lot. And the error message is probably in error in this case anyway.
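The run-time loading idiom can be sketched as follows. This is an illustration under assumptions, not the original listing: it needs a thread-enabled perl, the helper name loaded() is made up, and checking %INC is just one way to see whether a module has been loaded in a given interpreter.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper: reports whether Benchmark.pm is recorded in %INC
# of the interpreter that calls it.
sub loaded { return exists $INC{'Benchmark.pm'} ? 'has been' : 'has not been' }

# The thread loads Benchmark at run time and reports its own view of %INC.
my $in_thread = threads->create( sub {
    require Benchmark;
    Benchmark->import;
    return loaded();
} )->join;

print "thread: Benchmark $in_thread loaded!\n";
print "main:   Benchmark ", loaded(), " loaded!\n";
```

Because require runs inside the thread at execution time, only the thread's copy of %INC should gain an entry for Benchmark.pm; the main program never pays for loading it.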
Tools for ithreads
fork?
Not being hindered by the reasons for not using fork(), I developed a threads drop-in replacement called forks. Initially started as a pet project to see whether it would work at all, it became a bit more serious than that.
forks.pm has the distinct advantage of being able to start a thread quickly. But that's just because it does a fork(), which is very fast on modern *nixes. Communication, blocking and shared variables are handled by TCP connections between the threads, in which the process holding the shared variable values is the server, and all the other threads (including the "main" thread) are clients.
What you win in a quickly starting thread, you lose in delays with communication. So if you're not passing around a lot of data between threads, forks.pm might be for you. Additionally, forks.pm has the advantage of not needing a thread-enabled Perl. In fact, it even runs on Perl 5.6.0!
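The trade-off can be made concrete with a toy version of the idea. To be clear, this is not how forks.pm is actually implemented; it is a minimal sketch using only core modules, with a made-up one-line "incr NAME" protocol, in which the parent process holds the shared value and serves it over a loopback TCP connection to a forked child standing in for a thread.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# The process that holds the shared values acts as the server.
my %shared = ( counter => 41 );

# Listen on a localhost port picked by the OS (LocalPort => 0).
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 1,
) or die "cannot listen: $!";
my $port = $listener->sockport;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child process, standing in for a "thread". It only has a private copy
    # of %shared, so it must ask the server to change the real one.
    close $listener;
    my $server = IO::Socket::INET->new(
        PeerAddr => '127.0.0.1',
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "cannot connect: $!";
    print {$server} "incr counter\n";    # one round trip per access
    my $reply = <$server>;               # wait for the new value
    close $server;
    POSIX::_exit(0);                     # leave without running END blocks
}

# Parent: answer the single request, then reap the child.
my $client = $listener->accept or die "accept failed: $!";
chomp( my $request = <$client> );
my ( $command, $name ) = split ' ', $request;
$shared{$name}++ if $command eq 'incr';
print {$client} "$shared{$name}\n";
close $client;
waitpid $pid, 0;

print "counter is now $shared{counter}\n";
```

Starting the child is cheap, but every access to the shared value costs a request and a reply over the socket, which is the delay in communication mentioned above.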
The future?
Liz