in reply to Writing Solid CPAN Modules
Unfortunately, almost everything you mention regarding thread-safety is of little use to a Perl module writer.
- The Wikipedia entry for thread safe discusses it in terms that would be useful to a C programmer, but that are, for the most part, of no use to a Perl programmer.
- Reentrancy.
This is almost irrelevant to a Perl programmer.
In C, reentrancy can (sometimes) be achieved by only using "atomic operations" on data. You can usually safely increment and decrement a shared integer without applying locking because these operations are single opcodes, and so the entire operation is started and completed on a given thread before the scheduler gets a chance to interrupt.
All Perl's data structures are "fat", which is to say that each Perl variable, even the humble scalar, contains not just the user's value, but also a large amount of "system state". E.g. an SV (can) consist of an IV, an RV, a PV and a CV plus multiple forms of magic.
All of these various pieces that sit behind a Perl scalar (and that are mostly hidden from the Perl programmer and, for the most part, beyond his control) must be maintained in a coherent state. However, the Perl-level operations that manipulate and maintain this state on behalf of the user consist of 10s, 100s or even 1000s of machine-level opcodes, and the scheduler can interrupt a thread in between any two of these.
Furthermore, many of the C-runtime calls used by Perl in the actioning of Perl-level operations are themselves non-reentrant. These are entirely beyond the ability of the Perl programmer to affect or control.
As such, the entire concept of reentrancy is completely lost on a Perl programmer.
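To make the point about non-atomic Perl-level operations concrete, here is a minimal sketch (my illustration, assuming a perl built with ithreads). The names $counter and bump are made up for the example. Even a simple $counter++ on a shared scalar is a multi-step operation, so the sketch wraps it in lock() rather than relying on atomicity:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# A shared counter. $counter++ is a read-modify-write at the Perl level,
# so it is guarded with lock() rather than assumed to be atomic.
my $counter :shared = 0;

# Hypothetical worker: increment the shared counter $times times.
sub bump {
    my ($times) = @_;
    for (1 .. $times) {
        lock($counter);    # released when this loop body's block exits
        $counter++;
    }
}

my @workers = map { threads->create(\&bump, 1000) } 1 .. 4;
$_->join for @workers;

print "counter = $counter\n";
```

With the lock in place the four workers always leave the counter at 4000. Without it, updates may be lost.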
- Mutual exclusion.
Whilst the threads::shared module does provide access to a number of OS-level functions (cond_*), these are for the most part pretty useless to the Perl programmer.
The underlying APIs upon which they are based are intended for use from C-level code, where the cond_var and lock_var parameters are themselves C-style unsigned integer variables--i.e. pieces of data which can be manipulated using cpu-atomic operations. The Perl view of these APIs substitutes Perl scalars--which are very much not cpu-atomic storage--which makes the use of these APIs fraught with problems.
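For readers who have not met the cond_* functions, this is roughly the shape their use takes from Perl. It is only a sketch, assuming an ithreads-enabled perl, and the variable $ready is invented for the illustration. Note that the waiter has to hold the lock and re-check its predicate in a loop, which is part of what makes these calls easy to get wrong:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared flag that doubles as the variable being locked and waited on.
my $ready :shared = 0;

my $waiter = threads->create(sub {
    lock($ready);
    # cond_wait() releases the lock while waiting and reacquires it
    # before returning; loop in case the wake-up was not meant for us.
    cond_wait($ready) until $ready;
    return "saw ready=$ready";
});

{
    lock($ready);
    $ready = 1;
    cond_signal($ready);    # wake one thread waiting on $ready
}

my $result = $waiter->join;
print "$result\n";
```

If the main thread sets the flag before the waiter gets the lock, the waiter simply skips the wait, so the sketch does not depend on which thread runs first.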
- Exception safety.
Since Perl 5.8.0, Perl has had "safe signals"--i.e. signals are caught by the Perl runtime and deferred from interrupting the user-level code until that code reaches the end of the current Perl-level operation.
Whilst this works pretty well for single-threaded applications--although the possibly indefinite delay between the signal being sent and the Perl-level code noticing it somewhat detracts from the utility of the whole concept of signals--their use in multi-threaded Perl code is very confused.
- What happens if multiple threads install a signal handler for the same type of signal?
- When the signal comes, which thread gets control?
- How should signals be propagated between threads?
- How are multiple signals handled?
That is, if the runtime detects one signal (type) and then receives a second, disparate signal prior to the user code having reached the end of the current Perl-level operation, which one of these signals gets propagated?
- perlthrtut (Link currently broken (again) from where I am, but maybe it'll be fixed again soon!)
Whilst there is nothing (that I've noticed) in this document that is factually inaccurate, much of the content is of little use to a Perl programmer who is trying to use threads in a stand-alone script, and of even less use if he is trying to code a Perl module such that it can safely be used from within threaded Perl scripts.
Most of the content is only useful to that very small band of Perl authors trying to write modules that would be usable across iThread boundaries!
That is to say, modules that
- use threads within themselves without the user needing to be aware that threads are being used.
I know of no existing, publicly available modules that fit into this category.
- are intended for use from multiple threads, but that require those threads to communicate in order for the module to perform its primary role.
A good (the best) example of this is Thread::Queue. These modules are rare, and thanks to the extremely clever design and implementation of iThreads, the need for threads that do this is even rarer.
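For anyone who has not used it, a typical Thread::Queue arrangement looks something like the following. This is a minimal sketch of my own, assuming an ithreads-enabled perl. The queue names and the squaring job are purely illustrative:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work    = Thread::Queue->new;    # jobs going to the workers
my $results = Thread::Queue->new;    # answers coming back

# Three workers, each squaring numbers until it dequeues undef.
my @workers = map {
    threads->create(sub {
        while (defined(my $n = $work->dequeue)) {
            $results->enqueue($n * $n);
        }
    });
} 1 .. 3;

$work->enqueue(1 .. 10);
$work->enqueue(undef) for @workers;    # one terminator per worker
$_->join for @workers;

# All workers have finished, so drain the results without blocking.
my $sum = 0;
while (defined(my $square = $results->dequeue_nb)) {
    $sum += $square;
}
print "sum of squares = $sum\n";
```

All the communication between threads goes through the two queues. The worker code itself needs no explicit locking.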
So far in my experiments, due to the way iThreads has been implemented, the vast majority of existing modules are perfectly usable and useful with iThreads--provided they are used in the correct way! That includes modules written prior to the existence of threaded Perl that, as far as I am aware, have never been modified to make them "thread-safe".
This is where the existing documentation is almost completely lacking! There is almost nothing that provides advice on how threads can be safely used with existing modules.
As such, perlthrtut acts almost as anti-documentation. It is so full of apparent caveats, hidden traps, warnings, history and discussion of intricate details that are, for the most part, completely irrelevant to the average Perl programmer wishing to use threads, that it serves only to put people off.
It is almost totally lacking in any discussion of:
- Why people might consider using iThreads.
- What benefits they might get from doing so.
- When iThreads are likely to be beneficial (and when not).
- How to structure programs to use iThreads effectively.
- Comparative discussion of iThreads with alternatives and when which is most applicable.
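As one small illustration of the earlier claim that pre-threading modules are usable when each thread sticks to its own copy of a module's state, here is a sketch (assuming an ithreads-enabled perl) using Text::Wrap, which keeps its configuration in a package global. Text::Wrap is my choice of example, and the helper longest_line is hypothetical. Because iThreads gives every thread its own copy of unshared data, each thread can set $Text::Wrap::columns without disturbing the others or the main thread:

```perl
use strict;
use warnings;
use threads;
use Text::Wrap qw(wrap);

my $text   = join ' ', ('the quick brown fox jumps over the lazy dog') x 3;
my $before = $Text::Wrap::columns;    # main thread's setting

# Hypothetical helper: wrap $text at $columns, return the longest line length.
sub longest_line {
    my ($columns) = @_;
    $Text::Wrap::columns = $columns;  # changes this thread's copy only
    my $longest = 0;
    for my $line (split /\n/, wrap('', '', $text)) {
        $longest = length($line) if length($line) > $longest;
    }
    return $longest;
}

my (%thread, %longest);
$thread{$_}  = threads->create(\&longest_line, $_) for 20, 40;
$longest{$_} = $thread{$_}->join                   for 20, 40;

print "columns=$_ longest=$longest{$_}\n" for 20, 40;
print "main thread still has columns=$Text::Wrap::columns\n";
```

The module was not written with threads in mind, yet nothing needs sharing or locking here because no state crosses a thread boundary.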
- The bug you found, "Perl threads sort test program crashes".
I'm not sure what the official resolution of the bug you found was--nor actually if it was ever resolved, as you didn't provide a link to perlbug #30333 and I've forgotten where to go to look them up. In any case, as far as I can remember, you have to have a login at the relevant web site just in order to view the bug description or find out whether it has been resolved--which always seemed pretty silly to me.
However, I am prepared to go out on a limb and guess that if the problem has been resolved, the resolution required a patch to the Perl sources and was completely beyond your control as a Perl programmer. As such, I am not really sure of the efficacy of citing that bug in this context. It doesn't help any module author in creating a thread-safe module.
I should also consider the possibility that the resolution to your bug was a piece of "Don't do that!" advice.
If this is the case, then I see nothing in perlthrtut, nor the threads pod, nor the threads::shared pod that appears to relate to the issue in your post, nor any post in that thread that identifies the underlying problem and solution.
- The suggestion that Perl should start to specify which functions and modules are thread-safe is flawed.
- The cited example, Tie::File, is perfectly "thread-safe"--if used in a thread-safe way.
The problem with the example code in that post is that it makes no attempt to use threads::shared. It doesn't share the tied array; makes no attempt to perform any locking. It could never do anything useful as coded!
In fairness to pg, Perl threads, and particularly iThreads, were very new back then. I don't think that even the little documentation that is available now was available back then; there was an almost complete absence of example code. He went on to do some very useful things with threads.
However, your citing that post in this meditation is less forgivable.
- And so is the suggestion that module authors should make this consideration when writing modules.
For the most part, module authors have no need to consider (i)threading when writing their modules, and to state that they should do so, and "certify" their modules as "thread-safe" will have two effects--both of them negative:
- Those authors that have neither the need nor the desire to use threads themselves will see no benefit to themselves in attempting to consider threading in the design and implementation of their modules.
The likely effect will be that they will either ignore the directive or, more insidiously, dodge the issue completely and simply label their module as "Not thread safe". For many (most) modules, this will be incorrect, as most modules can be used with threads, if used in the right way--or rather, not used in the wrong way.
- Those (potential CPAN) authors that have modules that work for them and consider releasing them to CPAN will see the directive for certifying thread-safety, take a look at perlthrtut, get panicked by the caveats and dire warnings it contains, and decide to withhold their modules from release as they are put off by the need to try and meet this "requirement".
Which for the vast majority of modules would be completely unnecessary.
So, whilst I understand your motivations for trying to persuade people to consider thread-safety when writing modules, the information and examples you cite in support of your intentions are flawed to the point where I think that they will do more harm than good.
- The best way to promote threading in Perl would be to provide some documentation that gives some practical examples of using them as "patterns" for potential users. A necessary part of this would be to show some good examples of how existing (pre-threading) modules can be safely used with threads provided a few simple rules of thumb are followed.
- To promote the writing of new modules that can be used in conjunction with threads, the addition of a section to one or more of the module tutorials, outlining those few things that authors should not do if they want their modules to be usable in conjunction with threads, would be a big step forward.
- To promote the writing of modules that (transparently) use threads to enhance their utility--asynchronous reading/writing of files; providing user feedback during long running/cpu intensive operations; etc.--would (probably) require a completely new document aimed specifically at the authors of this (rare) type of module.
Modules of this type probably belong in a namespace that identifies them as requiring threads for their operation. Unfortunately, the mixed up and confusing state of the original Thread::* namespace (which currently contains a raft of modules that were either written to work with the original and flawed pThreads implementation, or written to bypass the threads module completely utilising liz's forks module), combined with the dubious choice of a lower-case, "pragma" namespace and the strange "rules" that surround these, means that choosing a suitable namespace for modules of this type is impossible.
If you place them in the Thread::* namespace, they will most likely be ignored as old, pthreads modules that are no longer useful.
If you place them in the threads::* namespace (where IMO they belong), the self-appointed namespace nazis will jump all over you for ignoring (their) rules.
This is a no-win situation that will only be resolved by:
- Discarding/renaming the irrelevant modules from the Thread::* namespace--but purging the pthreads association is likely impossible.
- Opening up the threads::* namespace for iThreads dependent modules--my preferred option--but unlikely to get past the "pragmas are for core modules only" lobby or the "threads are spawn of the evil empire and the total antithesis of Perl's unix roots" brigade.
- Creating a new namespace (say Ithreads::*) for such modules--likelihood of acceptance based on what I have read = 0.
- Forgetting the idea until CP6AN comes into being and hoping that the underlying threading model, and the clean slate of the namespace conventions, will afford a suitable place for thread-dependent modules to be placed.
Please don't take this as a personal attack--I applaud the intent of your meditation. Most of my criticisms are not aimed at you, nor indeed at any individual or group. The main thrust of my criticism is aimed at the situation in general, which has been arrived at through no particular plan or agenda. It's just where we happen to have ended up!
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.