Want the ability to create more concurrent threads in Perl?
For most purposes, the current limit of around 120 concurrent threads is sufficient; but for applications where individual threads lie essentially dormant for extended periods, it has always seemed an arbitrarily low limit.
It turns out that the culprit is a single line in the Win32 makefiles, namely
$(LINK32) -subsystem:console -out:$@ -stack:0x1000000 $(LINK_FLAGS) \
This, in conjunction with passing 0 for the stacksize parameter to the CreateThread call, means that each thread created reserves a whopping 16 MB of virtual stack space. Although this reservation will rarely, if ever, be fully allocated, those reservations add up and eventually prevent another thread from being spawned: 120 * 16 MB = 1.875 GB, which puts you within spitting distance of the 2 GB per-process virtual memory limit. Combined with the other memory reservations and allocations made by perl.exe itself, this means you cannot spawn another thread until something goes away to reduce the process's total memory reservation.
There are two immediate ways around this:
- If you build your own perl, then reducing the value in the makefile to, say, 0x0100000 (more on that later) will allow you to create well over 1000 threads.
In extremis, I've succeeded in reducing this value to the point where I've had over 3000 concurrent, active threads running in just over 1 GB of RAM, though they were not doing much at all.
- On binary distributions, you can use the MS VC++ tool editbin to achieve the same effect: editbin /stack:0x00100000 \yourperl\bin\perl.exe
For my purposes, following the lead of AS's wperl.exe, I made a copy of perl.exe called tperl.exe and applied the modification to that. I then made an association between a .plt suffix and tperl.exe, just as I have between .plw and wperl.exe. Now I can use .pl and perl.exe for normal apps (thereby reducing any risk associated with the change), .plt for heavily threaded apps, and .plw for GUI apps. I guess a .pltw might be on the cards also.
Ramifications of the change
Reducing the stack reservation may sound like a dangerous practice, but it is only a reservation.
In use, the system seems to happily expand the stack for any individual thread well beyond this limit provided virtual memory is available to accommodate it. The value specified only comes into play if other parts of the process consume virtual memory (stack or heap) to the point where they would reduce the 2 GB below the reservation.
By specifying a large reservation, you are guaranteeing that should your thread need to expand its stack to the reserved size, it will be able to do so. However, this comes at the cost of preventing other parts of the process from increasing their use of virtual memory--including heap--just in case your thread needs that space.
So by reducing the stack reservation, you run the risk that if other parts of your process have expanded their use of VM to the point where your thread can no longer expand its stack, your process will terminate with a stack overflow or similar. However, if the other parts of your process require that much VM, and you had retained the larger stack reservation, then the process would have been terminated 'Out of memory' anyway.
So far as I can tell, and there seems to be little real documentation on the subject that I can find, there is little risk associated with the reduced reservation.
It's also worth pointing out that in my attempts to persuade perl to consume stack, and as confirmed by a man who knows, one of Perl's design features is that it does not make a great deal of use of the C stack for most of its operational needs.
In my limited testing, you generally have to be doing something pretty extreme to force Perl to consume anything more than very modest amounts of stack. In most cases, it only happens if you have runaway recursion (at the C level) that would consume all available space until it crashed anyway.
The exceptions are:
- Complex backtracking regexes on very large strings, which should probably be replaced with better regexes anyway.
- Sorting very large datasets, though I found it hard to create a situation where I didn't run out of heap well before I ran out of stack. Maybe if you used the older quicksort algorithm instead of the default merge sort this would be more of a problem, but there doesn't seem to be any good reason for doing so.
- Recursive XS or Inline C code. Even then, if you are doing anything useful, as opposed to recursing for its own sake as with something like a C implementation of Ackermann's function, then you're more likely to run out of heap for your data before you run out of stack to process it.
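To illustrate the first of those exceptions, here is a sketch (the string length and pattern are made up for illustration) of the kind of regex that drives the engine into deep recursion on older perls:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each iteration of the nested quantifier in (ab*)+ used to recurse
# inside the regex engine (perls before 5.10), so C-stack consumption
# grows with the length of the target string.
my $len = $ARGV[0] || 10_000;
my $str = 'a' x $len;

print "matched\n" if $str =~ /(ab*)+/;
```

On a modern perl the engine is iterative and this merely matches; on older builds, large enough strings will exhaust the stack or hit the engine's recursion limit, as demonstrated further down the thread.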
If you use binary builds and don't have access to editbin.exe
The value in the executable that needs to be binary edited is in a well-known and easily located place and is fairly trivial to change. Autrijus' Win32::Exe module should be easily tweaked to add this value to its repertoire of modifiable values. I'll come up with a patch if there is any demand for it, and if Autrijus doesn't beat me to it.
Other OSs
renodino has done some testing with home-built versions of Perl for Linux and has achieved similar increases in the number of simultaneous threads. The downside is that he has been unable to find a binary edit utility for the Linux platform. He's also done some testing on that platform on apps using DBI and Tk and has seen no detrimental effects from the change. I'll leave him to describe what testing he has done and other Linux-related information if he chooses/anyone is interested.
A better solution
In the long term, a better solution would be for threads->create() to accept an extra (named?) parameter that allowed the Perl programmer to specify the stack reservation on a per-thread basis. That would allow the choice of size to be made thread by thread, and remove the (slight) possibility that lowering it for the Perl executable could cause large, non-threaded apps to have problems. renodino has some ideas on this, and maybe the p5p guys will consider the option if their combined wisdom doesn't find too many holes in the idea.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Use more threads.
by hv (Prior) on Feb 27, 2006 at 11:27 UTC
[renodino] has been unable to find a binary edit utility for the Linux platform.
For setting stacksize, you need the API function setrlimit(2); the manpage refers you also to the bash builtin 'ulimit' and quotactl(1).
Trying that locally against an example from What perl operations will consume C stack space?:
zen% ulimit -s
8192
zen% perl -wle '$n=shift; $_="a" x $n; /(ab*)+/' 10080
Segmentation fault (core dumped)
zen% ulimit -s 32768
zen% perl -wle '$n=shift; $_="a" x $n; /(ab*)+/' 10080
zen% perl -wle '$n=shift; $_="a" x $n; /(ab*)+/' 32766
zen% perl -wle '$n=shift; $_="a" x $n; /(ab*)+/' 32767
Complex regular subexpression recursion limit (32766) exceeded at -e line 1.
zen%
... which gets me to the built-in limit.
Note that the limit may be capped by root, and that more complex systems may use the quota-based accounting method; but any barriers are there to stop people increasing stack size, so they shouldn't cause a problem for your requirements here.
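For completeness, the same soft limit can be raised from within the process itself via setrlimit(2); a sketch assuming the non-core BSD::Resource module is installed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# BSD::Resource wraps getrlimit/setrlimit; RLIMIT_STACK is the same
# limit the shell's `ulimit -s` manipulates (values here in bytes).
use BSD::Resource qw(getrlimit setrlimit RLIMIT_STACK);

my ($soft, $hard) = getrlimit(RLIMIT_STACK);
printf "soft=%s hard=%s\n", $soft, $hard;

# Ask for 32 MB, capped at the hard limit (a non-root user can't raise that).
my $want = 32 * 1024 * 1024;
$want = $hard if $hard != -1 && $want > $hard;
setrlimit(RLIMIT_STACK, $want, $hard) or warn "setrlimit failed: $!";
```

Note this changes the limit for the current process and its children only, which makes it handy in a wrapper script around a threaded perl program.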
HTH,
Hugo
Re: Use more threads.
by renodino (Curate) on Feb 27, 2006 at 16:05 UTC
renodino has done some testing with home built versions of Perl for Linux...
Clarification: I didn't build a new Perl on Linux. Using the
stock Perl 5.8.6 in FC4, I ran some tests. Linux died at 289
threads. I also ran tests on Solaris 10 (which dies at ~1900
threads, and starts thrashing the swapper around 1300 threads),
and OS X 10.3.9, which dies around 450 threads.
Perhaps as importantly, I found a
link that sheds a bit more light on the subject.
My current approach (which I hope to build/test today) is to
add a couple of new APIs to threads: set_stack_size() and get_stack_size(). The added code is pretty simple,
though it may not be applicable to the root thread
(the various editbin/setrlimit/ulimit solutions
may address that issue).
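For reference, this API did eventually land in the stock threads module, so a sketch of the proposed calls runs unchanged against a modern threads.pm:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Shrink the default per-thread stack reservation before spawning.
# Note: this only affects threads created afterwards; the root
# thread's stack is fixed by the executable (hence editbin/ulimit).
my $default = threads->get_stack_size();
threads->set_stack_size(64 * 4096);      # 256 KB per thread

my $thr    = threads->create(sub { return 6 * 7 });
my $result = $thr->join();
print "default was $default, worker returned $result\n";
```

The 256 KB figure follows the threads documentation's own example; how small you can safely go depends on what your threads actually do.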
It's important to point out that this issue isn't just about using more threads (though that's my personal requirement); given the huge default stack size on Win32 and Linux, one of the biggest complaints about threaded Perl apps - their voracious memory appetite - may be addressed just by trimming the stack reserve down to a reasonable/minimal number.
Update:
After adding the set/get_stack_size() methods and applying the associated
changes to the CreateThread()/pthread_attr_setstacksize() calls,
and then calling set_stack_size(65536),
I can crank out 1200 threads on Win32 (though there's definitely some swapping kicking in at around 900 threads).
Likewise, on Linux FC4, I can get 1000 threads on a fairly small machine (an old 1 GHz laptop w/ 512 MB), though it starts thrashing at around 1000 threads. And the vsz report from ps shows a vast reduction in memory usage.
(Since I can't get more than 120 threads using the original threads on Win32, I can't really make a useful memory usage comparison.)
Note that in both cases, I'm using the stock perl 5.8.6 without any ulimit'ing or editbin'ing.
I'm going to try it on OS X and Solaris and see what shakes out.
FWIW: my method for doing this was to copy the threads and threads::shared source directories into their own, and rename everything to a "morethreads" package root. The module tests don't seem to pass with flying colors, but it may be related to using the unofficial threads::shared 0.95 against perl 5.8.6.
Update 2:
After testing on OS X 10.3.9 and Solaris 10, both seem a bit less sensitive to the stack setting. Both reported ulimit -s == 8192 (i.e., 8 MB).
When I ran a comparison test on OS X between stock threads and my hacked morethreads, the overall performance was about the same, though ps -o vsz reported about half as much memory being used when I set_stack_size(65536). So I'm assuming something in either the perl build or the OS is throttling the per-thread stack size.
On Solaris, the test showed an even closer vsz between stock and hacked threads. Stock was always about 15-20 MB higher than hacked, so I'm assuming there's a build or OS limit there as well.
Following up on my Linux tests, ulimit -s reported 10240.
The vsz differences were dramatic: at 200 threads, the stock version reported nearly 2 GB, while the hacked version reported around 125 MB.
The ulimit information is barking up the wrong tree. The POSIX thread stack-size routines are the right way to go. ulimit will limit the maximum size of the stack, not the initial reserve. While limiting the maximum size will place a hard upper limit on the memory footprint, it will do nothing to reduce the lower limit. To do that, you must reduce the stack reserve (as the original post says). You can set the initial reserve at link time with the ld option --stack, which defaults to 2MB in the GNU binutils.
To modify this in the binary on a *nix box, you can "relink" it:
$ ld --stack 0x1000 perl -o tperl
$ nm -s perl | grep stack_reserve
00200000 A __size_of_stack_reserve__
$ nm -s tperl | grep stack_reserve
00001000 A __size_of_stack_reserve__
The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon
Re: Use more threads.
by zentara (Cardinal) on Feb 27, 2006 at 12:43 UTC
I saw a post on comp.lang.perl.misc asking why there is a new threads::shared module available separately for perl 5.8. The poster said that he made some improvements, but it was too big to put into the main perl 5.8.8 release. That is weird, isn't it? It says 'bless' is now supported on shared refs. That's a bit beyond me, but you might find it interesting.
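For the curious, the bless-on-shared-refs support looks like this; a sketch against a threads::shared recent enough to carry the feature (the Counter class is made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Create a shared hash and bless it. Older threads::shared could not
# cope with blessed shared refs; with the new support, the blessing
# is visible from every thread.
my $obj = &share({});
bless $obj, 'Counter';
$obj->{count} = 0;

# A method that serialises access with lock() before updating.
sub Counter::bump { lock $_[0]; return ++$_[0]->{count} }

threads->create(sub { $obj->bump() })->join() for 1 .. 3;
print $obj->{count}, "\n";   # prints 3
```

Each thread sees the same blessed, shared object, so the three increments land on one counter rather than three per-thread clones.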
I'm not really a human, but I play one on earth.
I saw blessing support in new threads::shared ? yesterday.
The poster said that he made some improvements, but it was too big to put into the main perl 5.8.8 release. That is weird, isn't it?
My reading of that was that the changes were extensive and finished too close to the release of 5.8.8, so there was not enough time to ensure adequate testing before release. Releasing it to CPAN means that those of us interested in playing with it get to do so now, without imposing the associated risks (if any) on all users of 5.8.8.
This way, dave_the_m potentially enlists a bunch of testers to check the changes out before it gets considered as a candidate for the next release. I think it's a great idea. My only wish is that threads was available separately packaged also.
It'd be nice if things like the defined-or keyword could be made available in a similar manner. It seems to have been an inordinately long time since I first heard that was mooted for inclusion and it's still not available :(
Indeed, I must say that I am much more interested in the //= operator than in threads. Defined-or is useful even for small, single-threaded applications, such as the ones I write every day. Sure, it's mostly syntax sugar (mostly), but it's very *nice* syntax sugar (and, yeah, there are also those few instances where you really don't want to evaluate the left side twice, but that's more of a special case even than threads).
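For readers who haven't met it, defined-or (which eventually shipped in perl 5.10) behaves like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# // returns its left side unless that side is *undefined* --
# unlike ||, a defined-but-false value (0 or '') is kept.
my %opt = (retries => 0);

my $retries = $opt{retries} // 3;   # 0 is defined, so stays 0
my $with_or = $opt{retries} || 3;   # || wrongly falls back to 3

$opt{host} //= 'localhost';         # assign only if currently undef

print "$retries $with_or $opt{host}\n";   # prints "0 3 localhost"
```

The `$opt{host} //= 'localhost'` form is the one that avoids evaluating (or, with hash elements, autovivifying into) the left side twice, which is the special case mentioned above.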
Re: Use more threads.
by password (Beadle) on Apr 28, 2013 at 03:42 UTC
> editbin /stack:0x00100000 \yourperl\bin\perl.exe
Thank you so much for this!!! I just did it on Perl-5.12.3 and it allowed my script to use more of the available memory in the system. I couldn't pass a limit of around 80 threads, although on Perl-5.16.3 I could run over 120 threads with no problem. So I figured the versions must have been compiled with different limits (both ActivePerl), and so I found this post.
Unfortunately for now I must stick with 5.12.3, because 5.16.3 can't play with DBI connects in threads very well, producing tons of warnings about redefined subroutines. But with your "hack" I can still use 5.12.3 for a while.
Note also the stack_size option (specified on the use threads line) that was added to threads as a result of this thread.
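In current threads, that option looks like the sketch below (the 256 KB reservation is an arbitrary illustrative figure; the 4096 mentioned next is the aggressive lower bound I use on Win32):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Request a small per-thread stack reservation on the use line;
# every thread created afterwards inherits it.
use threads ('stack_size' => 64 * 4096);

# Spawn a handful of trivial workers that each double their argument.
my @thr = map { threads->create(sub { return $_[0] * 2 }, $_) } 1 .. 8;

# Collect the results in list context to match the creation context.
my @results = map { $_->join() } @thr;

my $sum = 0;
$sum += $_ for @results;
print "$sum\n";   # 2 + 4 + ... + 16 = 72
```

With the default Win32 reservation, those eight threads would have staked out 128 MB of virtual address space between them; at 256 KB each, the footprint is negligible.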
I quite routinely set this as low as 4096 without problems; doing so, I have had proof-of-concept code with 3000 threads running in 4 GB of memory.
Not that there are many good reasons to run more than 3 or 4 times as many threads as you have cores. So unless you are running your code on a system with 32 cores or more, seeking to run so many threads is usually a sign of naive coding.
I agree completely with what you said about using many threads (or maybe using any threads at all) but what I've written is a proxy checker (with GUI), so most of the time my threads are waiting for a response or a timeout.
My next idea is to see if I can create 100 threads any faster, so I'll tighten it up as much as possible and try that stack_size option. Although my coding is a bit naive and far from professional, my only fear is that I'm reinventing the wheel sometimes. he-he.