Want the ability to create more concurrent threads in Perl?
For most purposes, the current limit of 120 concurrent is sufficient, but for some applications where individual threads can lay essentially dormant for extended periods, it has always seemed to be an arbitrarily low limit.
It turns out that the culprit is a single line in the Win32 makefiles, namely
$(LINK32) -subsystem:console -out:$@ -stack:0x1000000 $(LINK_FLAGS +) \
This, in conjunction with the use of 0 for the stacksize parameter on the CreateThread call, means that each thread created reserves a whopping 16MB of virtual stack space. Although this reservation will rarely if ever get actually allocated, those reserve allocations add up and eventually prevent another thread being spawned because 120 * 16 MB = 1.875 GB which puts you within spitting distance of the 2 GB per process virtual memory limit. Combined with other memory reservations and allocations made by perl.exe itself mean that you cannot now spawn another thread until something goes away to reduce the processes total memory reservation.
There are two immediate ways around this:
- If you build your own perl, then reducing the value in the makefile to (say 0x0100000. More on that later.), will allow you to create well over 1000 threads.
In extremis, I've succeeded in reducing this value to the point where I've had over 3000 concurrent active threads running in just over 1 GB of ram, but they were not doing much at all.
- You can use the MS VC++ compiler tool, editbin /stack:0x00100000 \yourperl\bin\perl.exe to achieve the same effect on binary distributions.
For my purposes, following the lead of AS's wperl.exe, I made a copy of perl.exe called tperl.exe and applied the modifications to that. I then made an association between a .plt suffix and tperl.exe, in a similar way as I have between .plw and wperl.exe. Now I can use .pl and perl.exe for normal apps (thereby reducing any risk associated with the change), .plt for heavily threaded apps and .plw for gui apps. I guess a .pltw might be on the cards also.
Ramifications of the change
Reducing the stack reservation may sound like a dangerous practice, but it is only a reservation.
In use, the system seems to happily expand the stack for any individual thread well beyond this limit provided virtual memory is available to accommodate it. The value specified only comes into play if other parts of the process consume virtual memory (stack or heap) to the point where they would reduce the 2 GB below the reservation.
By specifying a large reservation, you are guaranteeing that should your thread need to expand it's stack to the reserved size, it will be able to do so. However, this comes at the cost of preventing other parts of the process from increasing their use of virtual memory--including heap--just in case your thread needs that space.
So by reducing the stack reservation, you run the risk that if other parts of your process have expanded their use of VM to the point where your thread can no longer expand it's stack, your process will terminate with a stack overflow or similar. However, if the other parts of your process require that much VM, and you had retained the larger stack reservation, then the process would have been terminated 'Out of memory' anyway.
So far as I can tell, and there seems to be little real documentation on the subject that I can find, there is little risk associated with the reduced reservation.
It's also worth pointing out that in my attempts to persuade perl to consume stack, and as confirmed by a man who knows, one of Perl's design features is that it does not make a great deal of use of the C-stack for most of it's operational needs.
In my limited testing, you generally have to be doing something pretty extreme to force Perl to consume anything more that very modest amounts of stack. In most cases, it only happens if you have runaway recursion (at the C-level), that would consume all available space until it crashed anyway.
The exceptions are:
- Complex, backtracking regex on very large strings, which should probably be replaced with better regexes anyway.
- Sorting very large datasets, though I found it hard to create the situation where I didn't run out of heap well before I ran out of stack. Maybe if you used the older quicksort algorithm instead of the default heap sort this would be more of a problem, but there doesn't seem to be any good reason for doing so.
- Recursive XS or Inline C code. Even then, if you are doing anything useful, as opposed to recursing for it own sake as with something like a C implementation of Ackerman's function, then you're more likely to run out of heap for your data before you run out of stack to process it.
If you use binary builds and don't have access to editbin.exe
The value in the executable that needs to be binary edited is in a well known and easily located place and is fairly trivial to change. Autrijus' Win32::Exe module should be easily tweaked to add this value to it's repertoire of modifiable values. I'll come up with a patch if there is any demand for it, and if Autrijus doesn't beat me to it.
renodino has done some testing with home built versions of Perl for Linux and has achieved similar kinds of increases in the number of simultaneous threads that can be achieved. The downside is that he has been unable to find a binary edit utility for the Linux platform. He's also done some testing on that platform on apps usng DBI and TK and has seen no detrimental effects from the change. I'll leave him to describe what testing he has done and other Linux related information if he chooses/anyone is interested.
A better solution
In the long term, a better solution would be for threads::create() to accept an extra (named?) parameter that allowed the Perl programmer to specify the stack reservation on a per thread basis. That would allow the choice of what size is applicable to made on a thread by thread basis and remove the (slight) possibility that lowering it for the Perl executable could cause large, non-threaded apps to have problems. renodino has some ideas on this, and maybe the p5p guys will consider the option if their combined wisdom doesn't find too many holes in the idea.