http://www.perlmonks.org?node_id=875326


in reply to Re: Looking for advice on how to tune stack size for threads
in thread Looking for advice on how to tune stack size for threads

Sorry, but this is another case of a little knowledge being a dangerous thing.

The main problem with starting threads with a default stack reservation of 16MB is not (just) the waste of 15.996MB of virtual address space, per thread, that will never be used for stack and cannot then be used for anything else, as annoying and completely unnecessary as that is.

Far more insidious and consequential is the effect it has on the fragmentation of the virtual address space left available to the rest of the program for use as heap.

You see, when a chunk of virtual address space is reserved, it doesn't actually get allocated to the process or to the backing store (swap space), but it is removed from the pool of virtual address space that can subsequently be allocated to anything else, e.g. the heap.
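
To make the reserve/commit distinction concrete, here is a minimal sketch, assuming a 32-bit Win32 perl with the Win32::API module installed (the constants are the standard Win32 values): it reserves 16MB of address space without committing a single page, yet that 16MB can no longer be handed out to the heap or to anything else.

    use strict;
    use warnings;
    use Win32::API;

    # Reserve, but do not commit, 16MB of virtual address space.
    # MEM_RESERVE = 0x2000, PAGE_NOACCESS = 0x01. No physical memory or
    # swap is consumed, but the 16MB range is gone from the pool that can
    # be handed to the heap, to other stacks, or to anything else.
    my $VirtualAlloc = Win32::API->new( 'kernel32', 'VirtualAlloc', 'NNNN', 'N' )
        or die "Cannot import VirtualAlloc";

    my $base = $VirtualAlloc->Call( 0, 16 * 1024 * 1024, 0x2000, 0x01 )
        or die "Reserve failed: $^E";

    printf "Reserved 16MB at 0x%08x; committed memory is unchanged\n", $base;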

Now, whilst (say) 4 threads each reserving its 16MB of VM doesn't sound like much of a problem, being only about 3% of a typical 32-bit process's 2GB VM space, the problem arises when you look at where, within that 2GB VM address map, those 4x16MB chunks get allocated.

Because thread stacks get allocated at runtime, after many chunks of heap (used by the code and data of the modules loaded at start-up) have already been allocated, those 16MB chunks invariably end up somewhere in the middle of the virtual address map. The effect is to fragment the total pool of allocatable virtual address space in a way that can severely restrict the size of any subsequent single allocation.

In English, that means it severely limits the size of the biggest array or hash you can create. Even though you have plenty of unused VM to accommodate the elements of that array, perl needs to allocate a single, contiguous chunk of memory for the AV component of the structure. And because the 3 or 4 chunks of unused and un-reusable stack space are spread throughout the memory map, there may be no single free chunk big enough to hold it.
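
By way of a hypothetical illustration (sizes assume a 32-bit perl): pre-extending an array forces perl to find the AV body, a single array of SV pointers, in one contiguous piece, and on a fragmented address map that is what fails first.

    # Pre-extend an array: perl must allocate ONE contiguous block of
    # 50 million SV* slots (roughly 200MB on a 32-bit build) for the AV
    # body, even though the individual elements are created piecemeal later.
    my @big;
    $#big = 50_000_000;

    # With a handful of 16MB stack reservations scattered through the
    # address map, this can abort with "Out of memory!" despite there
    # being far more than 200MB of free VM in total.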

However, if those thread stacks only reserve as much VM as they are actually likely to use, then they will rarely ever get allocated in the middle of the VM address map, because it is far easier to find an unallocated space in low memory to accommodate a 1 or 2 page allocation than it is to accommodate a 4096 page allocation.
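
The threads module lets you say how much to reserve, either for all threads at use-time or per thread at create-time. Here is a minimal sketch of the kind of script the maps below came from (the worker body is just a hypothetical placeholder):

    use strict;
    use warnings;

    # The default on this Win32 build would reserve 16MB per thread:
    #   use threads;
    # Instead, reserve only what the workers will actually need:
    use threads ( 'stack_size' => 4096 );

    my @workers = map {
        threads->create( sub {
            # placeholder: a real worker that uses little stack, i.e. no
            # deep recursion and no huge lexical arrays or hashes
            sleep 1;
        } );
    } 1 .. 4;

    $_->join for @workers;

    # The reservation can also be overridden for an individual thread:
    #   threads->create( { 'stack_size' => 4096 }, \&worker );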

To demonstrate the difference this makes, below are the (simplified) VM memory maps of the same perl script that creates 4 threads.

The first uses the default thread stack allocation of 16MB. The memory maps are shown side by side before and after the threads are spawned:

0x00010000 -                            0x00010000 -
0x00110000 threads.dll                  0x00110000 threads.dll
0x00400000 perl.exe                     0x00400000 perl.exe
0x0140b000 thread 0x1674 stack area     0x0140b000 thread 0x1674 stack area
0x015d0000 default process heap         0x015d0000 default process heap
                                        0x04f8e000 thread 0x05d4 stack area
                                        0x0636e000 thread 0x0dc8 stack area
                                        0x0774e000 thread 0x1bcc stack area
                                        0x08b2e000 thread 0x1198 stack area
234MB contiguous free space             116MB contiguous free space
0x10000000 guard32.exe                  0x10000000 guard32.exe
...                                     ...

Notice how the 4x16MB reservations, made to provide the (required) 16kB of stack space, have effectively halved the contiguous free space: although only 64MB in total was reserved, dropping those reservations into the middle of the largest free region splits it up, meaning that the largest array or hash that can be allocated has also been halved.

Now the same program, except it uses a 4kB thread stack allocation:

0x00010000                              0x00010000
0x00120000 threads.dll                  0x00120000 threads.dll
                                        0x0017e000 thread 0x1dc2 stack area
                                        0x0034e000 thread 0x1f70 stack area
                                        0x0039e000 thread 0x0f90 stack area
                                        0x003ee000 thread 0x15a4 stack area
0x00400000 perl.exe                     0x00400000 perl.exe
0x0140b000 thread 0x1a4c stack area     0x0140b000 thread 0x1a4c stack area
0x01530000 default process heap         0x01530000 default process heap
234MB contiguous free space             234MB contiguous free space
0x10000000 guard32.exe                  0x10000000 guard32.exe
...                                     ...

Notice how, because the stack reservations requested are so much smaller, the VM allocator has managed to tuck the 4 stack areas away into otherwise unused areas of low memory, leaving the contiguous free space completely untouched. Meaning that the maximum size of large data structures that the program can deal with, before having to resort to costly disk-based solutions, remains unaffected.
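
If you want to see the same thing on your own build, a map like the ones above can be approximated from inside the process. A rough sketch, again assuming a 32-bit Win32 perl with Win32::API (the 28-byte MEMORY_BASIC_INFORMATION layout and the state constants are the standard 32-bit Win32 values):

    use strict;
    use warnings;
    use Win32::API;

    # Walk the process address space with VirtualQuery, printing one line
    # per region: base address, size, and whether it is free, reserved or
    # committed.
    my $VirtualQuery = Win32::API->new( 'kernel32', 'VirtualQuery', 'NPN', 'N' )
        or die "Cannot import VirtualQuery";

    my %state = ( 0x1000 => 'committed', 0x2000 => 'reserved', 0x10000 => 'free' );

    my $addr = 0;
    my $mbi  = "\0" x 28;    # MEMORY_BASIC_INFORMATION, 32-bit layout
    while ( $VirtualQuery->Call( $addr, $mbi, length $mbi ) ) {
        my ( $base, undef, undef, $size, $st ) = unpack 'L5', $mbi;
        printf "0x%08x %10u KB  %s\n", $base, $size / 1024, $state{$st} || '?';
        last unless $size;    # safety: never loop on a zero-size region
        $addr += $size;
    }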

You see, notional knowledge, read somewhere and regurgitated at random intervals, is no substitute for actually understanding the details of what goes on under the covers. Just as notional wisdom based on aphorisms like 'optimisation is the root of all evil' is no substitute for noticing that the word 'premature' has been omitted, or for understanding that 'premature' does not mean 'any'. That is the wrong kind of laziness.

There's an old saying applicable here as in many fields: "take care of the pennies, and the pounds will take care of themselves". Throwing big numbers at memory allocations "because it's only virtual memory" and/or "because memory is cheap" simply doesn't cut it while the cost of the transition from memory-based storage to disk-based storage remains so high. Even in these days of relatively cheap SSDs, the multipliers involved have only dropped from 3 orders of magnitude to 2. And there is no sign that is going to improve any time soon.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.