Re^2: Looking for advice on how to tune stack size for threads
by BrowserUk (Pope) on Dec 04, 2010 at 03:37 UTC
Sorry, but this is another case of a little knowledge being a dangerous thing.
The main problem with starting threads with a default stack reserve of 16MB is not (just) the wastage of 15.996MB of virtual address space (per thread) that will never be used for stack and cannot then be used for anything else, as annoying and completely unnecessary as that is.
Far more insidious and consequential is the effect it has on the fragmentation of the virtual address space left available to the rest of the program as heap space.
You see, when a chunk of virtual address space is reserved, whilst it doesn't actually get allocated to the process or within the backing store (swap space), it is removed from the pool of virtual address space that can subsequently be allocated to anything else, e.g. heap.
Now, (say) 4 threads each reserving their 16MB of VM doesn't sound like much of a problem, being only 3% of a typical 32-bit process's VM space. The problem arises when you look at where, within that 2GB VM address map, those 4x16MB chunks get allocated.
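For anyone who wants to check the figures, the arithmetic so far can be sketched in a few lines of Perl. The 4KB of stack actually touched per thread is an assumption, inferred from the 15.996MB wastage figure; the other numbers are the ones quoted in the text.

```perl
use strict;
use warnings;

my $MB = 1024 * 1024;

# Figures from the text: 4 threads, 16MB reserved each, 2GB address map.
my $threads       = 4;
my $reserve_bytes = 16 * $MB;
my $address_space = 2048 * $MB;

# Assumption: each thread really touches only one 4KB page of stack.
my $used_bytes = 4096;

my $wasted_mb = ( $reserve_bytes - $used_bytes ) / $MB;
my $reserved  = $threads * $reserve_bytes;
my $percent   = 100 * $reserved / $address_space;

# Prints: wasted per thread: 15.996MB
printf "wasted per thread: %.3fMB\n", $wasted_mb;

# Prints: reserved in total: 64MB (3.125% of 2GB)
printf "reserved in total: %dMB (%.3f%% of 2GB)\n", $reserved / $MB, $percent;
```

So the raw quantity really is small. What matters is where those reservations land.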
Because thread stacks get allocated at runtime, after many chunks of heap, (used by the code and data for modules loaded at start-up), have already been allocated, those 16MB chunks invariably have to be allocated somewhere in the middle of the virtual address map. And the effect of that is to fragment the total pool of allocatable virtual address space, in a way that means it can severely restrict the size of any subsequent single allocations.
In English, that means it severely limits the size of the biggest array or hash you can create. Because even though you have plenty of unused VM to accommodate the elements of that array, perl needs to be able to allocate a single, contiguous chunk of memory for the AV component of the structure. And because the 3 or 4 chunks of unused & un-reusable stack space are spread throughout the memory map, there is no single chunk big enough to hold it.
However, if those thread stacks only reserve as much VM as they are actually likely to use, then they will rarely ever get allocated in the middle of the VM address map, because it is far easier to find an unallocated space in low memory to accommodate a 1 or 2 page allocation than it is to accommodate a 4096 page allocation.
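A toy model may make the point concrete. This is only a sketch: the gap layout is hypothetical, first fit is an assumption rather than a description of what any real VM allocator does, and the model does not capture a reservation landing in the middle of a gap and splitting it. It only shows that 1-page requests can be absorbed by small low-memory gaps whilst 4096-page requests have to come out of the one big free region.

```perl
use strict;
use warnings;

# Toy model: the address space is a list of free gaps, sizes in 4KB pages,
# ordered from low to high addresses.
sub largest_free_after {
    my ( $stack_pages, $thread_count, @gaps ) = @_;

    for ( 1 .. $thread_count ) {
        # First fit: take the lowest gap big enough for the reservation.
        for my $gap (@gaps) {
            next if $gap < $stack_pages;
            $gap -= $stack_pages;
            last;
        }
    }

    # Report the biggest contiguous gap that is left.
    my $largest = 0;
    for my $gap (@gaps) {
        $largest = $gap if $gap > $largest;
    }
    return $largest;
}

# Hypothetical layout: a few small gaps low down, then one big free region.
my @gaps = ( 3, 8, 2, 16, 400_000 );

# Prints: 16MB stacks: largest free block = 383616 pages
printf "16MB stacks: largest free block = %d pages\n",
    largest_free_after( 4096, 4, @gaps );

# Prints: 4KB stacks: largest free block = 400000 pages
printf "4KB stacks: largest free block = %d pages\n",
    largest_free_after( 1, 4, @gaps );
```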
To demonstrate the difference this makes, below are the (simplified) VM memory maps of the same perl script that creates 4 threads.
The first uses the default thread stack allocation of 16MB. The memory maps are shown side by side before and after the threads are spawned:
Notice how the allocation of the (required) 16KB of stack space has effectively halved the contiguous free space, meaning that the largest array or hash that can be allocated has also been halved.
Now the same program except it uses a 4k thread stack allocation:
Notice how, because the stack reservations requested are so much smaller, the VM allocator has managed to tuck the 4 stack areas away into otherwise unused areas of low memory, leaving the contiguous free space completely untouched. Meaning that the maximum size of large data structures that the program can deal with, before having to resort to costly disk-based solutions, remains unaffected.
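For reference, a minimal sketch of requesting the smaller reserve through the threads module's documented stack_size option. The worker sub is hypothetical, and the 4096 simply follows the 4k figure used here. Some platforms enforce a larger minimum or round the value up, so the size actually in effect may differ, and code that recurses deeply or uses large C-level locals will need more.

```perl
use strict;
use warnings;
use threads;

# 4096 follows the 4k figure in the text; the platform may round it up
# or substitute a larger minimum.
my $stack_bytes = 4096;

# Hypothetical worker that needs almost no stack.
sub worker {
    my ($id) = @_;
    return $id * 2;
}

# Ask for the small reserve on a per-thread basis.
my @workers = map {
    threads->create( { 'stack_size' => $stack_bytes }, \&worker, $_ )
} 1 .. 4;

# Report what the first thread was actually given.
printf "stack size in effect: %d bytes\n", $workers[0]->get_stack_size();

my @results = map { $_->join() } @workers;
print "results: @results\n";
```

The same request can be made once for the whole program with use threads ('stack_size' => 4096); at the top of the script.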
You see, notional knowledge, read somewhere and regurgitated at random intervals, is no substitute for actually understanding the details of what goes on under the covers. Just as notional wisdom based on aphorisms like 'optimisation is the root of all evil' is no substitute for understanding that omitting the word 'premature', or the equally common failure to grasp that 'premature' does not mean 'any', is the wrong kind of laziness.
There's an old saying applicable here as in many fields: "take care of the pennies, and the pounds take care of themselves". Throwing big numbers at memory allocations "because it's only virtual memory", and/or "because memory is cheap", simply doesn't cut it when the costs of the transition from memory-based storage to disk-based storage continue to be so high. Even in these days of relatively cheap SSDs, the multipliers involved have only dropped from 3 to 2 orders of magnitude. And there is no sign that this is going to improve any time soon.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.