Your "use threads" should do the trick, or any of the alternatives described in the "THREAD STACK SIZE" section of the threads POD; just do it as early as possible. I'd suggest that 64 * 4096 is probably a lot more than you need, as Perl has its own notion of a runtime stack and in reality doesn't use that much process/thread stack (unless you're using very large/complex regexes, and even that issue goes away in 5.10). Note that once the stack size is set, it's set for all the threads/modules, so you don't need to do anything in Thread::Pool.
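For reference, a minimal sketch of setting the size at load time and checking what was actually applied. The 64 * 4096 value is just the figure discussed above, not a recommendation, and the trivial worker is only there to show that later threads pick the setting up.

```perl
use strict;
use warnings;

# Request the per-thread stack size when the module is loaded,
# i.e. before any thread (yours or a module's) has been created.
use threads ('stack_size' => 64 * 4096);

# The platform may adjust the request (e.g. round it up to a minimum),
# so ask what the current default actually is.
my $applied = threads->get_stack_size();
print "per-thread stack size: $applied bytes\n";

# Threads created from here on use that default.
my $thr = threads->create(sub { return threads->tid() });
my $tid = $thr->join();
print "worker $tid finished\n";
```

The same default can also be changed later with threads->set_stack_size(), but that only affects threads created after the call, which is why doing it early matters.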
Also keep in mind that each thread has its own Perl interpreter, so memory size may be an unavoidable issue, especially if the threads are spawned from another thread that already has a lot of context (due to the interpreter cloning).
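One way to limit the cloning cost is to start the workers while the parent is still small and feed them work afterwards. This is only a sketch under assumptions: the two workers, the @big array and the undef "no more work" marker are all made up for illustration, and it says nothing about how Thread::Pool does things internally.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new();

# Start the workers first, while the parent interpreter holds little data;
# each new thread gets a clone of whatever the parent has at this point.
my @workers;
for (1 .. 2) {
    push @workers, threads->create(sub {
        my $count = 0;
        # An undef item is used here as a "no more work" marker.
        while (defined(my $item = $queue->dequeue())) {
            $count++;
        }
        return $count;
    });
}

# Only now build the large structure in the parent (hypothetical data);
# the already-running workers do not receive a copy of it.
my @big = (1 .. 100_000);

# Hand a few items to the workers, then one marker per worker.
$queue->enqueue($_) for @big[0 .. 9];
$queue->enqueue(undef) for @workers;

my $handled = 0;
$handled += $_->join() for @workers;
print "items handled: $handled\n";
```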
I just realized your issue has two parts: memory and CPU. Note that my comments thus far have been entirely about the memory issue. However, given the size of the Perl process, might the CPU issue be related to paging/swapping?
Perl Contrarian & SQL fanboy