Your "use threads" should do the trick, or any of the alternatives described in the "THREAD STACK SIZE" section of the threads POD; just do it as early as possible. I'd suggest that 64 * 4096 is probably a lot more than you need, as Perl has its own notion of a runtime stack and in reality doesn't use that much process/thread stack (unless you're using very large/complex regexes, and even that issue goes away in 5.10). Note that once the stack size is set, it's set for all the threads/modules, so you don't need to do anything in Thread::Pool.
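For what it's worth, here is a minimal sketch of the early stack-size setting mentioned above, using the import form documented in the threads POD. The 16 * 4096 figure is only an illustrative assumption, not a measured recommendation; the POD notes that some platforms round the value up to their own minimum, so the size reported back may differ from what you asked for.

```perl
use strict;
use warnings;

# Set the default per-thread stack size at load time, before any thread
# (or any module that creates threads) gets a chance to run.
# 16 * 4096 is an assumed value for illustration; tune it for your workload.
use threads ('stack_size' => 16 * 4096);

# The default that every subsequently created thread will inherit.
my $default = threads->get_stack_size();
printf "default stack size: %d bytes\n", $default;

# Confirm from inside a thread that the setting was picked up.
my $thr  = threads->create(sub { return threads->self()->get_stack_size() });
my $size = $thr->join();
printf "stack size seen inside thread: %d bytes\n", $size;
```

If the two numbers match, threads created later by other modules should inherit the same default, which is why nothing extra should be needed in Thread::Pool.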
Also keep in mind that each thread has its own Perl interpreter, so memory size may be an unavoidable issue, especially if the threads are spawned from another thread that already has a lot of context (due to the interpreter cloning).
I just realized your issue has two parts: memory and CPU. Note that my comments thus far have been entirely about the memory issue. However, given the size of the Perl process, might the CPU issue be related to paging/swapping?
Re your update:
That's interesting, you might be right. This box has 2 GB of RAM, so I didn't think I was in danger of running out... unless WinServer2k3 decided to page/swap even when there is memory available. How would I check this?
The reason I think you're right is that my kernel times tend to be rather high when running this script.
Here's an example.