When using asynchronous IO, the user code has no control over which OS thread will be running when the OS calls back to complete the IO.
Not with Kernel32 IO or epoll or select. I've seen it in 3rd party closed source libs: a thread pool is a bandaid when the code can't be audited and fixed of blocking syscalls and high CPU code (parsing, searching, etc). The correct fix is for all blocking syscalls to become async IO. All waits on locks become async. High CPU code goes into a separate thread/core/process that reports progress events back to the event loop/dispatcher thread. So when you are lazy, just throw it in threads; if the threads block (and the pool goes empty and the pool event isn't picked up for execution within the timeout period), so what, automatically make more threads. I (a lazy devel) can now call my library "scalable" too on the powerpoint to the suits.
And if the 'wrong' OS thread -- ie. *not* the one that created the IO completion callback -- is running when the IO completion callback is called by the OS, then things go belly up (ie. CRASH!), because none of the Perl environment elements -- stash, coderefs, closures, et al. -- are available to the Perl coderef given as the callback if it is called within the context of the wrong thread.
The *only* way to address that problem is to set the context for the current thread -- at the XS level -- prior to entering the Perl level callback code.
If that description does not seem correct and unassailable to you, it can only mean that you do not understand the problems inherent in responding to asynchronous callbacks at the Perl level. So read more and think twice.
I know very well how Perl threads relate to OS threads since I've already made all the mistakes you describe above in the past. It is not mandatory to move Perl threads between OS threads. The async callbacks from random threads can be serialized (the callback thread blocks until the event is serviced by the 1 Perl thread) into 1 Perl thread, where the Perl thread spends most of its time sleeping on its event queue (Tk, User32, Wx, whatever). The callbacks take a ms or 2 longer than in a pure C program, but so what. The overhead is indistinguishable from that of using an interpreted language in the first place.
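To make the serialization idea concrete, here is a rough sketch in plain C with pthreads, not real XS code. Every name in it (`dispatcher_t`, `post_and_wait`, `owner_loop`, `run_demo`) is made up for illustration, and the `total` counter just stands in for interpreter state that only the 1 owner thread is allowed to touch. The "random" callback threads enqueue an event and block until the owner thread has serviced it; where the comment says so, real code would call into Perl instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One pending callback; lives on the posting thread's stack. */
typedef struct event {
    int payload;            /* data handed over by the "OS" callback */
    int result;             /* filled in by the owner thread */
    int done;               /* set to 1 once serviced */
    struct event *next;
} event_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t posted;    /* owner sleeps here waiting for events */
    pthread_cond_t serviced;  /* posters sleep here until their event is done */
    event_t *head, *tail;
    int shutdown;
    long total;               /* stand-in for interpreter state; owner thread only */
} dispatcher_t;

static void dispatcher_init(dispatcher_t *d) {
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->posted, NULL);
    pthread_cond_init(&d->serviced, NULL);
    d->head = d->tail = NULL;
    d->shutdown = 0;
    d->total = 0;
}

/* Called from any thread: queue the event, then block until it is serviced. */
static int post_and_wait(dispatcher_t *d, int payload) {
    event_t ev = { payload, 0, 0, NULL };
    pthread_mutex_lock(&d->lock);
    if (d->tail) d->tail->next = &ev; else d->head = &ev;
    d->tail = &ev;
    pthread_cond_signal(&d->posted);
    while (!ev.done)
        pthread_cond_wait(&d->serviced, &d->lock);
    pthread_mutex_unlock(&d->lock);
    return ev.result;
}

/* The single owner thread: sleeps on the queue, services one event at a time. */
static void *owner_loop(void *arg) {
    dispatcher_t *d = arg;
    pthread_mutex_lock(&d->lock);
    for (;;) {
        while (!d->head && !d->shutdown)
            pthread_cond_wait(&d->posted, &d->lock);
        if (!d->head) break;              /* shutdown requested, queue drained */
        event_t *ev = d->head;
        d->head = ev->next;
        if (!d->head) d->tail = NULL;
        pthread_mutex_unlock(&d->lock);
        /* Real code would call into Perl here, always on this one thread. */
        d->total += ev->payload;
        ev->result = ev->payload * 2;
        pthread_mutex_lock(&d->lock);
        ev->done = 1;                     /* poster may free ev once it wakes */
        pthread_cond_broadcast(&d->serviced);
    }
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

typedef struct { dispatcher_t *d; int payload; int reply; } io_arg_t;

/* Stand-in for an OS thread that delivers an IO completion. */
static void *io_thread(void *arg) {
    io_arg_t *a = arg;
    a->reply = post_and_wait(a->d, a->payload);
    return NULL;
}

/* Four "callback" threads post payloads 1..4; returns the owner's total. */
long run_demo(void) {
    dispatcher_t d;
    pthread_t owner, io[4];
    io_arg_t args[4];
    dispatcher_init(&d);
    pthread_create(&owner, NULL, owner_loop, &d);
    for (int i = 0; i < 4; i++) {
        args[i] = (io_arg_t){ &d, i + 1, 0 };
        pthread_create(&io[i], NULL, io_thread, &args[i]);
    }
    for (int i = 0; i < 4; i++) {
        pthread_join(io[i], NULL);
        assert(args[i].reply == 2 * (i + 1));
    }
    pthread_mutex_lock(&d.lock);
    d.shutdown = 1;
    pthread_cond_signal(&d.posted);
    pthread_mutex_unlock(&d.lock);
    pthread_join(owner, NULL);
    return d.total;                       /* 1 + 2 + 3 + 4 */
}
```

Calling `run_demo()` from a `main` (build with `-pthread`) should return 10. The extra cost per callback is one lock round trip plus a thread wakeup, which is the "ms or 2" tradeoff: the posting thread stalls, but nothing outside the owner thread ever touches the shared state.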
's problem is he doesn't know which C data structures in Perl can and cannot be accessed from different OS threads, and when, which leads to race condition nightmares.