Multiprocessing on Windows
by JohnRS (Beadle)
on May 24, 2012 at 00:26 UTC
JohnRS has asked for the
wisdom of the Perl Monks concerning the following question:
Hello monks. I seek your wisdom.
I have observed something odd regarding multiprocessing performance on Windows. When I run the test below, there seems to be a huge amount of process switching overhead. When I run the same test on a Linux server it runs as expected (almost no overhead). Here are the results.
Running a single child process establishes a baseline, 1.0x speed at 0% overhead. With Linux, running 5 processes, I see a 4.9x speed improvement with less than 1% overhead. Very good. But with Windows, running 5 processes, I see only a 2.5x speed improvement with about 91% overhead! In other words, the speed improvement was only about half of what it should have been and the CPU time almost doubled. What was the CPU doing this extra 91% of the time?
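To make the two figures concrete, here is a minimal sketch of how they could be computed. It assumes every child does the same fixed amount of work as the single-child baseline, that speed is work completed per wall-clock second, and that overhead is the CPU time consumed beyond N times the baseline CPU time. The function names and the input numbers are hypothetical; the inputs are not measurements, just values chosen to reproduce the Windows figures quoted above.

```perl
use strict;
use warnings;

# Speedup: work completed per wall-clock second, relative to one child.
# $wall_1 = wall time of the single-child run, $wall_n = wall time with $n children.
sub speedup {
    my ($n, $wall_1, $wall_n) = @_;
    return $n * $wall_1 / $wall_n;
}

# Overhead: CPU time used beyond $n times the single-child CPU time, in percent.
sub overhead_pct {
    my ($n, $cpu_1, $cpu_n) = @_;
    return 100 * ($cpu_n / ($n * $cpu_1) - 1);
}

# Illustrative inputs only (not measured): 10 s baseline, 5 children
# finishing in 20 s of wall time while consuming 95.5 s of CPU time.
printf "speedup %.1fx, overhead %.0f%%\n",
    speedup(5, 10, 20), overhead_pct(5, 10, 95.5);
```

With these inputs the script prints a 2.5x speedup and 91% overhead, against an ideal of 5.0x and 0%.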
I realize that the test results aren't very accurate (only to within about 10%). I ran them on live, but mostly idle, machines. The deviations in the Windows results are much larger than 10%, however, so I think that they are relevant. Here is the test code.
The processes are compute-bound and keep all 8 CPUs (when using 8 child processes) at 100% simultaneously, both on Windows and Linux. There is no I/O (except one print at the end), no blocking, no locking, and no shared memory. The processes last long enough that the setup time shouldn't be very important. Thus I'm left thinking that the overhead is due to process switching by the operating system.
This test uses ithreads. I also ran a similar test using forks, and the results in both cases, Linux and Windows, were almost identical to the ithreads results.
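For anyone who wants to try something similar, a test of the general shape described here (compute-bound ithreads workers, no I/O, no locking, no shared data) might look roughly like the sketch below. This is not the code that produced my numbers; `burn`, `run_workers`, the worker count, and the iteration count are all hypothetical, and it assumes a perl built with thread support.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);

# Hypothetical compute-bound worker: pure arithmetic, no I/O, no shared data.
sub burn {
    my ($iterations) = @_;
    my $sum = 0;
    $sum += $_ * $_ for 1 .. $iterations;
    return $sum;
}

# Start $n workers at once, wait for all of them, and return the
# wall-clock seconds elapsed and the process CPU seconds (user + system) used.
sub run_workers {
    my ($n, $iterations) = @_;
    my @cpu_before = times;
    my $wall_start = time;
    my @workers = map { threads->create(\&burn, $iterations) } 1 .. $n;
    $_->join for @workers;
    my $wall = time - $wall_start;
    my @cpu_after = times;
    my $cpu = ($cpu_after[0] + $cpu_after[1]) - ($cpu_before[0] + $cpu_before[1]);
    return ($wall, $cpu);
}

# Small illustrative run; a real test would use far more iterations so that
# thread setup time is negligible.
my ($wall, $cpu) = run_workers(4, 2_000_000);
printf "4 workers: %.2fs wall, %.2fs CPU\n", $wall, $cpu;
```

Running it once with a single worker and once with N workers gives the wall and CPU times needed to compare speed and overhead between platforms.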
I realize that if the processes were normally blocked this wouldn't be as big an issue. But my job is compute-bound, so event loops (POE, Coro, etc.) wouldn't help, not even POE's "Wheel", which uses fork, from what I have read.
In summary, my questions are: 1) Is my test valid? 2) Is my conclusion valid? 3) Is there a way to get better multiprocessing performance on Windows?