Thanks, but I think you and I are doing completely different comparisons (correct me if I'm wrong).
My intention here is to compare a very simple CGI script with precisely the same thing running under PSGI as a CGI, i.e. not in any sort of persistent backend. My (admittedly limited) understanding is that when you run plackup -l 127.6.6.6:80 dumpenv.psgi 2>2 1>1 you are starting a persistent Plack server that services all of the requests. What I am hoping to measure instead is a single on-demand run of the script, with the full set-up and tear-down every time.
Your post talks about ThreadsPerChild, which is only a concern if there are concurrent requests. Again, I'm not bothered about that for the purposes of this test; I want to know about a single script run in isolation.
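For clarity, this is roughly the kind of measurement I mean, as a minimal sketch only. time_cgi is a hypothetical bash helper of my own naming, not part of Plack or any other tool. It launches the script as a fresh process for every request, with a placeholder CGI environment, so nothing survives between runs and each run pays the full start-up and tear-down cost.

```shell
# Hypothetical helper: run a CGI-style script RUNS times, each as a brand-new
# process, so every run pays the full start-up and tear-down cost.
# Usage: time_cgi SCRIPT [RUNS]   (RUNS defaults to 100)
time_cgi() {
    script=$1
    runs=${2:-100}
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Minimal CGI/1.1 request environment; the values are placeholders.
        GATEWAY_INTERFACE=CGI/1.1 REQUEST_METHOD=GET QUERY_STRING= \
        SCRIPT_NAME=/dumpenv SERVER_NAME=localhost SERVER_PORT=80 \
        SERVER_PROTOCOL=HTTP/1.1 \
            "$script" > /dev/null || return 1
        i=$((i + 1))
    done
}
```

Run it under bash's time keyword for each variant, e.g. time time_cgi ./dumpenv.cgi 100, and then the same against a CGI wrapper around dumpenv.psgi (both file names here are hypothetical). Because no process persists between runs, the difference between the two timings is mostly the cost of loading and running the PSGI layer on every request.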
If a persistent backend is used (whichever one is chosen), it ought to be much faster after instantiation than any standalone CGI; that's not in doubt. However, the big selling point of the Plack middleware is, AIUI, that the developer can switch seamlessly between various backends, including none at all, without having to worry about the protocol intricacies. I'm challenging this selling point when it comes to the none-at-all scenario, where we want to run the application as non-persistent CGIs. In that situation my findings are that using the middleware imposes what can be a significant performance penalty.