Aye, I had suspected this was so. However, the presence of that SLEEP renders this implementation incomparable to the production case, since it introduces a very serious problem of its own. As to why the deadlocks are happening: reading just the material that you have posted here, I truly do not know. Is this a CGI process that is inserting data into a table that is also being used for some very heavy-duty, long-running transactions by another back-end (non-CGI) process, say? An attempt to insert data into a table should not “fail,” and it most certainly should not take any appreciable time. If the table is contentious, then a CGI process probably should not be touching it, and the presence of many CGI processes doing so will make the contention considerably worse. (Perhaps this is what your competitor is doing wrong right now?)
What if, for instance, you designated another daemon process to which the CGI processes could hand off requests, say using one of the existing protocols (SOAP, XML-RPC, FastCGI, pick one ...)? Instead of attempting to do the work themselves, the CGI processes would send the request to this server process (or pool) and await a reply. This server would execute requests very quickly, and because the work now funnels through one place, each request would no longer contend with the others. (Production-ready servers for all of these are available on CPAN right now, so there is no burden of implementing the plumbing.)
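To make the hand-off idea concrete, here is a minimal sketch, in Python purely for illustration; the same shape would apply to a Perl daemon built on one of the CPAN servers. Everything in it is an assumption of mine, not your code: the `hits` table, the names `handoff`, `writer_daemon` and `cgi_request`, an in-memory SQLite database standing in for your real one, and threads with an in-process queue standing in for separate CGI processes talking to the daemon over a real protocol. The only point it shows is that a single writer owns the table, so the inserts are serialized and never contend with one another.

```python
import queue
import sqlite3
import threading

# Hypothetical hand-off channel; in production this would be a socket
# speaking SOAP, XML-RPC, FastCGI or similar, not an in-process queue.
handoff = queue.Queue()


def writer_daemon():
    """The only code that touches the table, so inserts never overlap."""
    db = sqlite3.connect(":memory:")  # stand-in for the real database
    db.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, payload TEXT)")
    while True:
        item = handoff.get()
        if item is None:  # shutdown sentinel
            break
        payload, reply = item
        with db:  # one short transaction per request, committed on exit
            cur = db.execute("INSERT INTO hits (payload) VALUES (?)", (payload,))
        reply.put(cur.lastrowid)  # answer the waiting client
    db.close()


def cgi_request(payload):
    """What each CGI process would do: send the request, await the reply."""
    reply = queue.Queue(maxsize=1)
    handoff.put((payload, reply))
    return reply.get(timeout=5)


def run_demo(n_clients=20):
    """Simulate many simultaneous CGI hits against the single writer."""
    daemon = threading.Thread(target=writer_daemon)
    daemon.start()
    results = []
    results_lock = threading.Lock()

    def client(i):
        row_id = cgi_request(f"request-{i}")
        with results_lock:
            results.append(row_id)

    clients = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    handoff.put(None)  # tell the daemon to stop
    daemon.join()
    return sorted(results)
```

Calling `run_demo(20)` fires twenty simulated clients at once, and each one gets back the row id of its own insert. The clients never open the database themselves; they only wait on a reply, which is where the contention goes away.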
I believe, or at least by now I suspect, that this contention, whatever its true source may be, is what is causing the deadlocks and is the root cause of this matter, and that some design change may be required to deal with it permanently and effectively.