We have the slow query log enabled and Icinga monitoring the server.
The precipitating issue is that we have a table with too much data for InnoDB, so it is MyISAM, which is very slow in a multi-user environment due to table locking.
Peak server usage is fairly consistent from day to day, but the problem comes and goes without explanation. For now I moved some records to a history table, so we have about 29k records instead of 38k in the MyISAM table.
We have a couple of hosts making calls, but the majority of the hanging queries come from one application that makes queries to a big table.
This does not explain why we have queries against this one big MyISAM table that are apparently HUNG and run for hours. That makes no sense when every query finishes and the code always disconnects. I think that when Perl signals the disconnect, MySQL/MyISAM does not always get the signal to close the connection, so the connection sits there doing nothing while tying up resources. There was a PHP bug similar to this, so I am wondering if Perl has the same bug?
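One way to check whether leaked connections are what is tying things up is to look for long-lived sleeping threads in the process list. A sketch (the one-hour threshold is an arbitrary example, and the `KILL` ID is a placeholder):

```sql
-- List connections that have been idle for more than an hour.
-- COMMAND = 'Sleep' means the client is connected but not running a query.
SELECT ID, USER, HOST, DB, TIME
FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Sleep' AND TIME > 3600;

-- If one of these turns out to be a leaked connection, it can be
-- closed by hand (replace 12345 with the actual ID from above):
-- KILL 12345;
```

If the hung queries show up here as `Sleep` threads with huge `TIME` values, that would support the theory that the disconnect is never reaching the server.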
We are playing with wait_timeout in development. So far a value of 600 works without killing cron jobs that take a while to run. Right now this looks like our only fallback fix.
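If the timeout holds up in development, it can be made permanent in the server config. A sketch, assuming 600 seconds is the value that survives testing:

```ini
# my.cnf -- close client connections that sit idle longer than 10 minutes
[mysqld]
wait_timeout        = 600
# interactive_timeout is the equivalent setting for interactive clients
# (e.g. the mysql command-line tool); keeping them in sync avoids surprises
interactive_timeout = 600
```

Note that `wait_timeout` can also be set per session, so the long-running crons could raise it for themselves if 600 ever turns out to be too tight for them.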
Now that InnoDB is launched and working, we are hoping that the MySQL team will bump the InnoDB max data size up to 384k (or the same as MyISAM's, whatever that is). InnoDB should work for any existing MyISAM table. Unless there is some reason above my pay grade, it does not make sense that row locking would require a smaller data size. Table locking is just one method (in my book), and not a particularly wise one for the majority of systems, which are multi-user.