Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling

How do I handle a DB error when a SQL statement must not fail?

by ted.byers (Monk)
on Mar 28, 2014 at 22:25 UTC ( #1080164=perlquestion: print w/replies, xml ) Need Help??
ted.byers has asked for the wisdom of the Perl Monks concerning the following question:

Here is some sample code, showing how I normally handle database errors.

use strict; use DBI; use Exception::Class::TryCatch; my $db = 'test'; my $hostname = 'localhost'; my $user = 'myuser'; my $dbpwd = 'xxxxxxxxxxxxx'; my $dbh = DBI->connect_cached("DBI:mysql:database=$db;host=$hostname", +$user,$dbpwd,{RaiseError => 1}) or die "Failed to connect to the DB.\ +n"; ##$dbh->{AutoCommit} = 0; # enable transactions, if possible $dbh->{RaiseError} = 1; try eval { $dbh->begin_work or die $dbh->errstr; my $sql = "BEGIN;"; # $dbh->do($sql); $sql = "UPDATE foo SET y='aa' WHERE x=1;"; $dbh->do($sql); $sql = "SELECT SLEEP(10);"; $dbh->do($sql); $sql = "UPDATE foo SET y='bb' WHERE x=2;"; $dbh->do($sql); $dbh->commit; }; if ( catch my $err) { print $err->error,"\n"; eval { $dbh->rollback }; print "Exiting from catch block\n"; exit(0); } $dbh->disconnect; print "DONE!\n";

Obviously this is a contrived example, showing showing that Exception::Class::TryCatch catches deadlock errors. NB: I found that I can't use begin_work if autocommit is already set to 0, in which case I send 'BEGIN' to the database, but I don't need to use the latter, if I have autocommit = 1 and use begin_work: both variants produce the same output. There is a second script that is very similar, and when the two are executed at the same time, at least within a second or two, the one executed second dies with a deadlock error.

My production case is a little different, in that autocommit is on (which as you know turned each statement into its own transaction). I still need to figure out a way to verify that tests the ability of this code to detect any deadlock that may arise from my production code. The production code involved two individual INSERT statements, each executed using $dvh->do($sql). These INSERT statements must not fail, so I need help to figure out how to retry them (and keep retrying them) until they succeed (or log them to a file, after, say, 10 retries, if they continue to fail). Hence, one question is, how do I modify the above code so that it better tests the possibility of deadlock errors involving a signle INSERT SQL statement with autocommit on, and then, how to modify it so that it retries the inserts until they either succeed or we have to give up because they failed too many times (along with the error that explains why they failed)?

The context is that these inserts are invoked from a CGI script AFTER the data has been validated. I know the SQL I am using is generally valid, as 99.99% of the instances in which they're executed, the data is stored properly. Alas, the 0.01% or less of the transactions that fail leave no record of their existence in the MySQL logs, the webserver logs, or logs I create using perl. This is a problem as none of the data involved can be lost,

At this point, I have had no luck figuring out why there are random intervals of 5 to 10 minutes during which data is lost, with an average of 1 such period per week (most weeks have one such period, some have 2 and some have 0). It was suggested to me a few days ago that maybe I am encountering a deadlock with two instances of my cgi script trying to insert data at the same time. Alas, I don't yet know how to test whether or not this is true, and if so, how to improve my error handling so that the insert that fails is retried.. Nor do I know what else may be awry that can hit so very rarely.

I would welcome any aide you can provide.



Replies are listed 'Best First'.
Re: How do I handle a DB error when a SQL statement must not fail?
by Your Mother (Chancellor) on Mar 29, 2014 at 01:56 UTC

    Just curious because of the language you used. Is the data validated in the client side and reported as success there? Validation has to happen on the server. Client side stuff serves other purposes: saving pointless server hits with pre-validation, helpful UX alerting the user early in the case of invalid data, etc. JS validation is trivial to circumvent for a hacker.

    Alas, the 0.01% or less of the transactions that fail leave no record of their existence in the MySQL logs, the webserver logs, or logs I create using perl.

    Either you are doing all of the so called transaction in these fail cases in JS or your logging is incomplete/broken/off somewhere in the chain. By default (in most webservers) there is a record of all requests/responses whether errors or not. No record means no request was ever made/received and the failure point is pre-CGI.

      The answer, on the question of data validation on client or server is both. I do use client side validation, using JavaScript, mostly to notify the user of a data entry error before he or she proceeds to submit it. But, more importantly, I check all the data server-side. I regard failing to validate server side as plain stupid and, especially, insecure.

      All transactions are received by my cgi script, written in perl, and submitted by it to the database (but this is after 1: data validation server side, and 2: submission of some of it to one of the web services we use, and the some of the data stored is the response from the web service used, again validated on my server before attempting to store it). By implication, then, from what I am doing and what you say, the logging is either broken or incomplete. So, that, I guess, is the next thing on my list of things to examine, and that is, how to make the web server logging more complete, and the database logging also. I know the transaction came through my server, and thus my validation code, because I learned of the missing data by looking at the data stored by the web services we're using. The failure point is either the web service not sending us the transaction results, or between that event and the attempt to store the data. I think I can see how I can check that.



Re: How do I handle a DB error when a SQL statement must not fail?
by Anonymous Monk on Mar 29, 2014 at 00:21 UTC

    It was suggested to me a few days ago that maybe I am encountering a deadlock with two instances of my cgi script trying to insert data at the same time.

    No, this likely is not the problem (impossible), you can have 100 of them simultaneous, as long as you can make 100 connections to msyql (mysql configuration stuff), and have enough ram, each and every CGI process should update stuff in the database

    Alas, the 0.01% or less of the transactions that fail leave no record of their existence in the MySQL logs, the webserver logs, or logs I create using perl. This is a problem as none of the data involved can be lost,

    This suggests network slowdown, where the CGI isn't being reached, very very slowly

    Or it suggests a program on the webserver machine (or proxy infront of webserver) is running wild and out of control, eating all the CPU , going to swap , tying up the harddisks and ram and cpu all at once, so nothing else gets a chance to run

    Or both things at the same time

    The solution to let the web-user know it failed unless you're 100% sure it didn't

    on my laptop this happens with windowsupdate sometimes, it always likes to eat cpu (fan kicks in) and read lots of harddisk , and sometimes it goes into infinite loop, then starts swapping memory to disk ... cannot do anyting on laptop, can't even kill windowsupdate, windowsupdate eats all hardware, and I have to reboot

    so check your ulimits/proc limits... whatever you're using

    if it happens around same time, check your cron jobs and the like (windows update, scheduled tasks ... whatever your os)

    so if there is a time-frame/time-window when this happens, pick the hour, then flip on maximum process monitoring/logging and give that procwatcher the maximum priority you can ... and watch to see what happens :)

    The way to handle this on the browser side is post-redirect-get ... if the post times out, the user submitting the data will get a timeout

    The other way to handle it is post-redirect-get through AJAX ... make sure to notify the user when post-redirect-get fails

    It should fail everytime the data isn't inserted in the database

    This way there is no chance data goes missing

      Thanks for this.

      It is reassuring that it is more likely an issue of the configuration of host we're using (as we can ask for more resources of any kind), rather than the code (except my code doesn't handle running out of resources well). And of course, we'll try to use resource monitoring, to see if we can identify what process is going crazy and consuming all available resources of whatever kind.

      We know that it can not be a scheduled task that is causing the problem because the periods during which data is presently being lost are random: no particular time of day or day of the week.

      The process is a user submits data to my cgi script, which validates it all, and if the validation check passes, some of it is sent to one or more of the web services we use, and the results we receive back from these is validated and combined with the request data we originally received to be stored. Since I can check the data on the services we're using, and it is checking that data that tells me about the missing data, and when it happens, that made me aware of the problem. We therefore know that I am receiving the data, successfully validating it, and sending it to the services we're using (which means the web service in question properly processed it). Thus, the failure is either that the web service fails to get it's response back to me, or at the point at which we store the data.

      Clearly I need to beef up the web logs, so that the web server actually stores the complete request that we receive, and in the case of the request we submit to the web services we're using, the complete request we submit and the complete response that we get, if there is one. What I normally see in these logs is the first few dozen bytes of data sent, or received, and a message that so many hundred additional bytes were sent/received. I do not yet know how to tell Apache to do that, so that is more documentation I need to study (unless you know and can tell me - I have not yet even figured out how to tell Apache to start new logs every day at midnight, keeping the logs with the current date embedded in the logs' file names - but I will figure that out sooner or later, unless someone just tells me how to do it).

      The critical data determining success or failure from the perspective of the user is whether or not the web services we're using successfully processed the data we sent to them. We can not tell the user that the transaction failed if the web service in question succeeded in processing the transaction, regardless of whether or not there is a subsequent failure, either in them communicating the result back to us or in our attempt to store the data. This failure must be handled entirely on my server, and in a way that preserves all data we received from the user and any data received from the web service(s).



Re: How do I handle a DB error when a SQL statement must not fail?
by sundialsvc4 (Abbot) on Mar 29, 2014 at 17:48 UTC

    I suspect that the root cause of the problem is right here:

    $sql = "SELECT SLEEP(10);";

    This is asking the SQL service engine to sleep.   This is probably stoppering things up.   Furthermore, it’s holding a transaction open for ten seconds, and this I wager is the root cause of your deadlocks.

    If you need to take a snooze, your application program should do this on its own.   It should not keep a transaction open during the duration.


      Actually, the code I showed was a test script designed to test the ability of Exception::Class::TryCatch to catch deadlocks (I have a whole suite of similar tests that verify it's ability to catch more mundane SQL errors, focussed on the construction of the SQL statement itself). It works with a nearly identical script that accesses the same table, but does different updates on the table. The only purpose of the SLEEP(10) is to give me a chance of executing the second test script before the first completes. I can guarantee that there is no SLEEP(10) statement anywhere in my production code. The applicability of the example script I showed relates too the fact I use a try/catch block for the transactions in question. The only difference in the try block is that I have two SQL statements, and autocommit is on (so no need for begin/commit), The second aspect of its applicability is how I ought to restructure that try/catch logic so that, if the first attempt to insert fails, it can be retried either a limited number of times before logging the fact and relevant details, or it succeeds. Remember, this is part of a CGI program, and one of the constraints I impose on my code is that it must respond within 5 to 10 seconds, from the moment the client machine submits the request, to the time it receives my response (my competitor typically take 20 seconds to respond).

      Recall, I had asked how to refactor from a simple try/catch logic to a try/catch and retry logic, where retries may have a limit of 5 or 10 before logging the failure). I also don't know, at this stage, whether or not DBI issues some other error if mysql just takes to long to respond, but without mysql generating an exception, or if DBI can produce other exceptions unrelated to the correctness of the SQL submitted. How do I identify the error conditions that DBI may produce that are unrelated to the correctness of the SQL that has been submitted, and what is the best way to respond (server side only) in such a way as to ensure data is never lost?

      Thanks again


        Aye, I had suspected this was so, however the presence of that SLEEP does render this implementation incomparable to the production case since this case has (this) very-serious problem on its own.   And as to why the deadlocks are happening, reading just the material that you have posted here, I truly do not know why they would be.   Is this a CGI process that is inserting data into a table that is also being used for some very heavy-duty long-running transactions by another back-end (non-CGI) process, say?   An attempt to insert data into a table should not “fail.”   Most certainly, it should not take any time at all to do either.   If the table is contentious, then a CGI process probably should not be touching it, and the presence of many CGI processes doing so will make the contention considerably worse.   (Perhaps this is what your competitor is right-now doing wrong?)

        What if, for instance, you designated another daemon-process to which the CGI processes could hand-off requests, say using the existing (SOAP, XML-RPC, FastCGI pick one ...) protocols?   Instead of attempting to do the work themselves, the CGI processes would would send the request to this server process (or pool), and await a reply.   This server would execute requests very quickly but now each request would not contend with the others.   (Production-ready servers for all of these are right now available on CPAN, so there is no burden of implementation of the plumbing.)

        I believe, or at least by now I suspect, that it is this contention, whatever its true source may be, which is causing the deadlocks, is the root cause of this matter and that some design change may be required to permanently and effectively deal with it.

Re: How do I handle a DB error when a SQL statement must not fail?
by sundialsvc4 (Abbot) on Mar 31, 2014 at 13:46 UTC

    I’m sure that there will very-shortly be some recommends from other people about what sort of service you might use ... but ... given what you have just said about the activity for this table in your most-recent post, great big bells are going off in my head right now to the effect that I can see no plausible reason why you should be encountering deadlocks in this situation, other than that there is an as-yet undiscovered bug in your program.   I would take a very close look at your SQL server's logs and/or administrative console.   There is something wrong with this picture.

    If you are doing simple inserts of reasonable volumes of data, each one enclosed in a transaction with a reasonable isolation-level, and there is no other competing activity except at 1 AM, then I simply cannot fathom why you would ever be encountering deadlocks.   I was skeptical about this earlier, thinking it could be something else in a very-unusual situation, but now I’m sayin’ that line from Men In Black:   “Zed, we got a bug!”   Therefore, my advice to you now shifts back to considering what the bug might be – and I hope that other Monks will now chime-in on that, as I expect they will.

    And, okay, you’ve really piqued my curiosity now.   What is it that your competitor takes twenty(!!) seconds (seconds, not microseconds) to do with a database?

      That, too, was my gut reaction, but the MySQL gurus I talked with suggested it may be a deadlock, so I started to test for that, as, not being a true expert with MySQL, I had no concrete evidence that it wasn't a deadlock.

      But, neither can I fathom what kind of bug can be in my code that would allow it to work 99.99% of the time, and give apparently random failures.

      I wonder if it is a question of resource exhaustion, as suggested here by Anonymous Monk on Mar 29, 2014 at 00:21 UTC. That is something I am presently investigating. I finally found a resource monitor that will log resource use to a file for CPU cycles, memory, disk IO and network traffic. And, I have beefed up the logging done by MySQL. Hopefully, between those logs, and whatever you and other monks suggest is worth examination regarding whatever kind of bug I may have, I will get to the bottom of this RSN.

      I am still interested in learning about these servers you mentioned (particularly FastCGI and XML::RPC, and how they can serialize insertions into a DB with reqeuests coming from multiple CGI processes, along with how much traffic will make such infrastructure mandatory

      As for what my competition is doing, I have no idea. I watch industry reports, particularly those that report average response times for services like mine, response time being measured as the time the client software makes a request to the time it gets a response time back. Being competitors, I doubt they'll let me look at their code or the architecture they use. ;-) Back when I developed desktop applications in Java and C++ (I preferred C++ because it was, and still is, blindingly fast relative to Java, or any other language I can use), I found in some cases, it was just a case of using a compiled language rather than an interpreted language, and in a great many more cases, it was a question of using the wrong algorithm to complete the task (e.g. using a linear search through an enormous dataset, rather than the more sensible binary search). Now, if I can learn how to use C++ to handle requests using FastCGI, and could have a good library to add FastCGI support to my C++ projects, then I may well resort to that, at least if and when the traffic grows to a point where Perl could have trouble handling it (or use profiling information to find out where the bottlenecks are, and see if a package written in C++ can improve things). In what ways my competitors make their systems so slow, I can make educated guesses, but can not really know because I can't see their code.

      BTW, perhaps you might had paraphrased Men in Black, rather than quoting it, to say, "Ted, We have a bug." ;-)



Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1080164]
Approved by Old_Gray_Bear
Front-paged by Old_Gray_Bear
[karlgoethebier]: Ouch! How can i link to a section in my profile?

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (6)
As of 2017-06-24 13:19 GMT
Find Nodes?
    Voting Booth?
    How many monitors do you use while coding?

    Results (557 votes). Check out past polls.