Re^6: Using TheSchwartz - 2 threads pick the same jobs. Any help ? (locks--)

Yes, that is mostly correct.

The UPDATE won't "fail", however; it will "succeed" in updating 0 rows. That means the DBI execute() will return a "true zero" (DBI uses the string "0E0"), not "0".
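To make the "true zero" point concrete, here is a minimal sketch in plain Perl (no database needed). The claim_status() helper is a made-up name for illustration; the only DBI fact it relies on is that execute() and do() return "0E0" for zero affected rows and undef on error.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret the return value of DBI's execute()/do().
sub claim_status {
    my ($rows) = @_;
    return 'error'     if !defined $rows;   # the statement itself failed
    return 'lost race' if $rows == 0;       # "0E0": ran fine, matched no rows
    return 'claimed';                       # one or more rows were updated
}

# "0E0" is true in boolean context but compares equal to 0 numerically.
print "0E0 is true\n" if "0E0";
print claim_status("0E0"), "\n";    # lost race
print claim_status(1), "\n";        # claimed
```

So a bare `if ($db->do(...))` only tells you the statement ran; you have to compare the result numerically to find out whether any row was actually updated.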

Note that you don't need to do this if you are using transactions properly, but it can still be a good idea because it can also catch some common mistakes when using transactions.

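    # $db is assumed to be a plain DBI handle with RaiseError enabled.
    $db->begin_work();
    # Lock the lowest unclaimed row. ORDER BY ... LIMIT 1 is used because
    # PostgreSQL rejects FOR UPDATE combined with an aggregate like MIN().
    my( $job )= $db->selectrow_array(
        'SELECT jobid FROM joblist WHERE pid IS NULL'
      . ' ORDER BY jobid LIMIT 1 FOR UPDATE',
    );
    $db->do(
        'UPDATE joblist SET pid = ? WHERE jobid = ?',
        {}, $$, $job,
    );
    $db->commit();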

The above is also an example of an atomic update. Because each process starts a transaction, if two processes try to update the same record, then one of them will become blocked (at the point where it expresses a desire to update) until the other finishes its transaction (or one of the transactions will fail).

Note that I specified "FOR UPDATE" on the SELECT statement. A decent database doesn't let readers block writers and doesn't let writers block readers. The original proposal of a "table lock" is, IME, more error prone and also causes readers to block writers and writers to block readers, which can introduce significant performance limitations, especially since it locks the whole table not just the rows being updated. (Locking the whole table can be required when INSERTs matter but I design database schemas where such INSERTs are gated by one of many controlling records in another table so INSERTs can still be done in parallel if they involve different accounts, for example.) (Updated)

By saying "FOR UPDATE", I express that the data I am getting back from this SELECT is going to influence an update that I am about to make. This causes the record(s) involved in the SELECT to be marked as being part of a pending transaction involving updates. So this causes (at least on some DB implementations) a second "FOR UPDATE" SELECT on the same row(s) to block until prior pending transactions are finished.

The PostgreSQL documentation is quite good on a lot of the subtleties of transactions and locking but I don't know how much of this is common to other databases. There are other models for how to do transactions that are more strict about ensuring consistency but can have a larger impact on performance. And it may be that there are modern databases where the queries and updates don't block each other but the inconsistency is caught at commit time (one commit fails).

The default model with PostgreSQL is one that allows for a great deal of parallel execution while providing protections against inconsistencies and races that are reasonable and only require a moderate amount of care.

So, when using PostgreSQL with the default isolation level, you need to either specify "FOR UPDATE" in the above SELECT or "pid IS NULL" in the above UPDATE to prevent a race. Doing both is what I would recommend (and then starting over if the UPDATE updates 0 rows or if the transaction fails).
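Putting "both" together with the "start over" part might look roughly like the sketch below. This is only an illustration under stated assumptions: claim_next_job() and the retry limit are names I made up, $dbh is assumed to be a DBI handle with RaiseError and AutoCommit both on, and the table and columns are the ones from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: claim the lowest-numbered unclaimed job for this process.
# Returns the claimed jobid, or nothing when no unclaimed jobs remain.
sub claim_next_job {
    my ($dbh, $max_tries) = @_;
    $max_tries //= 5;    # arbitrary retry limit for this sketch

    for my $try (1 .. $max_tries) {
        my ($id, $rows);
        my $ok = eval {
            $dbh->begin_work();
            # Lock the candidate row so a competing claimer has to wait.
            ($id) = $dbh->selectrow_array(
                'SELECT jobid FROM joblist WHERE pid IS NULL'
              . ' ORDER BY jobid LIMIT 1 FOR UPDATE',
            );
            if (defined $id) {
                # The extra "pid IS NULL" catches the case where the lock
                # did not protect us (for example, a forgotten transaction).
                $rows = $dbh->do(
                    'UPDATE joblist SET pid = ? WHERE jobid = ? AND pid IS NULL',
                    {}, $$, $id,
                );
            }
            $dbh->commit();
            1;
        };
        if (!$ok) {
            eval { $dbh->rollback() };   # transaction failed: undo, then retry
            next;
        }
        return     if !defined $id;      # nothing left to claim
        return $id if $rows > 0;         # the job is ours
        # $rows was "0E0": somebody else got the job first, so start over.
    }
    die "could not claim a job after $max_tries tries\n";
}
```

A worker would then call something like `my $job = claim_next_job($dbh)` in its main loop and sleep for a bit whenever that comes back undefined.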

(Update:) Note that if your update is impacted by more than one record, then you need to be careful to specify and follow a specific order when obtaining the record locks so that you don't risk deadlock or having your transaction canceled by deadlock prevention.
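One way to keep the lock order consistent is to always request the rows in the same sorted order, no matter what order the caller happened to list them in. The sketch below only builds the SQL; lock_jobs_sql() is a made-up helper, and whether the database really takes the row locks in ORDER BY order is an assumption you should check against your database's documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: build a row-locking SELECT that always asks for the
# given jobids in ascending order, with duplicates removed.
sub lock_jobs_sql {
    my @ids = @_;
    die "need at least one jobid\n" unless @ids;
    my %seen;
    my @sorted = sort { $a <=> $b } grep { !$seen{$_}++ } @ids;
    my $marks  = join ', ', ('?') x @sorted;
    my $sql    = "SELECT jobid FROM joblist WHERE jobid IN ($marks)"
               . ' ORDER BY jobid FOR UPDATE';
    return ($sql, @sorted);
}

my ($sql, @bind) = lock_jobs_sql(42, 7, 19, 7);
print "$sql\n";
print "@bind\n";    # 7 19 42

# Inside an open transaction on a DBI handle, this could be run as:
#   my $locked = $dbh->selectcol_arrayref($sql, {}, @bind);
```

If every process that needs several rows goes through the same helper, two of them can still block each other, but they should not end up each holding a lock the other one wants.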

- tye
