PerlMonks  

Re^3: Using TheSchwartz - 2 threads pick the same jobs. Any help ?

by spx2 (Deacon)
on Dec 07, 2011 at 13:08 UTC


in reply to Re^2: Using TheSchwartz - 2 threads pick the same jobs. Any help ?
in thread Using TheSchwartz - 2 threads pick the same jobs. Any help ?

Well how 'bout locking/unlocking them tables to prevent this from happening ?

About TheSchwartz being over-*, well.. I actually saw a lot of people using it, and lots of jobs on jobs.perl.org featuring TheSchwartz as a required skill .. so I'd imagine we're missing something from the picture, and that it probably is a good module.

Equally, one could easily write a distributed job queue, using Redis, MongoDB, zeromq, rabbitmq and many others.

Re^4: Using TheSchwartz - 2 threads pick the same jobs. Any help ? (locks--)
by tye (Sage) on Dec 07, 2011 at 14:41 UTC

    Locking and unlocking tables is a pretty bad way to prevent race conditions compared to just using transactions. Even a pretend "database" like MySQL can support transactions these days. And even in MySQL without transactions, you can still prevent race conditions by making the assignment of a job a single UPDATE statement:

    UPDATE joblist SET pid = ? WHERE jobid = ? AND pid IS NULL

    (Just as an example and not based on having even glanced at TheSchwartz.) Including "pid IS NULL" in the WHERE clause of the UPDATE is what makes this type of assignment step "atomic".

    - tye        

      Seeing as my experience with SQL is completely amateurish (I've only used SQL for things like the CB stats database, never for a commercial-grade system), I want to see if I understand your "pid IS NULL" == "atomic" bit, as that likely would not have occurred to me. So this is mostly an effort to internalise it by stating what is likely obvious to more experienced people.

      If we simply had UPDATE joblist SET pid = ? WHERE jobid = ?, the theory is that since the determination of the jobid could be happening on two (or more) threads at the same time, and it is NOT part of the same query, the db could return the same jobid to more than one thread, and then they both try to update the db with their pid. In this case, the last one wins, but all earlier updates thought they were successful, so those threads would not know that their jobid was stolen from them.

      With the pid IS NULL bit, the first thread to claim the jobid still gets it, but all other threads will have this update statement fail. Thus, it is imperative in this system that one checks the return value from the UPDATE (we'd expect "1" if it succeeded in updating what should be a single row, assuming jobids are unique, which seems like a reasonable assumption here, and "0" if the row was not updated). If the return shows failure, we need to loop back, find a new satisfactory jobid, and try again.

      If I'm understanding this correctly, I may need to go back to my own SQL code to see if this is needed in the CB stats or some such :-) Thanks, tye. And thanks marto for mentioning it in the CB causing me to go looking at it.

        Yes, that is mostly correct.

        The UPDATE won't "fail", however; it will "succeed" in updating 0 rows. That means the DBI execute() will return a "true zero" (DBI uses the string "0E0", which is true in boolean context but numerically equal to zero), not "0".
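A minimal sketch of that row-count check, assuming DBD::SQLite is installed (the in-memory database is only there so the example runs on its own). The helper name claim_job and the worker ids are made up for illustration and have nothing to do with TheSchwartz's internals:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; the table mirrors the example UPDATE.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE joblist (jobid INTEGER PRIMARY KEY, pid INTEGER)');
$dbh->do('INSERT INTO joblist (jobid, pid) VALUES (1, NULL)');

# Hypothetical helper: try to claim one job for one worker.
# Returns true only if this call actually changed the row.
sub claim_job {
    my ($dbh, $jobid, $pid) = @_;
    my $rows = $dbh->do(
        'UPDATE joblist SET pid = ? WHERE jobid = ? AND pid IS NULL',
        undef, $pid, $jobid,
    );
    # When nothing matched, $rows is a "true zero", so compare it
    # numerically instead of testing its truthiness.
    return $rows == 1;
}

my $first  = claim_job($dbh, 1, 1001);   # row is unclaimed, so this wins
my $second = claim_job($dbh, 1, 1002);   # pid is no longer NULL, 0 rows updated
my ($owner) = $dbh->selectrow_array('SELECT pid FROM joblist WHERE jobid = 1');
print "first=", ($first ? 1 : 0), " second=", ($second ? 1 : 0), " owner=$owner\n";
```

A worker that gets a false return here would go back, pick another candidate jobid, and try again.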

        Note that you don't need to do this if you are using transactions properly, but it can still be a good idea because it can also catch some common mistakes when using transactions.

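        $db->begin_work();
        # PostgreSQL rejects FOR UPDATE with aggregates, so no MIN(jobid) here.
        my $job = $db->select_scalar(   # not core DBI; presumably a local wrapper
            'SELECT jobid FROM joblist WHERE pid IS NULL'
          . ' ORDER BY jobid LIMIT 1 FOR UPDATE',
        );
        $db->do(
            'UPDATE joblist SET pid = ? WHERE jobid = ?',
            {}, $$, $job,
        );
        $db->commit();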

        The above is also an example of an atomic update. By starting a transaction, if two processes try to update the same record, then one of them will become blocked (at the point where they express a desire to update) until the other finishes its transaction (or one of the transactions will fail).

        Note that I specified "FOR UPDATE" on the SELECT statement. A decent database doesn't let readers block writers and doesn't let writers block readers. The original proposal of a "table lock" is, IME, more error prone and also causes readers to block writers and writers to block readers, which can introduce significant performance limitations, especially since it locks the whole table not just the rows being updated. (Locking the whole table can be required when INSERTs matter but I design database schemas where such INSERTs are gated by one of many controlling records in another table so INSERTs can still be done in parallel if they involve different accounts, for example.) (Updated)

        By saying "FOR UPDATE", I express that the data I am getting back from this SELECT is going to influence an update that I am about to make. This causes the record(s) involved in the SELECT to be marked as being part of a pending transaction involving updates. So this causes (at least on some DB implementations) a second "FOR UPDATE" SELECT on the same row(s) to block until prior pending transactions are finished.

        The PostgreSQL documentation is quite good on a lot of the subtleties of transactions and locking but I don't know how much of this is common to other databases. There are other models for how to do transactions that are stricter about ensuring consistency but can have a larger impact on performance. And it may be that there are modern databases where the queries and updates don't block each other but the inconsistency is caught at commit time (one commit fails).

        The default model with Postgresql is one that allows for a great deal of parallel execution while providing protections against inconsistencies and races that are reasonable and only require a moderate amount of care.

        So, when using Postgresql with the default isolation level, you need to either specify "FOR UPDATE" in the above SELECT or "pid IS NULL" in the above UPDATE to prevent a race. Doing both is what I would recommend (and then starting over if the UPDATE updates 0 rows or if the transaction fails).
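A rough sketch of the "do both, and start over" approach, under several assumptions: $dbh is a plain DBI handle connected to PostgreSQL with RaiseError and AutoCommit enabled, the joblist table is the one from the example, and claim_next_job and its $max_tries parameter are hypothetical names. The SELECT uses ORDER BY ... LIMIT 1 rather than MIN(jobid) because PostgreSQL does not accept FOR UPDATE together with an aggregate:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the claimed jobid, or nothing if no job
# could be claimed within $max_tries attempts.
sub claim_next_job {
    my ($dbh, $pid, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        my $claimed;
        my $ok = eval {
            $dbh->begin_work();
            # Lock the candidate row; a competing FOR UPDATE waits here.
            my ($jobid) = $dbh->selectrow_array(
                'SELECT jobid FROM joblist WHERE pid IS NULL'
              . ' ORDER BY jobid LIMIT 1 FOR UPDATE'
            );
            if (defined $jobid) {
                # Second guard: only claim the row if it is still unowned.
                my $rows = $dbh->do(
                    'UPDATE joblist SET pid = ? WHERE jobid = ? AND pid IS NULL',
                    undef, $pid, $jobid,
                );
                $claimed = $jobid if $rows == 1;
            }
            $dbh->commit();
            1;
        };
        if (!$ok) {
            # The transaction failed (deadlock, lost connection, ...):
            # undo whatever is pending and start over.
            eval { $dbh->rollback() };
            next;
        }
        return $claimed if defined $claimed;
    }
    return;
}
```

A worker would call something like claim_next_job($dbh, $$, 5) and sleep or exit when it gets nothing back. An empty SELECT can mean either "no jobs left" or "lost the race for the only candidate", which is why the sketch simply retries in both cases.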

        (Update:) Note that if your update is impacted by more than one record, then you need to be careful to specify and follow a specific order when obtaining the record locks so that you don't risk deadlock or having your transaction canceled by deadlock prevention.
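One way to follow a fixed lock order, sketched under the assumption that the rows are identified by numeric jobid and that the database takes the row locks in the order the rows are returned (worth verifying for the database in use). The helper lock_jobs_sql is hypothetical; it only builds the statement and its bind values:

```perl
use strict;
use warnings;

# Hypothetical helper: build a row-locking SELECT for several jobs that
# always asks for the locks in ascending jobid order.
sub lock_jobs_sql {
    my @jobids = sort { $a <=> $b } @_;
    die "need at least one jobid\n" unless @jobids;
    my $placeholders = join ', ', ('?') x @jobids;
    my $sql = "SELECT jobid FROM joblist WHERE jobid IN ($placeholders)"
            . " ORDER BY jobid FOR UPDATE";
    return ($sql, @jobids);
}

my ($sql, @binds) = lock_jobs_sql(17, 3, 9);
print "$sql\n";
print "@binds\n";
```

Inside a transaction the result could be passed to DBI as $dbh->selectcol_arrayref($sql, undef, @binds). Two workers that both need jobs 3 and 9 then queue up behind each other on job 3 instead of each holding one row and waiting for the other.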

        - tye        

Re^4: Using TheSchwartz - 2 threads pick the same jobs. Any help ?
by BrowserUk (Patriarch) on Dec 07, 2011 at 22:07 UTC
    Well how 'bout locking/unlocking those tables to prevent this from happening ?

    Dunno. Maybe that'd work. But if a module's users have to even consider adding such things, then it doesn't bode well for ...

    ... that it probably is a good module.

    A pretty basic requirement of a "reliable job queue", is that once you take a job out of the queue, nobody else will be able to.

    Even ignoring the complexity problems of the implementation, I think that using an RDBMS as the basis of a distributed queue is fraught with problems architecturally speaking. RDBMSs are designed to be servers in a client-server world; supreme masters of the data they control; responsible only for ensuring the total coherency of that data at all times.

    Whilst the big boys -- Oracle, IBM, MS et al -- have add-ons for running their RDBMSs on clusters, they only achieve reliability by throwing multiply redundant high-availability hardware at the problem. Running one of the lesser-mortal free RDBMSs on commodity hardware is never going to achieve reliability.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?
