http://www.perlmonks.org?node_id=942092


in reply to Using TheSchwartz - 2 threads pick the same jobs. Any help ?

Sounds like your job queue is not thread safe. Are you just pushing and popping from an array?
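If the queue really were an in-process array shared between threads, the usual fix is a queue type that hands each item to exactly one consumer. In Perl, the core Thread::Queue module fills that role. Below is a minimal sketch of the idea in Python, with the standard-library `queue.Queue` as a stand-in. `run_workers` is a hypothetical helper, and nothing here reflects how TheSchwartz works.

```python
import queue
import threading


def run_workers(jobs, n_workers=4):
    """Drain `jobs` with n_workers threads; return (worker_id, job) pairs."""
    q = queue.Queue()
    for job in jobs:
        q.put(job)

    taken = []
    taken_lock = threading.Lock()  # guards the shared results list

    def worker(worker_id):
        while True:
            try:
                # get_nowait() removes and returns one item atomically,
                # so no two threads can receive the same job.
                job = q.get_nowait()
            except queue.Empty:
                return  # all jobs were loaded up front, so empty means done
            with taken_lock:
                taken.append((worker_id, job))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken


if __name__ == "__main__":
    pairs = run_workers(range(1000))
    jobs = [job for _, job in pairs]
    print(len(jobs), len(set(jobs)))  # 1000 1000
```

Every job shows up exactly once, whichever thread happened to take it.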

Re^2: Using TheSchwartz - 2 threads pick the same jobs. Any help ?
by BrowserUk (Patriarch) on Dec 06, 2011 at 21:08 UTC

    "thread safety" & "arrays" don't come into it.

    The "queue" in this module appears to be a database table. And the "threads" are probably forked processes.

    I say "probably" because after 20 minutes of source diving, I'm still not sure. What I can say is that I saw no sign of threading.

    I can also say that this is the single most horrendously complex, over-engineered, stupefyingly over-architected module I've yet encountered. That doesn't mean it doesn't work, or that it might not work very well.

    Just that I wouldn't want to be the one responsible for deciding that it has been adequately tested. Or trying to track down bugs.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use every day'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Well, how 'bout locking/unlocking those tables to prevent this from happening?

      As for TheSchwartz being over-*: well, I've seen a lot of people using it, and lots of job ads on jobs.perl.org list TheSchwartz as a required skill, so I'd imagine we're missing something from the picture, and that it probably is a good module.

      Equally, one could easily write a distributed job queue using Redis, MongoDB, ZeroMQ, RabbitMQ, or any of many other tools.

        Locking and unlocking tables is a pretty bad way to prevent race conditions compared to just using transactions. Even a pretend "database" like MySQL can support transactions these days. And even in MySQL without transactions, you can still prevent race conditions by making the assignment of a job a single UPDATE statement:

        UPDATE joblist SET pid = ? WHERE jobid = ? AND pid IS NULL

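        (This is just an example, not based on having even glanced at TheSchwartz.) Including "pid IS NULL" in the WHERE clause of the UPDATE is what makes this type of assignment step "atomic". The database changes the row only if no other worker has already claimed it. A worker therefore knows it got the job when the statement reports one affected row, and must move on when it reports zero.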
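        The sketch below shows why the guard matters. It is a minimal sketch in Python, with SQLite standing in for MySQL. The joblist table and its jobid/pid columns are the hypothetical schema from the UPDATE example, and `create_jobs` and `claim` are hypothetical helpers. Two separate connections play two workers racing for the same job. Only the first UPDATE changes a row, and the cursor's rowcount tells each worker whether it won.

```python
import os
import sqlite3
import tempfile


def create_jobs(conn, n_jobs):
    """Hypothetical schema: pid stays NULL until some worker claims the job."""
    conn.execute("CREATE TABLE joblist (jobid INTEGER PRIMARY KEY, pid INTEGER)")
    conn.executemany(
        "INSERT INTO joblist (jobid, pid) VALUES (?, NULL)",
        [(i,) for i in range(1, n_jobs + 1)],
    )
    conn.commit()


def claim(conn, jobid, pid):
    """Try to assign jobid to worker pid in one statement; True only if we won."""
    cur = conn.execute(
        "UPDATE joblist SET pid = ? WHERE jobid = ? AND pid IS NULL",
        (pid, jobid),
    )
    conn.commit()
    # 1 row changed: the job was still free and is now ours.
    # 0 rows changed: another worker set pid first, so leave the job alone.
    return cur.rowcount == 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")
        admin = sqlite3.connect(path)
        create_jobs(admin, 1)
        # Two independent connections stand in for two worker processes.
        w1, w2 = sqlite3.connect(path), sqlite3.connect(path)
        print(claim(w1, 1, 101))  # True
        print(claim(w2, 1, 102))  # False
        print(admin.execute("SELECT pid FROM joblist WHERE jobid = 1").fetchone())  # (101,)
        for c in (admin, w1, w2):
            c.close()
```

A plain SELECT followed by an unconditional UPDATE would leave a window in which both workers see pid as NULL. Folding the check into the UPDATE's WHERE clause closes that window without any table locks.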

        - tye        

        Well, how 'bout locking/unlocking those tables to prevent this from happening?

        Dunno. Maybe that'd work. But if a module's users have to even consider adding such things, then it doesn't bode well for ...

        ... that it probably is a good module.

        A pretty basic requirement of a "reliable job queue" is that once you take a job out of the queue, nobody else can.
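        That requirement can be checked with a small race test. The sketch below is again in Python, with SQLite standing in for MySQL and the same hypothetical joblist table. It does not use a single conditional UPDATE. Instead, each worker claims a job inside one write transaction, which is the transaction-based approach tye mentions. In SQLite that means BEGIN IMMEDIATE; in MySQL/InnoDB the rough equivalent would be SELECT ... FOR UPDATE inside a transaction. `claim_next` and `run_race` are hypothetical helpers, and nothing here is taken from TheSchwartz's source.

```python
import os
import sqlite3
import tempfile
import threading


def claim_next(conn, pid):
    """Claim the lowest-numbered free job for worker pid, or return None.

    BEGIN IMMEDIATE takes SQLite's write lock before the SELECT, so no other
    worker can claim the same row between our SELECT and our UPDATE.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT jobid FROM joblist WHERE pid IS NULL ORDER BY jobid LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE joblist SET pid = ? WHERE jobid = ?", (pid, row[0]))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]


def run_race(n_jobs=50, n_workers=4):
    """Let n_workers threads drain n_jobs; return {pid: [jobids it claimed]}."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE joblist (jobid INTEGER PRIMARY KEY, pid INTEGER)")
        setup.executemany(
            "INSERT INTO joblist (jobid, pid) VALUES (?, NULL)",
            [(i,) for i in range(1, n_jobs + 1)],
        )
        setup.commit()
        setup.close()

        claimed = {pid: [] for pid in range(n_workers)}  # each thread owns one list

        def worker(pid):
            # isolation_level=None: we issue BEGIN/COMMIT ourselves.
            # timeout=30: wait up to 30 s for the write lock instead of failing.
            conn = sqlite3.connect(path, timeout=30, isolation_level=None)
            try:
                while (jobid := claim_next(conn, pid)) is not None:
                    claimed[pid].append(jobid)
            finally:
                conn.close()

        threads = [threading.Thread(target=worker, args=(p,)) for p in range(n_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return claimed


if __name__ == "__main__":
    claimed = run_race()
    every = [j for jobs in claimed.values() for j in jobs]
    print(len(every), len(set(every)))  # 50 50
```

If any job appeared in two workers' lists, the queue would be failing exactly the requirement stated above.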

        Even ignoring the complexity problems of the implementation, I think that using an RDBMS as the basis of a distributed queue is architecturally fraught with problems. RDBMSs are designed to be servers in a client-server world: supreme masters of the data they control, responsible only for ensuring the total coherency of that data at all times.

        Whilst the big boys -- Oracle, IBM, MS et al -- have add-ons for running their RDBMSs on clusters, they only achieve reliability by throwing multiply redundant high-availability hardware at the problem. Running one of the lesser-mortal free RDBMSs on commodity hardware is never going to achieve reliability.

