I'm going to be talking at YAPC in a couple of weeks about techniques for website scalability. One of the things I'll be discussing is the use of job queuing to handle large numbers of requests for a limited resource. My main example will be what Ticketmaster does on their site. However, the Ticketmaster code for this is tightly coupled to their backend systems and is thus not open source. I want to give people some open source examples to look at, so I'm wondering what others have been using.
Ideally I am looking for something with these characteristics:
- works across a cluster of machines
- relatively easy to administer
- provides control over how many jobs are running at once
- allows asynchronous calls, i.e. a client sends in a request, can check on its status, and eventually picks up the results
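To make the last two requirements concrete, here is a rough single-process sketch in Python of the behavior I have in mind. It is not taken from any of the tools below, and it does not address the cluster requirement at all. The names `JobQueue`, `submit`, `status`, and `result` are made up for illustration; only the standard library's `concurrent.futures` is assumed.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobQueue:
    """Toy in-process queue: bounded concurrency plus status polling.

    Hypothetical illustration only; a real system would put the queue
    manager and the workers on separate machines.
    """

    def __init__(self, max_running):
        # max_workers caps how many jobs run at once; the rest wait in line.
        self._pool = ThreadPoolExecutor(max_workers=max_running)
        self._jobs = {}
        self._lock = threading.Lock()

    def submit(self, func, *args):
        """Queue a job and hand back a ticket id without waiting for it."""
        job_id = uuid.uuid4().hex
        future = self._pool.submit(func, *args)
        with self._lock:
            self._jobs[job_id] = future
        return job_id

    def status(self, job_id):
        """Report 'queued', 'running', 'done', or 'failed' for a ticket."""
        with self._lock:
            future = self._jobs[job_id]
        if future.done():
            return "failed" if future.exception() else "done"
        return "running" if future.running() else "queued"

    def result(self, job_id, timeout=None):
        """Block until the job finishes (or the timeout expires)."""
        with self._lock:
            future = self._jobs[job_id]
        return future.result(timeout=timeout)

    def shutdown(self):
        """Wait for outstanding jobs and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

A client would call `submit` to get a ticket, poll `status` with that ticket, and call `result` once the job reports done. The tools I'm asking about need to provide the same contract, but with the clients, queue manager, and workers spread across machines.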
Here are the ones that I've turned up. Suggestions of better ones, or feedback on how well these tools work in practice, would be useful.
- This is the one that currently seems best to me. It uses the Spread toolkit to handle the communications between clients, a queue manager process, and worker processes.
- PVM seems like a kind of old-school system built for running scientific applications on clusters. It's looking a little long in the tooth, but I have not tried it yet.
- Parallel::MPI and Parallel::MPI::Simple
- MPI is basically a newer replacement for PVM, but still more oriented towards distributed processing for science apps.
- IBM's grid stuff
- I need to look into this one more. It seems to use some more modern protocols (web services), but other than that it's unclear what advantage it might have over PVM and MPI.