Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

what do you use for job queuing?

by perrin (Chancellor)
on Jun 04, 2004 at 15:53 UTC ( #360955=perlquestion: print w/replies, xml ) Need Help??

perrin has asked for the wisdom of the Perl Monks concerning the following question:

I'm going to talking at YAPC in a couple of weeks about techniques for website scalability. One of the things I'll be discussing is the use of job queuing to handle large numbers of requests for a limited resource. My main example will be what Ticketmaster does on their site. However, the Ticketmaster code for this is tightly coupled to their backend systems and is thus not open source. I want to give people some open source examples to look at, so I'm wondering what others have been using.

Ideally I am looking for something with these characteristics:

  • works across a cluster of machines
  • relatively easy to administer
  • provides control over how many jobs are running at once
  • allows asynchronous calls, i.e. a client sends in a request and then can check on the status and eventually the results later on
Here are the ones that I've turned up. Any suggestions of better ones, or feedback on the effectiveness of these tools would be useful.
This is the one that currently seems best to me. It uses the Spread toolkit to handle the communications between clients, a queue manager process, and worker processes.
PVM seems like a kind of old-school system built for running scientific applications on clusters. It's looking a little long in the tooth, but I have not tried it yet.
Parallel::MPI and Parallel::MPI::Simple
MPI is basically a newer replacement for PVM, but still more oriented towards distributed processing for science apps.
IBM's grid stuff
I need to look into this one more. It seems to use some more modern protocols (web services) but other than that it's unclear what advantage this might have over PVM and MPI.

Replies are listed 'Best First'.
Re: what do you use for job queuing?
by eduardo (Curate) on Jun 04, 2004 at 16:30 UTC
    I've used a sendmail / IMAP setup for this before. It's an excellent way of providing a persistent distributed routable queuing architecture, with a well known and documented API for handling complex messaging. A lot of people give you quizzical looks when you explain to them how to set this sort of thing up, but then again, it's how "job queuing" is done if the endpoints are people, so why wouldn't it scale splendidly to programs? Assign different processes different email addresses, write a procmail file writer to help you do load balancing if need be accross a cluster Built right in you get point to point and message list distribution, even request receipts and complex attachment hierarchies if you so chose :) It's my favorite solution to this particular problem.
      I agree that e-mail offers some attractive features in terms of queueing messages and handling temporary delivery failures robustly, and I have heard people talk about doing it this way before. I would say it mostly just solves the message queue issue though.

      It might have issues with frequent polling, just like a database-backed queue would, although I suppose an e-mail server is built to do that well. It would also be difficult to get the current status of a job, but maybe "received" is good enough for most cases.

      I think you would end up needing to build some extra infrastructure to allow a client to check for the response to a specific job, or is there a simple way to do that with IMAP queries and subject lines or something? Also, how would you handle distributing messages across a set of worker processes? Would you use e-mail for that too, sending messages on from a dispatcher to specific workers with unique e-mail addresses?

Re: what do you use for job queuing?
by Itatsumaki (Friar) on Jun 04, 2004 at 16:03 UTC

    If I understand the topic correctly, you might want to consider the "trivial" solution of setting up a dispatch table in a database and running monitoring scripts on each machine in the cluster. Each monitor-script would check for new work to process in the database, and update the dispatch table with "in-progress" and "completed" flags as need be. Of course this only works when the individual pieces can easily be broken apart, but I think this is often true for web-applications.

      I did think about that, and still might try building one as an example app, but there are some tricky implementation details. For one thing, this could result in a whole lot of programs polling the database frequently. That's not going to scale all that well. We can reduce this by having a dispatcher process on each machine so that there's only one poller per machine, but then it has to handle talking to a bunch of child processes to give them jobs and get their results. (This is trickier than a typical server situation where the children just wait for the next connection to come in on some socket.) That can be done with things like IPC::Run or maybe POE, but it's not trivial.

      If I end up trying to build one, I will probably post some design ideas here for feedback.

Re: what do you use for job queuing?
by monkey_boy (Priest) on Jun 04, 2004 at 16:27 UTC
    We use Sun Grid Engine , which i think is now available for free download as they have a new product for clustered computing.
    I cant say it has the most elegent interface , but it gets the job done.

    I should really do something about this apathy ... but i just cant be bothered
Re: what do you use for job queuing?
by exussum0 (Vicar) on Jun 04, 2004 at 16:45 UTC
    MPI is newer, but I'm not sure how widley used it is compared to other technologies today. Further more, it's more a definition and API to transmitting data just as DOM and XML are... Or so I understand it.

    Spread, as I've seen it, does synchronization of messages on top of communicating them, which is great to talk about.

    Maybe you can discuss the various patterns used as well involved in such a system. There are many distributed programming texts about. Maybe implement something ad hoc demonstrating the use of priority queues, mutex locking and the likes?

    Bart: God, Schmod. I want my monkey-man.

      I have limited time for this talk (about 40 minutes) and this isn't my only subject, so I can't go into a really lengthy discussion about it. However, if I do end up writing one, I will probably try to lean on a networked database for locking and general concurrency issues. I was trying to think of ways to avoid the incessant polling that could happen with a central-server approach like that (every worker process checking for new jobs, etc.). Let me know if you have any ideas about that. It may not really be worth worrying about, since these would be pretty simple queries, but I remember how much the Oracle DBAs at one place where I used to work complained about ATG Dynamo's RDBMS-based implementation of JMS. It would hit the database constantly.
•Re: what do you use for job queuing?
by merlyn (Sage) on Jun 04, 2004 at 17:42 UTC
Re: what do you use for job queuing?
by Anonymous Monk on Jun 04, 2004 at 21:35 UTC
    This is something of a 'concept' reply but I hope it helps you clarify your choice parameters. I also have a 'remote' :) interest in this topic.

    The implementation of parallel virtual machines and more pedestrian remote job control share a lot in common, but they are different inhabitants of the same house. PVM type message passing is the key to efficient distributed processing and beowulf clustering where your code is not localised. The engine at the heart of which is really no more than a scheduler with the added sense to know when it is cost effective to spawn a sub-process to a new node, or compute it locally. Fault tolerance, generally quick but flexible handling of delivery time, are desirable for PVM construction and ability to tunnel ssh or other vpn less so. The granularity of control in PVMs is fine. Clocks are synchronised and packets sent are many and small. The whole cluster has become a giant fault tolerant single processor. Processes that remotely fork pass code as well as data amongst the nodes. Many scientific algorithms have been optimised for such execution and incorporate the necessary forking cues.

    Job control is less complicated but has its own issues. For the most part you are only interested in reliable message passing functions, forking procedures and harvesting their results are not part of the plan. Code is local and specific to the nodes, which often perform a single well defined task. The message passing is to remotely execute procedures which already live with their data, so authentication and reliable accounting of the remote machine state are desirable. This is really distributed control, and because of the timescales involved, and the high end-to-end reliability you can build atop even email and passing messages via a pop box is perfectly practical and usable., but monitoring scripts need to operate on a finer timescale.

    Grid computing, which is what IBMs take is about, falls somewhere in the middle, where you are not building a parallel supercomputer, and you need more than remote job execution. The Martin Brown (IBM) article is ok imho, it mentions using POE, which I had not considered, but also SOAP and OGSI frameworks. You could do all this with sockets, but it could become ugly (as I can attest), and since I am not a user of any of those libraries I can't speak from experience. My suggestion would be to further study SOAP and OGSI protocols and see how they help you with data queuing problem. Grid computing takes the desirable feature of PVMs which is transparrent replication (shrinking and growing of the process pool), and this solves your requirement of easy admin. You set up clusters once. Machines may come, machines may go, but the like the axe that has had 8 new handles and 8 new heads, its still the same axe (cluster).

    As to your problem. I guessed it had something to do with music before I looked at the ticket site :) Concert ticketing falls right in there with electronic voting and the stock exchange. You are designing a system to cope with a one off transient peak in demand. What you need is a spike handler. Just sticking the requests in a queue is nasty, your users have no reliable way of knowing availability, and if they don't know their queue position they have no way to know if their purchase will be honoured. The economics atm are favorable for a solution like you (perhaps unwittingly) are heading for. Many companies would naively dupe the site across many boxes rented for a very small amount of time around the product launch and let Apache mods load balance the spike out. However this is always going to be an order of magnitude more expensive than an ISP who offers clustering and can take a kiloslash (1000 times the power of a slashdotting:) and stand up. The actual peak is remarkably short in time. I think an on demand replication system based on PVM principles would look nice so I think you are on the right track. Check out the modperl list archives too I've a feeling there was a thing, Stas or Randal wiuld know, about this in the past, there may well be an Apache mod-perl solution to just this problem, but event driven from the demand (web request) end. If there isn't I guess you're writing it :)

    Best of Luck

Re: what do you use for job queuing?
by andyf (Pilgrim) on Jun 07, 2004 at 06:32 UTC
    Perrin, a penny just dropped, are you aka Perrin Harkins? If so I hope you found it funny rather than insulting that I refer you to the mod perl list. :)
      Yes, that would be me. I'm not actually implementing, just talking about what my friends there built. They have a queuing system which provides status information so you can tell your place in line.

      P.S. Send me your Unreal nickname and I'll look for you next time I play.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://360955]
Approved by tinita
Front-paged by Itatsumaki
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2020-07-13 06:18 GMT
Find Nodes?
    Voting Booth?

    No recent polls found