Re: what do you use for job queuing?

This is something of a 'concept' reply but I hope it helps you clarify your choice parameters. I also have a 'remote' :) interest in this topic.

Parallel virtual machines and more pedestrian remote job control have a lot in common, but they are different inhabitants of the same house. PVM-type message passing is the key to efficient distributed processing and Beowulf clustering, where your code is not localised. The engine at the heart of it is really no more than a scheduler with the added sense to know when it is cost effective to spawn a sub-process on a new node and when to compute it locally. Fault tolerance and quick but flexible handling of delivery time are desirable for PVM construction; the ability to tunnel over ssh or another VPN is less so. The granularity of control in PVMs is fine: clocks are synchronised, and the packets sent are many and small. The whole cluster becomes a giant fault-tolerant single processor. Processes that fork remotely pass code as well as data amongst the nodes. Many scientific algorithms have been optimised for such execution and incorporate the necessary forking cues.
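
To make the "cost effective to spawn" decision a bit more concrete, here is a rough sketch in Python. It is not how PVM itself decides anything; the function name, the parameters and the simple time model (transfer time plus compute time) are all my own assumptions for illustration.

```python
def should_offload(work_units, local_rate, remote_rate,
                   payload_bytes, bandwidth, latency):
    """Return True when the remote node is estimated to finish sooner.

    Hypothetical cost model (an assumption, not PVM's own):
      local time  = work / local_rate
      remote time = latency + payload / bandwidth + work / remote_rate
    Rates are in work units per second, bandwidth in bytes per second,
    latency in seconds.
    """
    local_time = work_units / local_rate
    transfer_time = latency + payload_bytes / bandwidth
    remote_time = transfer_time + work_units / remote_rate
    return remote_time < local_time


# A big job amortises the cost of shipping code and data to another node,
# a tiny job does not.
big_job = should_offload(1000, 100, 400, 1_000_000, 1_000_000, 0.05)
tiny_job = should_offload(10, 100, 400, 1_000_000, 1_000_000, 0.05)
```

A real scheduler would also weigh current load on each node and the chance of the node failing, which this sketch ignores.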

Job control is less complicated but has its own issues. For the most part you are only interested in reliable message passing; forking procedures and harvesting their results are not part of the plan. Code is local and specific to the nodes, which often perform a single well-defined task. The message passing is there to remotely execute procedures which already live with their data, so authentication and reliable accounting of the remote machine state are desirable. This is really distributed control. Because of the timescales involved and the high end-to-end reliability of email, you can even build on top of it: passing messages via a POP box is perfectly practical and usable, but monitoring scripts need to operate on a finer timescale.
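
As a sketch of what "authentication and reliable accounting" might look like for job messages carried over any transport (a socket, or even a mailbox), something along these lines would do. The message layout, the function names and the shared-secret scheme are assumptions of mine, not part of any particular job control package.

```python
import hashlib
import hmac
import json


def sign_job(secret: bytes, job_id: int, command: str) -> str:
    """Wrap a job request in an envelope carrying an HMAC-SHA256 tag."""
    body = json.dumps({"job_id": job_id, "command": command}, sort_keys=True)
    tag = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"body": body, "tag": tag})


def verify_job(secret: bytes, message: str, seen_ids: set):
    """Return the job dict if the tag checks out and the job is new, else None."""
    envelope = json.loads(message)
    body = envelope["body"]
    expected = hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        return None  # forged or corrupted message
    job = json.loads(body)
    if job["job_id"] in seen_ids:
        return None  # duplicate delivery, already accounted for
    seen_ids.add(job["job_id"])
    return job
```

The node keeps `seen_ids` as its record of what it has already run, so a message delivered twice is executed only once.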

Grid computing, which is what IBM's take is about, falls somewhere in the middle: you are not building a parallel supercomputer, but you need more than remote job execution. The Martin Brown (IBM) article is OK imho; it mentions using POE, which I had not considered, but also SOAP and OGSI frameworks. You could do all this with sockets, but it could become ugly (as I can attest), and since I am not a user of any of those libraries I can't speak from experience. My suggestion would be to study the SOAP and OGSI protocols further and see how they help with your data queuing problem. Grid computing takes the desirable feature of PVMs, which is transparent replication (shrinking and growing of the process pool), and this solves your requirement of easy admin. You set up the cluster once. Machines may come, machines may go, but like the axe that has had 8 new handles and 8 new heads, it's still the same axe (cluster).
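
The "machines may come, machines may go" idea can be sketched as a pool that keeps dispatching while its membership changes. This is only a toy in Python with a made-up `ElasticPool` class and round-robin dispatch; it has nothing to do with the actual SOAP or OGSI interfaces, which I have not used.

```python
class ElasticPool:
    """Toy process pool whose membership can grow and shrink at any time."""

    def __init__(self):
        self.nodes = []
        self._cursor = 0  # counts dispatches, used for round-robin selection

    def join(self, node: str) -> None:
        if node not in self.nodes:
            self.nodes.append(node)

    def leave(self, node: str) -> None:
        if node in self.nodes:
            self.nodes.remove(node)

    def dispatch(self, job: str) -> str:
        """Pick the node that should run `job`; callers never name a node."""
        if not self.nodes:
            raise RuntimeError("no nodes available")
        node = self.nodes[self._cursor % len(self.nodes)]
        self._cursor += 1
        return node
```

The caller only ever talks to the pool, so swapping every handle and head does not change what the rest of the system sees.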

As to your problem: I guessed it had something to do with music before I looked at the ticket site :) Concert ticketing falls right in there with electronic voting and the stock exchange. You are designing a system to cope with a one-off transient peak in demand. What you need is a spike handler. Just sticking the requests in a queue is nasty: your users have no reliable way of knowing availability, and if they don't know their queue position they have no way to know if their purchase will be honoured. The economics atm are favorable for a solution like the one you (perhaps unwittingly) are heading for. Many companies would naively dupe the site across many boxes rented for a very short time around the product launch and let Apache mods load balance the spike out. However this is always going to be an order of magnitude more expensive than an ISP who offers clustering and can take a kiloslash (1000 times the power of a slashdotting :) and stand up. The actual peak is remarkably short. I think an on-demand replication system based on PVM principles would look nice, so I think you are on the right track. Check out the mod_perl list archives too. I've a feeling there was a thing about this in the past (Stas or Randal would know); there may well be an Apache mod_perl solution to just this problem, but event driven from the demand (web request) end. If there isn't, I guess you're writing it :)
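
For what I mean by a spike handler rather than a blind queue, here is a minimal single-process sketch in Python. The class and field names are hypothetical, and a real system would need the stock counter to be shared safely between all the front-end boxes. The point is only that each request is told straight away where it stands and whether it will be honoured.

```python
class TicketSpikeHandler:
    """Toy spike handler: every request learns its position and its fate."""

    def __init__(self, stock: int):
        self.stock = stock          # tickets still unallocated
        self.next_position = 1      # arrival order, starting at 1

    def request(self, quantity: int) -> dict:
        position = self.next_position
        self.next_position += 1
        # Honour the request only if it can be met in full right now.
        honoured = 0 < quantity <= self.stock
        if honoured:
            self.stock -= quantity
        return {
            "position": position,
            "honoured": honoured,
            "remaining": self.stock,
        }


handler = TicketSpikeHandler(stock=3)
first = handler.request(2)    # allocated, 1 ticket left
second = handler.request(2)   # refused at once, stock untouched
third = handler.request(1)    # allocated, sold out
```

Payment and fulfilment can then be done at leisure after the peak, since the scarce decision (who gets a ticket) has already been made.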

Best of Luck
Andy
