PerlMonks
Re: what do you use for job queuing?
by Anonymous Monk on Jun 04, 2004 at 21:35 UTC ( [id://361221]=note )
This is something of a 'concept' reply but I hope it helps you clarify
your choice parameters. I also have a 'remote' :) interest in this topic.
The implementation of parallel virtual machines (PVMs) and more pedestrian remote job control
have a lot in common, but they are different inhabitants of the same house. PVM-type
message passing is the key to efficient distributed processing and Beowulf clustering,
where your code is not localised. The engine at its heart is really no more than
a scheduler with the added sense to know when it is cost effective to spawn a sub-process
on a new node and when to compute it locally.
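
To make the "cost effective" decision concrete, here is a rough sketch in Python. It is only an illustration of the idea, not how PVM itself decides: the cost model (fixed dispatch latency, plus transfer time for code and data, plus run time) and every name in it (`Node`, `remote_cost`, `place`, `local_backlog`) are my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A remote machine as seen by the scheduler (all fields are assumed inputs)."""
    name: str
    speedup: float        # how many times faster than the local box it runs the work
    transfer_rate: float  # bytes per second for shipping code and data
    latency: float        # fixed overhead in seconds for each dispatch


def remote_cost(work_seconds, payload_bytes, node):
    """Estimated seconds to finish the work on a remote node, including shipping."""
    return (node.latency
            + payload_bytes / node.transfer_rate
            + work_seconds / node.speedup)


def place(work_seconds, payload_bytes, nodes, local_backlog=0.0):
    """Return the name of the cheapest node, or 'local' if staying put wins.

    local_backlog is the number of seconds of work already queued locally.
    """
    best_name, best_cost = "local", local_backlog + work_seconds
    for node in nodes:
        cost = remote_cost(work_seconds, payload_bytes, node)
        if cost < best_cost:
            best_name, best_cost = node.name, cost
    return best_name
```

With these assumptions a 10 ms task stays local because the dispatch overhead dominates, while a 10 second task goes to a node that is twice as fast. A real scheduler would also have to measure these numbers rather than be handed them.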
Fault tolerance and generally quick but flexible handling of delivery time are desirable
for PVM construction; the ability to tunnel over ssh or another VPN is less so.
The granularity of control in PVMs is fine. Clocks are synchronised and packets sent
are many and small. The whole cluster has become a giant fault tolerant single processor.
Processes that remotely fork pass code as well as data amongst the nodes.
Many scientific algorithms have been optimised for such execution and incorporate the
necessary forking cues.
Job control is less complicated but has its own issues. For the most part you are only
interested in reliable message passing; forking procedures and harvesting their
results are not part of the plan. Code is local and specific to the nodes, which often
perform a single well defined task. The messages remotely execute procedures
which already live with their data, so authentication and reliable accounting of the remote
machine state are desirable. This is really distributed control. Because of the timescales
involved, and the high end-to-end reliability you can build atop even email, passing messages
via a POP box is perfectly practical and usable, but monitoring scripts need to operate on a finer
timescale.
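
As a rough sketch of what authenticated job messages with accounting could look like, here is a small Python example. The transport is left out on purpose (the signed text could travel by email, a POP box or a socket). The shared secret, the message layout and the `rotate_logs` procedure are all made up for illustration; the one firm point is that the node only runs procedures it already has locally.

```python
import hashlib
import hmac
import json

SECRET = b"shared-secret-for-illustration"  # hypothetical; use a real key in practice


def sign(job_id, procedure, args, secret=SECRET):
    """Build a text message: an HMAC-SHA256 signature line, then the JSON body."""
    body = json.dumps({"id": job_id, "proc": procedure, "args": args}, sort_keys=True)
    mac = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return mac + "\n" + body


# Procedures that already live on the node; messages carry names, never code.
PROCEDURES = {"rotate_logs": lambda keep: f"kept {keep} logs"}


def handle(message, seen_ids, secret=SECRET):
    """Verify, de-duplicate and run one job; return a status record for accounting."""
    mac, _, body = message.partition("\n")
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return {"status": "rejected", "reason": "bad signature"}
    job = json.loads(body)
    if job["id"] in seen_ids:
        # Slow transports may redeliver, so a repeated id is reported, not re-run.
        return {"id": job["id"], "status": "duplicate"}
    proc = PROCEDURES.get(job["proc"])
    if proc is None:
        return {"id": job["id"], "status": "rejected", "reason": "unknown procedure"}
    seen_ids.add(job["id"])
    return {"id": job["id"], "status": "done", "result": proc(*job["args"])}
```

The status record returned by `handle` is what you would mail back to the controller, which gives you the accounting of remote state without needing a fine-grained connection.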
Grid computing, which is what IBM's take is about, falls somewhere in the middle: you are
not building a parallel supercomputer, but you need more than remote job execution.
The Martin Brown (IBM) article is ok imho. It mentions using POE, which I had not
considered, but also SOAP and OGSI frameworks. You could do all this with sockets, but it could
become ugly (as I can attest). Since I am not a user of any of those libraries I can't speak
from experience. My suggestion would be to further study the SOAP and OGSI protocols and see
how they help you with your data queuing problem. Grid computing takes the desirable feature of PVMs,
which is transparent replication (shrinking and growing of the process pool), and this solves
your requirement of easy admin. You set up the cluster once. Machines may come, machines may go, but
like the axe that has had 8 new handles and 8 new heads, it's still the same axe (cluster).
As to your problem. I guessed it had something to do with music before I looked at the ticket site :)
Concert ticketing falls right in there with electronic voting and the stock exchange. You are designing
a system to cope with a one-off transient peak in demand. What you need is a spike handler.
Just sticking the requests in a queue is nasty: your users have no reliable way of knowing
availability, and if they don't know their queue position they have no way to know if their purchase
will be honoured. The economics at the moment are favorable for a solution like the one you (perhaps unwittingly)
are heading for. Many companies would naively
dupe the site across many boxes rented for a very small amount of time around the product launch
and let Apache mods load balance the spike out. However this is always going to be an order of
magnitude more expensive than an ISP who offers clustering and can take a kiloslash (1000 times
the power of a slashdotting :) and stand up. The actual peak is remarkably short in time.
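
On the point about a bare queue being nasty for users: one way round it, sketched here in Python under my own assumptions, is to reserve stock at the moment a request joins, so every user is told straight away whether the order will be honoured, where they stand, and how much is left. The class and field names (`TicketQueue`, `join`, `serve_next`) are hypothetical, and this ignores payment timeouts and persistence entirely.

```python
from collections import deque


class TicketQueue:
    """Queue that reserves tickets on entry so accepted orders are always honoured."""

    def __init__(self, tickets_available):
        self.unreserved = tickets_available
        self.waiting = deque()  # (request_id, quantity) in arrival order

    def join(self, request_id, quantity):
        """Accept or refuse a request immediately and report the current state."""
        if quantity > self.unreserved:
            # Refuse now rather than let the user wait for a purchase that cannot happen.
            return {"accepted": False, "available": self.unreserved}
        self.unreserved -= quantity
        self.waiting.append((request_id, quantity))
        return {"accepted": True,
                "position": len(self.waiting),
                "available": self.unreserved}

    def serve_next(self):
        """Hand the oldest accepted request to the slow back end, or None if idle."""
        return self.waiting.popleft() if self.waiting else None
```

The cheap `join` step is what has to survive the spike; the expensive fulfilment behind `serve_next` can then run at whatever pace the back end manages.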
I think an on-demand replication system based on PVM principles
would look nice, so I think you are on the right track. Check out the modperl list archives too.
I've a feeling there was a thing about this in the past (Stas or Randal would know). There may well
be an Apache mod_perl solution to just this problem, but event driven from the demand (web request) end.
If there isn't I guess you're writing it :)