in reply to Receiving data asynchronously in a long-run process

The worker should enqueue the job for processing, then continue processing its current job.

Unless your worker is actually capable of doing more than one job at a time (for instance, by forking off other processes to do it), there is no reason to fetch jobs as soon as they arrive. In the simpler case, where jobs are processed sequentially, you generally only want to fetch a new job once the old one is finished. The catch is that the issuer may block if the pipe fills up in the meantime.
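To make the blocking concrete, here is a minimal Python sketch (Python only for illustration; the thread itself is about Perl). It assumes an in-process bounded `queue.Queue` stands in for the pipe. A single worker fetches the next job only after finishing the current one, and once the queue holds `maxsize` jobs the issuer's `put()` blocks. The function names and the squaring "work" are hypothetical placeholders.

```python
import queue
import threading


def run_sequential(job_values, maxsize=2):
    """Issue jobs through a bounded queue to a single sequential worker.

    The bounded queue plays the role of the pipe: when it already holds
    `maxsize` jobs, put() blocks the issuer until the worker takes one.
    """
    jobs = queue.Queue(maxsize=maxsize)
    results = []

    def worker():
        # Fetch the next job only after the current one is finished.
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work
                return
            results.append(job * job)  # placeholder for real work

    t = threading.Thread(target=worker)
    t.start()
    for value in job_values:
        jobs.put(value)  # blocks while the queue is full
    jobs.put(None)
    t.join()
    return results


def issuer_would_block(maxsize=2):
    """With no worker consuming, the (maxsize + 1)-th put cannot complete."""
    jobs = queue.Queue(maxsize=maxsize)
    for i in range(maxsize):
        jobs.put(i)
    try:
        jobs.put("one too many", timeout=0.1)
    except queue.Full:
        return True
    return False


print(run_sequential(range(5)))  # [0, 1, 4, 9, 16]
print(issuer_would_block())      # True
```

`issuer_would_block()` uses a timeout only so the demonstration terminates; a plain `put()` would wait indefinitely, which is exactly the issuer stall described above.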

Also, unless you're talking about really simple systems, you generally want multiple workers (and possibly multiple issuers) running at the same time, possibly on different machines. There are generic systems for handling the message queueing in that situation; they're called message queues.
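As an in-process stand-in for what a message queue gives you (a real broker adds persistence, routing and network transport, none of which is modelled here), this Python sketch has several workers sharing one queue, each running the same fetch, process, repeat loop. `run_pool`, the doubling "work" and the one-sentinel-per-worker shutdown are illustrative choices, not anything a particular broker prescribes.

```python
import queue
import threading


def run_pool(job_values, num_workers=3):
    """Several workers share one queue; each runs fetch, process, repeat."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            job = jobs.get()       # fetch
            if job is None:        # one sentinel per worker shuts it down
                return
            value = job * 2        # process (placeholder work)
            with lock:
                results.append((worker_id, value))
            # ...and repeat

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for v in job_values:
        jobs.put(v)
    # Sentinels go in after all jobs, so every job is fetched before shutdown.
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return results


print(sorted(v for _, v in run_pool(range(5))))  # [0, 2, 4, 6, 8]
```

Which worker gets which job is up to the scheduler, so only the set of results is deterministic, not their order.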

The message queue will receive jobs (messages) and set up routing and queues for as many receivers as you need. That reduces the workers to a more or less pure "fetch, process, repeat" loop. Take a look at ActiveMQ or RabbitMQ, for example. (Note: ActiveMQ takes some work to configure, but it supports STOMP as a protocol, which is really straightforward. To work with RabbitMQ you need to understand the AMQP specs and use POE, and neither is completely trivial.**)
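To show why STOMP counts as straightforward: a frame is a command line, zero or more `key:value` header lines, a blank line, the body, and a terminating NUL byte. The Python sketch below only encodes and decodes single frames. It is not a client (no socket, no CONNECT handshake), it assumes LF line endings and header values without `:` or newlines (so no escaping is done), and the `/queue/jobs` destination is a hypothetical name.

```python
def encode_stomp_frame(command, headers, body=""):
    """Encode a STOMP frame: command line, header lines, blank line, body, NUL.

    Assumption: header values contain no ':' or newline characters, so no
    escaping is performed (STOMP 1.1+ would require escaping in that case).
    """
    lines = [command]
    lines += [f"{key}:{value}" for key, value in headers.items()]
    return ("\n".join(lines) + "\n\n" + body).encode("utf-8") + b"\x00"


def decode_stomp_frame(data):
    """Inverse of encode_stomp_frame for a single, unescaped, LF-only frame."""
    text = data.rstrip(b"\x00").decode("utf-8")
    head, _, body = text.partition("\n\n")  # first blank line ends the headers
    command, *header_lines = head.split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    return command, headers, body


frame = encode_stomp_frame("SEND", {"destination": "/queue/jobs"}, "hello")
print(frame)  # b'SEND\ndestination:/queue/jobs\n\nhello\x00'
```

A worker on the other side would receive MESSAGE frames with the same layout, which is what keeps a STOMP-based "fetch, process, repeat" loop short.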

I can only think of one other option to handle this IPC problem: make the worker multithreaded, with one pipe-management thread and one work thread. This seems like a pretty heavyweight solution to a pretty simple problem.
POE works well for these situations, probably better than threads, though I wouldn't describe it as lightweight.
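For comparison, the two-thread design from the quote might look roughly like this in Python (threads here, not Perl ithreads or POE). Assumptions: `os.pipe()` stands in for the issuer's pipe, jobs are framed one per line, and upper-casing is placeholder work. The pipe-management thread keeps draining the pipe so the issuer's writes don't stall, while the work thread processes jobs sequentially from an in-memory queue.

```python
import os
import queue
import threading


def run_two_threads(messages):
    """One thread drains the pipe, another does the work (hypothetical sketch)."""
    read_fd, write_fd = os.pipe()
    pending = queue.Queue()  # unbounded, so the reader never stalls the issuer
    results = []

    def pipe_manager():
        with os.fdopen(read_fd, "r") as pipe:
            for line in pipe:              # loop ends at EOF (write end closed)
                pending.put(line.rstrip("\n"))
        pending.put(None)                  # tell the work thread we're done

    def work_thread():
        while True:
            job = pending.get()
            if job is None:
                return
            results.append(job.upper())    # placeholder for real work

    threads = [threading.Thread(target=pipe_manager),
               threading.Thread(target=work_thread)]
    for t in threads:
        t.start()
    # The "issuer": write one job per line, then close the pipe to signal EOF.
    with os.fdopen(write_fd, "w") as pipe:
        for m in messages:
            pipe.write(m + "\n")
    for t in threads:
        t.join()
    return results


print(run_two_threads(["job one", "job two"]))  # ['JOB ONE', 'JOB TWO']
```

The trade-off is that the backlog moves from the pipe into an unbounded in-memory queue, so a slow worker now costs memory instead of blocking the issuer.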

Also, unless you're comfortable with dropping jobs when crashes occur and running everything on a single machine or thread, your problem isn't simple. :)

edit: **) for now, you may also want my forks of the AMQP modules, here and here (you need both).

Re^2: Receiving data asynchronously in a long-run process
by Annirak (Novice) on Sep 21, 2009 at 19:54 UTC

    Wow, joost, that's a lot of really great information. My intent was originally to write a library which would work much like Thread::Pool::Simple, so you're absolutely right; I was planning to use multiple workers. I had intended this to be a local-machine-only library, but the possibility of handling off-machine jobs for particularly granular tasks has a lot to recommend it too.

    For a local-machine library, I suspect that ActiveMQ is heavier than necessary. For a grid computing library, it's probably a great solution. I don't think I need quite that much power, but since I do want to release this library, perhaps the flexibility is worthwhile.