Task orchestrator or distributed state machine

by moritz (Cardinal)
on Jul 09, 2014 at 13:19 UTC (#1092903, perlquestion)
moritz has asked for the wisdom of the Perl Monks concerning the following question:

For a $work project we have to define some workflows where individual pieces (henceforth "tasks") run distributed over several machines, and now the big question is: how do we coordinate them?

A typical use case is to run task A, and when it's finished (and successful), run tasks B and C in parallel, and when both are done (and successful), run task D.
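
To make the use case concrete, here is a minimal in-process sketch of that A, then B and C, then D flow. It is only an illustration: the `%deps` table, `is_ready` and `run_workflow` are made-up names, there is no AMQP involved, and the tasks of one "wave" are run one after another rather than truly in parallel.

```perl
use strict;
use warnings;

# Hypothetical workflow definition: each task lists the tasks it waits for.
my %deps = (
    A => [],
    B => ['A'],
    C => ['A'],
    D => ['B', 'C'],
);

# A task is ready when it has not run yet and all its dependencies are done.
sub is_ready {
    my ($task, $deps, $done) = @_;
    return 0 if $done->{$task};
    for my $dep (@{ $deps->{$task} }) {
        return 0 unless $done->{$dep};
    }
    return 1;
}

# Runs the workflow in waves. All tasks of one wave are independent of each
# other, so a real orchestrator could dispatch them in parallel.
# $runner is called with the task name and must return true on success.
sub run_workflow {
    my ($deps, $runner) = @_;
    my (%done, @waves);
    while (scalar(keys %done) < scalar(keys %$deps)) {
        my @ready = grep { is_ready($_, $deps, \%done) } sort keys %$deps;
        die "no runnable task left (dependency cycle?)\n" unless @ready;
        for my $task (@ready) {
            $runner->($task) or die "task $task failed\n";
        }
        $done{$_} = 1 for @ready;
        push @waves, [@ready];
    }
    return \@waves;
}

my $waves = run_workflow(\%deps, sub { 1 });
print join(' -> ', map { join('+', @$_) } @$waves), "\n";   # prints "A -> B+C -> D"
```

A failing task aborts the whole run here, which matches the "finished (and successful)" condition; retries or partial recovery would need more state than this sketch keeps.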

The workers will communicate over AMQP (think RabbitMQ).

But we need a piece of software that controls the flow of all these tasks, and of course I'd like to write it in Perl. What existing software could help with that? I think I want some kind of task orchestrator, like a state machine where you can define transitions, forks and joins.

On the task scheduling side, so far I've found Minion. It looks promising, but is very light on high-level documentation. While it seems to support events on failed and finished jobs, it doesn't offer any further help with the orchestration. It also has no AMQP support, but then I didn't find any Perl-based task queues/schedulers that use AMQP.

The state machine side looks pretty bleak. Machine::State and State::Machine both allow only one current state at a time, and support no joins or forks.

Can you recommend any modules or tools that will help me with coordinating those tasks?

Update: It seems like I'm looking for something like TaskFlow, only in Perl.

Re: Task orchestrator or distributed state machine
by salva (Monsignor) on Jul 09, 2014 at 13:37 UTC
    From the high level, Net::OpenSSH::Parallel does mostly what you are asking for.

    The issue you may find, besides it being SSH based, is that it does not allow distributing a set of N tasks over a set of M workers. The relation is always N:N.

    In other words, you can say, run this task, then this, then this on every host from this set, then join, then run this, etc.

    But you cannot say, spread this set of tasks over this set of hosts, then join, then ...
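
For illustration only, "spread this set of tasks over this set of hosts" could be as simple as a static round-robin assignment. The sketch below is plain Perl and does not use Net::OpenSSH::Parallel at all; `spread_tasks` and the task and host names are made up.

```perl
use strict;
use warnings;

# Assign N tasks to M hosts round-robin.
# Returns a hash reference: { host => [ tasks assigned to that host ] }.
sub spread_tasks {
    my ($tasks, $hosts) = @_;
    die "need at least one host\n" unless @$hosts;
    my %plan = map { ($_ => []) } @$hosts;
    my $i = 0;
    for my $task (@$tasks) {
        my $host = $hosts->[ $i++ % scalar @$hosts ];
        push @{ $plan{$host} }, $task;
    }
    return \%plan;
}

my $plan = spread_tasks([qw(t1 t2 t3 t4 t5)], [qw(h1 h2)]);
print "$_: @{ $plan->{$_} }\n" for sort keys %$plan;
# prints:
# h1: t1 t3 t5
# h2: t2 t4
```

A static plan like this ignores how long each task takes. With a message queue, letting idle workers pull the next task usually balances the load better.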

Re: Task orchestrator or distributed state machine
by salva (Monsignor) on Jul 09, 2014 at 13:45 UTC
    Regarding state machines supporting forking and joining, at the implementation level that is better done using multiple objects, one per fork.
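
A rough sketch of that idea, assuming nothing about any existing CPAN module: each parallel branch gets its own single-state object, and a separate join object derives its state from the branches it watches. The `Branch` and `JoinPoint` classes are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical single-state machine: one instance per parallel branch.
package Branch;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, status => 'pending' }, $class;
}
sub status { $_[0]{status} }
sub finish { $_[0]{status} = 'done' }
sub fail   { $_[0]{status} = 'failed' }

# Hypothetical join: its status is computed from the branches it watches.
package JoinPoint;

sub new {
    my ($class, @branches) = @_;
    return bless { branches => [@branches] }, $class;
}

sub status {
    my ($self) = @_;
    my @states = map { $_->status } @{ $self->{branches} };
    return 'failed'  if grep { $_ eq 'failed' } @states;
    return 'pending' if grep { $_ ne 'done' } @states;
    return 'done';
}

package main;

my $branch_b = Branch->new('B');
my $branch_c = Branch->new('C');
my $join     = JoinPoint->new($branch_b, $branch_c);

print $join->status, "\n";   # pending
$branch_b->finish;
print $join->status, "\n";   # pending, C has not finished yet
$branch_c->finish;
print $join->status, "\n";   # done
```

Each object still holds exactly one state, so a single-state machine implementation can be reused per branch; only the join needs to know about more than one object.
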
Re: Task orchestrator or distributed state machine
by kschwab (Priest) on Jul 09, 2014 at 14:01 UTC

    The type of software you're describing is usually called a "batch scheduler", or more recently, "workload automation".

    Some references that might help:

    A Perl module that I haven't used, but that looks related:

    BatchSystem::SBS

    What you're asking for also has a lot of parallels with testing frameworks, as well as grid/cloud management, so you might find something in one of those spaces that works.

      Also, while it doesn't support many of the things you asked for, Gearman is a sort of generic task/job management framework that has a Perl API.

      Here's an example of the Perl API.

Re: Task orchestrator or distributed state machine
by perlfan (Curate) on Jul 09, 2014 at 14:12 UTC
    I don't know what existing Perl tools to suggest for you; perhaps you could see if there are any Perlish ways of interfacing with Apache Zoo or similar engines.
Re: Task orchestrator or distributed state machine
by DMR (Initiate) on Jul 10, 2014 at 11:37 UTC

    My colleagues use eHive for this sort of problem. (Link was seen as spam - try http://www.ensembl.org/info/docs/eHive/index.html)

    It needs a database (MySQL or SQLite) and a job submission system (e.g. LSF or SGE). Documentation is a little light, but it can handle the sorts of workflows you describe.

Re: Task orchestrator or distributed state machine
by choroba (Abbot) on Jul 10, 2014 at 17:05 UTC
    What about Workflow? I only remember that I once tried to understand its documentation, but wasn't able to at the time.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Task orchestrator or distributed state machine
by moritz (Cardinal) on Jul 17, 2014 at 14:30 UTC

    Thank you all for your answers, and sorry for the late feedback.

    While there are many interesting modules (in particular, Workflow led me to Class::Workflow), none of them quite fits the bill: either they are tied to a specific backend that isn't AMQP (Net::OpenSSH::Parallel), or they don't handle parallel workflows.

    I'm not yet sure what I'll do about this whole situation.
