Tuna has asked for the wisdom of the Perl Monks concerning the following question:

I'm in the home stretch with this cluster tool that I am writing. I've encountered a puzzling problem. I'll try to visualize it for you.

My tool basically does this:

    Cluster 1 => Host 1 (execute task sequence)
                 Host 2 (execute task sequence)
                 Host 3 (execute task sequence)
                 Host 4 (execute task sequence)
                 Host 5 (execute task sequence)
                 Host 6 (execute task sequence)
                 Host 7 (execute task sequence)
    Cluster 2
        |
        |
        |
        |
        v
    Cluster 5


Foreach cluster:
- If fewer than 2 hosts reload successfully, enter &rollbackSequence.
- If host 7 fails, enter &rollbackSequence.
- IMPORTANT: I need to keep track of each host that has been processed; if I need to roll back, I need to know which hosts to roll back!
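The two rollback rules above can be expressed as a single predicate. This is a minimal sketch, assuming each host's outcome is stored in a hash that maps host name to 1 (reloaded OK) or 0 (failed), and that "host 7" is identified by name. The helper `needs_rollback` and the host names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether one cluster must be rolled back.
# $results   - hashref of host name => 1 (reloaded OK) or 0 (failed)
# $last_host - name of "host 7", whose failure always forces a rollback
sub needs_rollback {
    my ($results, $last_host) = @_;

    # grep in scalar context returns the number of successful hosts.
    my $successes = grep { $results->{$_} } keys %$results;
    return 1 if $successes < 2;

    # Host 7 only counts once it has actually been processed.
    return 1 if exists $results->{$last_host} && !$results->{$last_host};

    return 0;
}

my %results = (h1 => 1, h2 => 0, h3 => 1, h7 => 0);
my $verdict = needs_rollback(\%results, 'h7') ? 'rollback' : 'ok';
print "$verdict\n";   # prints "rollback" (host 7 failed)
```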

So, I'm thinking:

    my %HostsPerCluster;
    $HostsPerCluster{$clusterNumber} = scalar(@clusterHosts);
    my $totalClusterHosts = $HostsPerCluster{$clusterNumber};
    my $failures = 0;
    my $host_7   = $clusterHosts[-1];    # host 7 is the last host in the list

    ### this sub is called in a looping statement
    do_stuff($host, $file);

    if ($totalClusterHosts - $failures < 2) {
        ### exit this loop and do other stuff
    }

    sub do_stuff {
        my ($host, $file) = @_;
        my $command = "/bin/true";
        sshExec('SESSION', $command);      # sshExec is defined elsewhere in the tool
        my $success = ($? == 0) ? 1 : 0;   # assumes sshExec leaves the exit status in $?
        ++$failures unless $success;
        return $success;
    }

So, as you can see, I'm a little stuck. I *think* the above code might work, but I can't test it because our company SSH gateway is down. Additionally, I have no idea how to track each host in case I need to roll back, nor do I have any idea how to determine whether host 7 succeeded or failed.
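One way to know which hosts to roll back is to push every host onto an array as soon as its task sequence has been attempted, then walk that array in reverse if the cluster fails. Host 7's status is simply looked up in the same results hash. This is a minimal sketch: `run_tasks` and `rollback_host` are hypothetical stand-ins for the real sshExec calls, faked here so the example runs offline.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real work; replace with the ssh calls.
# run_tasks returns 1 on success, 0 on failure.
my %fake_outcome = (h1 => 1, h2 => 0, h3 => 1, h7 => 0);
sub run_tasks { my ($host) = @_; return $fake_outcome{$host} ? 1 : 0 }

my @rolled_back;
sub rollback_host { my ($host) = @_; push @rolled_back, $host }

# Process one cluster. Returns (ok_flag, \%results, \@processed).
sub process_cluster {
    my (@hosts) = @_;
    my $host_7 = $hosts[-1];    # "host 7" is the last host in the list
    my (%results, @processed);

    for my $host (@hosts) {
        $results{$host} = run_tasks($host);
        push @processed, $host;   # remember every host we touched, pass or fail
    }

    my $successes = grep { $results{$_} } @processed;
    my $ok = ($successes >= 2 && $results{$host_7}) ? 1 : 0;

    unless ($ok) {
        # Undo in reverse order: the last host touched is restored first.
        rollback_host($_) for reverse @processed;
    }
    return ($ok, \%results, \@processed);
}

my ($ok) = process_cluster(qw(h1 h2 h3 h7));
print $ok ? "cluster ok\n" : "rolled back: @rolled_back\n";
# prints "rolled back: h7 h3 h2 h1"
```

Rolling back in reverse order is a design choice, not a requirement from the question; it simply undoes changes in the opposite order they were made.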

I know that I've been somewhat of a leech lately, but I really do try to resolve this stuff myself, before posting here.

TIA, once again

Re: Pass/Fail Counter
by little (Curate) on Aug 19, 2001 at 17:16 UTC
    As I understand it, what you have is a group of clusters, which can be referred to as a farm or as a cluster itself. All units or devices in a cluster (which can be hosts, clusters, or a mix of both) are controlled by a special unit or device, which we can call the controller.

    So, regarding your previous node, I suggest you change the structure of your data once more so that it represents the relations between all of your hosts. That can be very simple. (I know that others have said something similar before.)

    You said you have a cluster (or a farm) consisting of 5 clusters, so that is the "Queen", as kjherron called it. This is number 1. It has no parent. (Well, except you, of course, but sadly you don't count here. grin) But as every device must have a parent, we will make it its own parent. That is a very special case, but it will neither hurt nor break anything, since this is the ONLY unit that is its own parent.

    It is supposed to wait for all the "workers", i.e. the units/devices it consists of. So we can call them its children: the units that have number 1 as their parent.

    Each unit needs to know its parent and it can have only one parent to keep the relations easy.

    This leads to:
    ID; parentID; waitForID;  unitName;       type;           description
     1;        1;         1;      farm;    cluster;          Queen of all
     2;        1;         2; cluster_1;    cluster;                   ...
     3;        2;         3;        h7; controller; princess of cluster_1
     4;        3;         0;        h1;       host;                worker
     5;        3;         0;        h2;       host;                worker
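    The table can be loaded straight into Perl data structures. A minimal sketch, assuming the rows arrive in exactly the semicolon-separated format shown, header line first. It builds a parentID => children index once, so finding a unit's children does not require scanning every row. The root (ID 1) is skipped when building the index so it is not listed as its own child. The `trim` helper is hypothetical.

```perl
use strict;
use warnings;

my @lines = (
    'ID; parentID; waitForID;  unitName;       type;           description',
    ' 1;        1;         1;      farm;    cluster;          Queen of all',
    ' 2;        1;         2; cluster_1;    cluster;                   ...',
    ' 3;        2;         3;        h7; controller; princess of cluster_1',
    ' 4;        3;         0;        h1;       host;                worker',
    ' 5;        3;         0;        h2;       host;                worker',
);

# Strip leading and trailing whitespace from one field.
sub trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s }

my ($header, @rows) = @lines;
my @fields = map { trim($_) } split /;/, $header;

# One record (hashref keyed by column name) per unit, indexed by ID.
my (%unit, %children);
for my $line (@rows) {
    my %rec;
    @rec{@fields} = map { trim($_) } split /;/, $line;
    $unit{ $rec{ID} } = \%rec;
}

# Build the parent => [child IDs] index in a single pass.
for my $id (sort { $a <=> $b } keys %unit) {
    my $parent = $unit{$id}{parentID};
    next if $parent == $id;    # the root is its own parent; not its own child
    push @{ $children{$parent} }, $id;
}

print join(', ', map { $unit{$_}{unitName} } @{ $children{3} }), "\n";   # prints "h1, h2"
```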

    But OK, it might be better to normalize these relations.

    As bbfu pointed out here, it would cost too much time to always search for all units with a specific parentID in order to find the children of a cluster or a controller. So it might be useful to let each unit know both its parent and its children. That can be done the way Nitsuj suggested, by populating $obj->parent and $obj->children, along with something like $obj->waitfor, when each object is initialized. The waitfor is interesting because it lets a unit wait for any of the other units you have, so you can set the order in which the tasks are processed.
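    Here is one way the object approach could look: each unit is constructed with its parent and an optional waitfor, and construction also registers the new unit in its parent's children list, so both directions are available without searching. A minimal sketch; the `Unit` package and its method names are illustrative and not taken from Nitsuj's post.

```perl
use strict;
use warnings;

package Unit;

# Build a unit; if a parent is given, register the new unit as one of its children.
sub new {
    my ($class, %args) = @_;
    my $self = bless {
        name     => $args{name},
        type     => $args{type},
        parent   => undef,
        waitfor  => $args{waitfor},   # another Unit object, or undef
        children => [],
    }, $class;
    if (my $parent = $args{parent}) {
        $self->{parent} = $parent;
        push @{ $parent->{children} }, $self;
    } else {
        $self->{parent} = $self;      # the root ("Queen") is its own parent
    }
    return $self;
}

sub name     { $_[0]{name} }
sub type     { $_[0]{type} }
sub parent   { $_[0]{parent} }
sub waitfor  { $_[0]{waitfor} }
sub children { @{ $_[0]{children} } }

package main;

my $farm    = Unit->new(name => 'farm',      type => 'cluster');
my $cluster = Unit->new(name => 'cluster_1', type => 'cluster',    parent => $farm);
my $h7      = Unit->new(name => 'h7',        type => 'controller', parent => $cluster);
my $h1      = Unit->new(name => 'h1',        type => 'host',       parent => $h7);
my $h2      = Unit->new(name => 'h2',        type => 'host',       parent => $h7, waitfor => $h1);

print join(' ', map { $_->name } $h7->children), "\n";   # prints "h1 h2"
print $h2->waitfor->name, "\n";                          # prints "h1"
```

    The parent links create reference cycles (the root even points at itself). That is harmless in a short-lived script, but a long-running process could weaken the parent field with `Scalar::Util::weaken`.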

    Another step would then be to implement $obj->work, returning 1 for success or 0 for failure, and $obj->result.
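    A sketch of the work/result pair, assuming a host's work is a code reference that returns true or false (the real version would run the ssh task sequence), and that a controller's work runs its children and applies the question's rule of at least two successful hosts. The `Worker` class, the `task` field and the result strings are hypothetical.

```perl
use strict;
use warnings;

package Worker;

# Hypothetical class: a host has a task (code ref); a controller has children.
sub new {
    my ($class, %args) = @_;
    return bless {
        name     => $args{name},
        task     => $args{task},
        children => $args{children} || [],
        result   => undef,
    }, $class;
}

# Return 1 for success or 0 for failure; store a human-readable result.
sub work {
    my ($self) = @_;
    my @children = @{ $self->{children} };

    if (@children) {
        # Controller: run every child, then require at least two successes.
        my $successes = grep { $_->work } @children;
        $self->{result} = "$successes of " . scalar(@children) . " hosts succeeded";
        return $successes >= 2 ? 1 : 0;
    }

    # Host: run the task inside eval so a die counts as a failure, not a crash.
    my $passed = eval { $self->{task}->() ? 1 : 0 };
    $self->{result} = !defined $passed ? "died: $@"
                    : $passed          ? 'done'
                    :                    'failed';
    return $passed ? 1 : 0;
}

sub result { $_[0]{result} }

package main;

my @hosts = (
    Worker->new(name => 'h1', task => sub { 1 }),
    Worker->new(name => 'h2', task => sub { die "reload failed\n" }),
    Worker->new(name => 'h3', task => sub { 1 }),
);
my $ctl    = Worker->new(name => 'h7', children => \@hosts);
my $passed = $ctl->work;
print "$passed: ", $ctl->result, "\n";   # prints "1: 2 of 3 hosts succeeded"
```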

    I wonder whether it would be much effort to run a script on each host (each unit where $obj->type eq "host") that contacts its assigned controller with something like "I am done", or, via a DIE handler, "I did not do what you wanted me to do". That way you don't need a persistent connection to each host, but you then have to worry about getting no reply at all, e.g. in case of a system error. That's a different problem, though, or at least one on a different level.
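    On the controller side, the reporting idea could look like this: each host sends one status line such as "h1 done" or "h2 failed: reason", and any expected host that never reports is marked "no reply" instead of being silently ignored. A minimal sketch; the line format and the `collect_status` name are assumptions, and how the lines travel (socket, file, ssh output) and how long to wait for them are left open, as in the post.

```perl
use strict;
use warnings;

# Parse status lines reported by hosts and fill in hosts that never replied.
# $expected - arrayref of host names the controller is waiting for
# $lines    - arrayref of raw lines received, e.g. "h1 done" or "h2 failed: disk full"
# Returns a hashref: host => 'done' | 'failed: ...' | 'no reply'
sub collect_status {
    my ($expected, $lines) = @_;
    my %status;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my ($host, $msg) = $copy =~ /^(\S+)\s+(.*)$/ or next;   # skip malformed lines
        $status{$host} = $msg;
    }
    for my $host (@$expected) {
        $status{$host} = 'no reply' unless exists $status{$host};
    }
    return \%status;
}

my $status = collect_status([qw(h1 h2 h3)], ["h1 done\n", "h2 failed: disk full\n"]);
print "$_: $status->{$_}\n" for sort keys %$status;
# h1: done
# h2: failed: disk full
# h3: no reply
```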

    I know this is all very general, but forgive my laziness: you have already gotten "tons" of good code in reply to this and your previous posts on this topic.

    And it's not easy for me to type with one entire arm in a cast. (Yeah, never again brake too hard for pedestrians when you're on a bicycle :-)

    Have a nice day
    All decision is left to your taste