http://www.perlmonks.org?node_id=878047
markseger has asked for the wisdom of the Perl Monks concerning the following question:

I think this is a threads question but will defer to your wisdom...

I have a script that wants to execute commands on multiple remote systems and process their output. I've decomposed it into a very simple script that I hope makes my point, using a simple 'cat' command against a test file in /tmp.

#!/usr/bin/perl -w
my @FD;
my @hosts = ('poker', 'poker');
for (my $i=0; $i<@hosts; $i++) {
    my $a = "ssh $hosts[$i] cat /tmp/test";
    open $FD[$i], "$a|" or die;
}
for (my $i=0; $i<@hosts; $i++) {
    my $fd = $FD[$i];
    my $line = <$fd>;
    print "LINE: $line\n";
}
As you can see, I first loop through all the node names, which I've hard-coded to the same system for the sake of demonstration, and execute each command by opening it as a pipe. In the second section I simply look at the first line of output, but in practice this part would be much more complex.

This all works just fine, but I'm concerned about scaling. I tried running it against a couple of hundred systems and each ssh command executed serially. I thought throwing in an & would make things asynchronous, but I believe the open is waiting for the socket connection to be established with the pipe.

It seemed to me that if I could fire off each open in a separate thread, they'd all run in parallel and finish much faster. The thing is, from my reading and playing around with threads, I believe one can only share simple arrays and hashes, and file descriptors are more complex than that.
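To show what I had in mind, here's a rough, untested sketch of the threaded version: one thread per host, each thread opening and reading its own pipe, so no filehandles ever need to be shared and only the returned line crosses thread boundaries. The host names and remote command are just the placeholders from my test above.

#!/usr/bin/perl
# Sketch only: one thread per host; each thread owns its own pipe.
use strict;
use warnings;
use threads;

my @hosts = ('poker', 'poker');

my @workers = map {
    my $host = $_;
    threads->create(sub {
        open my $fd, '-|', "ssh $host cat /tmp/test"
            or die "ssh $host: $!";
        my $line = <$fd>;   # each thread blocks only on its own pipe
        close $fd;
        return $line;
    });
} @hosts;

# Collect results; join() hands back whatever each thread returned.
for my $t (@workers) {
    my ($line) = $t->join;
    print "LINE: ", defined $line ? $line : "(no output)", "\n";
}

I have no idea whether this is the idiomatic way to do it, which is partly why I'm asking.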

So the question is: do I really need threads to solve this problem, and if so, how? Or is there just some faster way to get the pipes open? Ultimately I'd like this to run against several thousand machines without taking forever to get past the initial opens.
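For comparison, the only non-threaded direction I've come up with is to open every pipe up front and then multiplex the reads with IO::Select (core Perl). This is just a sketch I haven't run at scale, and I'm not sure mixing buffered readline with select like this is entirely safe, so treat it as illustration rather than something I trust.

#!/usr/bin/perl
# Sketch only: open all pipes first, then read whichever has data.
use strict;
use warnings;
use IO::Select;

my @hosts = ('poker', 'poker');
my %host_of;                 # filehandle (stringified ref) -> host name

my $sel = IO::Select->new;
for my $host (@hosts) {
    open my $fd, '-|', "ssh $host cat /tmp/test"
        or die "ssh $host: $!";
    $host_of{$fd} = $host;
    $sel->add($fd);
}

# Drain the pipes as data arrives; drop each handle at EOF.
# (Simplified: buffered <$fd> reads mixed with select can be tricky.)
while ($sel->count) {
    for my $fd ($sel->can_read) {
        my $line = <$fd>;
        if (defined $line) {
            print "LINE ($host_of{$fd}): $line";
        } else {
            $sel->remove($fd);
            close $fd;
        }
    }
}

If that's the better road, I'd still like to know whether the opens themselves become the bottleneck at a few thousand hosts.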

-mark