PerlMonks  

Re: Do I need threads?

by TechFly (Scribe)
on Dec 21, 2011 at 21:37 UTC


in reply to Do I need threads?

Thank you all. Following the advice I got here, I looked closer at fork() and found it is a perfect fit for what I want. With some test scripts I found that I can fork(), have the children all print their output to a common file (opened before the fork) with the PID of the process as the first line, and wait for them to return. Then I can sort the file to get the output from each child and check for errors.

This approach has proven to be simple (complicated until I got a little more familiar) and very effective. The result is that I can run all of them at once and wait for the response.

In case anyone wants to see it, this is the test script. The file that it opens contains a list of servers to hit, and the script just echoes the host name and then waits 30 seconds, both executed on the remote host.

#!/usr/bin/perl
use strict;
use warnings;
use Net::SSH qw(ssh);

my %serverPid;
my ($testkey, $testvalue);

die("Could not open file list: $!") unless open(FHServerList, "<", "./servers");
die("Could not open log file: $!") unless open(FHlog, ">>", "./log");

while(<FHServerList>) {
    my $serverName = $_;
    chomp($serverName);
    my $pid = fork();
    if($pid == '0'){
        #Child process
        print FHlog (ssh($serverName, "hostname; sleep 30"));
        exit(0)
    } else {
        $serverPid{$serverName} = $pid;
    }
}

while(($testkey, $testvalue) = each %serverPid) {
    waitpid($testvalue, 0);
    print ("Done with $testkey\n");
}

close(FHServerList);
exit()

It is not super clean (it will implode on itself if the fork fails), but it is functional, and will get cleaned up as I work on the full 'real' script.
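On the fork-failure point: fork() returns undef when no child could be created, and undef compares equal to 0 numerically, so the parent would fall into the child branch. A minimal sketch of one way to guard against that, assuming a hypothetical helper named spawn_child and placeholder host names (neither is from the script above, and the sleep stands in for the ssh call):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fork a child that runs $work, or die if fork fails.
sub spawn_child {
    my ($work) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;   # undef means no child was created
    if ($pid == 0) {
        # Child: do the work, then exit so it never falls back into the parent's loop
        $work->();
        exit(0);
    }
    return $pid;   # Parent: hand back the child's PID
}

my %serverPid;
for my $serverName (qw(alpha beta)) {    # placeholder host names
    $serverPid{$serverName} = spawn_child(sub { sleep 1 });   # stand-in for the ssh call
}

# Reap each child and record its exit status
my %exitStatus;
for my $serverName (sort keys %serverPid) {
    waitpid($serverPid{$serverName}, 0);
    $exitStatus{$serverName} = $? >> 8;
    print "Done with $serverName (exit $exitStatus{$serverName})\n";
}
```

Dying in the parent is only one option; a real script might prefer to log the failure and retry or skip that server.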

Thanks again for the help guys.

Replies are listed 'Best First'.
Re^2: Do I need threads?
by Marshall (Canon) on Dec 22, 2011 at 00:55 UTC
    You are almost there. This code will work most of the time. But that is not "all of the time".

    The issue is how you handle multiple children writing to the log file. A context switch can happen at any time, including in the middle of a child's write to the log file! The result is that occasionally you will get garbled data in the log file. Maybe child 1 starts writing a line, then child 2 starts running and writes its stuff right in the middle of the line from child 1, then child 1 runs again to finish its line. Result: garbled stuff.

    To fix this, the children have to cooperate so that only one of them at a time is writing to the file. You can just have the children acquire an exclusive file lock, do the write, then release the lock. A blocking wait for the exclusive lock is ok here - you don't need to fiddle with a shared reading lock. See: file locking for more details on how to do it.

    Also there is another way to wait for the children: wait(). What you have is fine, but you can just call wait() (blocking) repeatedly until all of the children have finished. That way you wouldn't have to keep track of the PIDs of the children. The OS knows who your children are.
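    A minimal sketch of that wait() loop, assuming three placeholder children that simply exit with distinct statuses (the count and the exit codes are illustrative only, not from the script above):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $children = 3;   # placeholder count for this sketch
for my $n (1 .. $children) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        exit($n);   # child: exit with a distinct status, for illustration
    }
}

# wait() blocks until any one child exits and returns its PID;
# it returns -1 once there are no children left to reap.
my $reaped = 0;
while ((my $pid = wait()) != -1) {
    my $status = $? >> 8;
    print "Child $pid finished with exit status $status\n";
    $reaped++;
}
```

    The trade-off is that wait() only reports a PID, so printing "Done with" plus a server name would still need the PID-to-server hash.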

    Anyway, the main functional issue is using flock() to coordinate the children. Your writes are quick and there aren't that many of them, so it might be a while before you see the problem, but it will eventually happen.

      After reading the article there, I don't think that locking is going to help me in the long run. The idea is to speed things up, but it appears that locking may introduce a lag while one process is writing its output and the next process is waiting to write. That being said, I could be wrong. The data being sent may just be cached and written to the log later, without holding up the script generating the data. I will have to play with that later.

      Here is what I did. Instead of writing the output from ssh to the log directly, I wrote it to a temp log, one for each child, using the captured server name as part of the file name. Then at the end of the script, I dump them all back into the main log. Still a lot of cleanup to do, but I think I am on the right track now. Thank you for pointing this out. I have no doubt you have saved me hours of confusion and frustration!

      Here is something that I missed earlier: the output being sent to the log in the fork is actually only the success or failure of the SSH call. I decided to just redirect STDOUT to the temp log so that I can capture the output of the command itself.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Net::SSH qw(ssh);

      my %serverPid;
      my ($testkey, $testvalue);

      die("Could not open file list: $!") unless open(FHServerList, "<", "./servers");
      die("Could not open log file: $!") unless open(FHlog, ">>", "./log");

      while(<FHServerList>) {
          my $serverName = $_;
          chomp($serverName);
          my $pid = fork();
          if($pid == '0'){
              #Child process
              #open(FHTempLog, ">", "/tmp/$serverName.tmp");
              open(STDOUT, '>', "/tmp/$serverName.tmp") or die ("Cannot create temp log file for $serverName: $!");
              open(STDERR, ">&STDOUT");
              ssh($serverName, "for x in 1 2 3 4 5 6; do echo -n \$x; hostname; done");
              exit(0)
          } else {
              $serverPid{$serverName} = $pid;
          }
      }

      while(($testkey, $testvalue) = each %serverPid) {
          waitpid($testvalue, 0);
          print ("Done with $testkey\n");
      }

      seek(FHServerList, 0, 0);
      while(<FHServerList>) {
          chomp($_);
          my $tmpFileName = ("/tmp/$_" . ".tmp");
          open(FHtempFile, "<", $tmpFileName) or die "Could not open tempfile: $!";
          while(<FHtempFile>) {
              print FHlog $_;
          }
          close(FHtempFile);
      }

      close(FHServerList);
      exit()
        RE: time lag... Yes, acquiring a lock does introduce a very slight time lag, but this is very, very fast. The OS keeps track of who has what lock in a memory-resident structure. To use this method, you should not acquire the lock until you are actually ready to write. That means capturing the output in a memory variable and then doing lock, write, unlock in quick succession, rather than holding the lock for the entire time that the sub-process is running. It is possible to sequence even very high I/O rates with this method because the time for each write is negligible.
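        A minimal sketch of that lock, write, unlock sequence, assuming a hypothetical helper named locked_append, a throwaway log path, and placeholder names in place of real servers (generated strings stand in for the captured command output). Each child opens the log itself after the fork: where Perl's flock is built on flock(2), a handle inherited across fork shares a single lock among all the processes and would not keep them apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use IO::Handle;

my $logFile = "./flock_demo.log";   # hypothetical log path for this sketch
unlink $logFile;                    # start from an empty log

# Hypothetical helper: append $text to $path while holding an exclusive lock.
sub locked_append {
    my ($path, $text) = @_;
    open(my $fh, ">>", $path) or die "Could not open $path: $!";
    flock($fh, LOCK_EX) or die "Could not lock $path: $!";   # blocks until granted
    seek($fh, 0, SEEK_END);     # another process may have appended while we waited
    print {$fh} $text;
    $fh->flush();               # push the data out before the lock is released
    flock($fh, LOCK_UN);
    close($fh);
}

my @pids;
for my $name (qw(alpha beta gamma)) {   # placeholder names standing in for servers
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: build the whole record in memory first, then hold the lock only briefly
        my $output = join("", map { "$name line $_\n" } 1 .. 5);
        locked_append($logFile, "== $name (pid $$) ==\n" . $output);
        exit(0);
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

# Read the log back to see how much was written
open(my $in, "<", $logFile) or die "Could not read $logFile: $!";
my @lines = <$in>;
close($in);
print scalar(@lines), " lines written\n";
```

        Because each child writes its whole record under one lock, the records may appear in any order but should not interleave.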

        Having each child use its own individual file and then "cat" them together when all of them are finished is another way and is the way I'd do it if I was launching these tasks in the background via a shell script.

        It looks like you are using method 2, which is fine. Either way will work for your application.
