Do I need threads?

by TechFly (Scribe)
on Dec 20, 2011 at 20:48 UTC

TechFly has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a script to simplify and speed up distributing code to a bunch of web servers. The current shell script basically just calls cvs on each web server and says 'update'. It does them one at a time, and it takes quite a long time.

I want to use a Perl call to do the same basic thing, but here is my question: is there a way to fire the command, not wait for it, and still get the output of the command? I know I can fire it and not wait, or fire it and get the output but also wait. Can I get the best of both, or do I need to delve into threading and create a thread for each server?

Sorry there is no code, but this is more of a 'not sure where to start' sort of question.

Replies are listed 'Best First'.
Re: Do I need threads?
by 1arryb (Acolyte) on Dec 20, 2011 at 21:42 UTC

    Hi, TechFly,

    No. As usual, there's more than one way to do it. In addition to threads (which I've never used in Perl), I can think of several:

    1. The lo-tech way: use backgrounded system calls that append to a single logfile, then tail the log from another terminal.
    2. Classic IPC using fork(), exec(), and open3(), as described in the perlipc perldoc (see the sketch below).
    3. Same as 2, but let a Perl module like IPC::PerlSSH::Async do the heavy lifting for you.
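
    For example, here is a minimal sketch in the spirit of option 2, using a piped open (which does the fork for you) plus IO::Select, so the parent can read from whichever server has output ready. The host names and the remote command are hypothetical placeholders:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Select;

    my @servers = qw(web1 web2 web3);   # hypothetical host names

    my $sel = IO::Select->new;
    my %host_for;

    for my $server (@servers) {
        # '-|' forks and gives the parent a read handle on the child's STDOUT
        open my $fh, '-|', 'ssh', $server, 'cvs update'
            or die "Cannot fork for $server: $!";
        $sel->add($fh);
        $host_for{ fileno $fh } = $server;
    }

    # sysread() avoids mixing buffered reads with select(); a read of
    # 0 bytes (EOF) means that server's command has finished.
    while ($sel->count) {
        for my $fh ($sel->can_read) {
            my $host = $host_for{ fileno $fh };
            my $buf;
            if (sysread $fh, $buf, 4096) {
                print "$host: $buf";
            }
            else {
                $sel->remove($fh);
                close $fh;
            }
        }
    }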

    Cheers,

    Larry

      This is exactly what I needed. Thank you so much. I have not done a ton of Perl, so I didn't know all of my options. This helps a lot.

Re: Do I need threads?
by BrowserUk (Patriarch) on Dec 20, 2011 at 21:11 UTC

    You don't necessarily need threads, but you can certainly use them to good effect.

    use threads;

    ## start the first command going
    my $t1 = async { return qx[ ... ]; };

    ## Start the second command running.
    my $t2 = async { return qx[ ... ]; };

    ## wait for the first to finish and retrieve the output
    my $output1 = $t1->join;

    ## wait for the second to finish and retrieve the output
    my $output2 = $t2->join;

    Doing that in a loop rather than separately is quite trivial.
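
    For example, a minimal sketch of that loop form, assuming @servers holds the host names and that ssh running cvs update is the command inside the qx[] (both hypothetical here):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use threads;

    my @servers = qw(web1 web2 web3);   # hypothetical host names

    # start one thread per server; each captures its command's output
    my @workers = map {
        my $server = $_;
        async { return qx[ssh $server 'cvs update' 2>&1] };
    } @servers;

    # join in order; each join blocks until that thread has finished
    for my $i (0 .. $#workers) {
        my @output = $workers[$i]->join;
        print "--- $servers[$i] ---\n", @output;
    }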



Re: Do I need threads?
by cavac (Parson) on Dec 20, 2011 at 21:10 UTC

    You could also start the existing, slightly modified shell script as a background task, one per server, so they run in parallel.

    Just make a shell or Perl script that starts the existing code in a for loop, one invocation after another, as background tasks. Most of the code is reused, so the least work has to be done. And you don't have to worry about getting multithreading right, since you aren't actually multithreading.


      I could, but it still has the problem that I don't get the output in the main script. I really need to see the output so that I can parse it for errors.

Re: Do I need threads?
by JavaFan (Canon) on Dec 20, 2011 at 23:44 UTC
    You can use threads. You can use forks. You can use select loops. You can use a module (or a set of related modules) to do async I/O.

    Sorry, there's no clear answer, but this is more a "not sure what you want" sort of answer.

Re: Do I need threads?
by TechFly (Scribe) on Dec 21, 2011 at 21:37 UTC

    Thank you all. From the advice I got here, I looked closer at fork() and found it is a perfect fit for what I want. In some test scripts I found that I can fork() and wait for the children to return while they all print their output to a common file (opened before the fork), with the first line of each block being the PID of the process. Then I can sort it to get the output from each and check for errors.

    This approach has proven to be simple (it seemed complicated until I got a little more familiar with it) and very effective. The result is that I can run all of them at once and wait for the responses.

    In case anyone wants to see it, this is the test script. The file that it opens contains a list of servers to hit, and the script just echoes the host name followed by a 30-second wait, both executed on the remote host.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Net::SSH qw(ssh);

    my %serverPid;
    my ($testkey, $testvalue);

    die("Could not open file list: $!") unless open(FHServerList, "<", "./servers");
    die("Could not open log file: $!") unless open(FHlog, ">>", "./log");

    while(<FHServerList>) {
        my $serverName = $_;
        chomp($serverName);
        my $pid = fork();
        if($pid == '0'){
            # Child process
            print FHlog (ssh($serverName, "hostname; sleep 30"));
            exit(0);
        } else {
            $serverPid{$serverName} = $pid;
        }
    }

    while(($testkey, $testvalue) = each %serverPid) {
        waitpid($testvalue, 0);
        print("Done with $testkey\n");
    }

    close(FHServerList);
    exit();

    It is not super clean (it will implode on itself if the fork fails), but it is functional, and will get cleaned up as I work on the full 'real' script.

    Thanks again for the help guys.

      You are almost there. This code will work most of the time. But that is not "all of the time".

      The issue is how you handle multiple children writing to the log file. A context switch can happen at any time, including in the middle of a child's write to the log file! The result is that occasionally you will get garbled data in the log: maybe child 1 starts writing a line, then child 2 gets scheduled and writes its output right in the middle of child 1's line, then child 1 runs again to finish its line. The result is garbled output.

      To fix this, the children have to cooperate so that only one at any one time is writing to the file. You can have each child acquire an exclusive file lock, do the write, then release the lock. A blocking wait for the exclusive lock is fine here; you don't need to fiddle with a shared read lock. See: file locking for more details on how to do it.
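
      For example, a minimal sketch of such a helper, assuming $log_fh is the shared log handle opened before the fork (the name is hypothetical):

      use Fcntl qw(:flock);
      use IO::Handle;   # for flush()

      # each child calls this instead of printing to the log directly
      sub log_line {
          my ($log_fh, $line) = @_;
          flock($log_fh, LOCK_EX) or die "Cannot lock log: $!";
          seek($log_fh, 0, 2);    # jump to the current end of file
          print {$log_fh} $line;
          $log_fh->flush;         # make the bytes land before releasing the lock
          flock($log_fh, LOCK_UN) or die "Cannot unlock log: $!";
      }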

      Also, there is another way to wait for the children: wait(). What you have is fine, but you can just call wait() (blocking) until all of the children have finished. That way you wouldn't have to keep track of the children's PIDs; the OS knows who your children are.
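
      A minimal sketch of that wait() loop (wait() returns -1 once there are no children left to reap):

      while ((my $pid = wait()) != -1) {
          print "Done with child $pid\n";
      }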

      Anyway, the main functional issue is using flock() to coordinate the children. Your writes are quick and there aren't that many of them, so it might be a while before you see the problem, but it will eventually happen.

        After reading the article there, I don't think that locking is going to help me in the long run. The idea is to speed things up, but it appears that locking may introduce a lag while one process writes its output and the next process waits to write. That said, I could be wrong: the OS may just buffer the data being sent rather than holding up the script generating it. I will have to play with that later.

        Here is what I did: instead of writing the output from ssh to the log directly, I wrote it to a temp log, one per child, using the captured server names as part of the file names. Then at the end of the script I dump them all back into the main log. There is still a lot of cleanup to do, but I think I am on the right track now. Thank you for pointing this out; I have no doubt you have saved me hours of confusion and frustration!

        Here is something that I missed earlier: the output being sent to the log in the fork is actually only the success or failure of the SSH call. I decided to just redirect STDOUT to the log so that I can capture the output of the command itself.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Net::SSH qw(ssh);

        my %serverPid;
        my ($testkey, $testvalue);

        die("Could not open file list: $!") unless open(FHServerList, "<", "./servers");
        die("Could not open log file: $!") unless open(FHlog, ">>", "./log");

        while(<FHServerList>) {
            my $serverName = $_;
            chomp($serverName);
            my $pid = fork();
            if($pid == '0'){
                # Child process
                #open(FHTempLog, ">", "/tmp/$serverName.tmp");
                open(STDOUT, '>', "/tmp/$serverName.tmp")
                    or die("Cannot create temp log file for $serverName: $!");
                open(STDERR, ">&STDOUT");
                ssh($serverName, "for x in 1 2 3 4 5 6; do echo -n \$x; hostname; done");
                exit(0);
            } else {
                $serverPid{$serverName} = $pid;
            }
        }

        while(($testkey, $testvalue) = each %serverPid) {
            waitpid($testvalue, 0);
            print("Done with $testkey\n");
        }

        seek(FHServerList, 0, 0);
        while(<FHServerList>) {
            chomp($_);
            my $tmpFileName = "/tmp/$_" . ".tmp";
            open(FHtempFile, "<", $tmpFileName) or die "Could not open tempfile: $!";
            while(<FHtempFile>) {
                print FHlog $_;
            }
            close(FHtempFile);
        }

        close(FHServerList);
        exit();
Re: Do I need threads?
by sundialsvc4 (Abbot) on Dec 21, 2011 at 00:21 UTC

    It just seems to me, superficially, that you could command all of them to do a cvs checkout at the same time and then to send any error output to you. (Better yet, program each of them to evaluate whether or not they encountered an error.) Don't oblige them to do in single-file what can be done in parallel . . .

Re: Do I need threads?
by TJPride (Pilgrim) on Dec 21, 2011 at 05:55 UTC
    delay.pl:
    use strict;
    use warnings;

    die "Can't open $ARGV[1] for write.\n" if !open(FH, ">>$ARGV[1]");
    print FH "$$) Doing something with $ARGV[0]\n";
    sleep rand 10;
    print FH "$$) Finished running.\n";

    wrapper.pl:

    use strict;
    use warnings;

    my $log = 'logfile.txt';
    unlink $log if -e $log;
    for (1..10) {
        system("perl delay.pl $_ $log &");
    }

    logfile.txt (after random run):

    395) Doing something with 1
    395) Finished running.
    401) Doing something with 4
    401) Finished running.
    403) Doing something with 5
    403) Finished running.
    413) Doing something with 10
    413) Finished running.
    399) Doing something with 3
    399) Finished running.
    407) Doing something with 7
    407) Finished running.
    409) Doing something with 8
    409) Finished running.
    397) Doing something with 2
    397) Finished running.
    405) Doing something with 6
    405) Finished running.
    411) Doing something with 9
    411) Finished running.

      How does this answer the OP's requirement to obtain the output from concurrent runs of a preexisting executable?



        An interesting question. My interpretation of the requirement was that “at the end of the day, we want 10 servers to have updated themselves,” and so my thought is to first, immediately, fire a command to every one of them at once, asking each to cvs update itself and then send back a notification of success or failure by some appropriate means. The script that issued all of the simultaneous commands, without having waited for any of them to complete, then simply waits for 10 final-status messages to arrive. You're just waiting for your proverbial mailbox to fill up, and heck, maybe you literally use e-mail to do it.

        Obviously, this approach would create quite a load (so to speak...) on the internal network as every one of the servers attempted to do a checkout at precisely the same time. But that “quite a load” might actually be quite reasonable.

        Now, having said that, yes, this is clearly also a task that could be handled by forking a bunch of shell scripts ... you could even literally do the job in the shell using facilities like '&' on the command line. The various forked processes are just issuing a command and then loafing off, sipping mint juleps while waiting for the remote machine to do its work. There is, as they say, TMTOWTDI™ in this case, and all of the ways are rather uncomplicated.

        No matter how you decide to tackle it, if the approach you are considering feels complicated (not just “unfamiliar”), then there is probably an easier way to do it ... that ought to be the litmus test.

