http://www.perlmonks.org?node_id=791996

incognito129 has asked for the wisdom of the Perl Monks concerning the following question:

In my script I currently run a series of system commands with backticks sequentially, but altogether they take way too long and I'd like to shorten the time it takes. Is there any way of running multiple commands at once and retrieving their output?

Re: running multiple system commands in parallel
by BrowserUk (Patriarch) on Aug 28, 2009 at 20:12 UTC

    If you like simple, then where you would code:

    #! perl -slw
    use strict;

    #...
    my $output1 = `somecommand`;
    my $output2 = `someothercommand`;
    #...

    print for $output1, $output2;

    Do:

    #! perl -slw
    use strict;
    use threads;

    #...
    my $t1 = async{ `perl -e"print, sleep 1 for 1 ..10"` };
    my $t2 = async{ `perl -le"print, sleep 1 for 'a'..'z'"` };
    #...

    my $output1 = $t1->join;
    my $output2 = $t2->join;

    print for $output1, $output2;

    (Note: the second snippet is an actual working example!)

    And er... that's it really. It requires a little more if you have lots of commands to run and a reason to limit the level of concurrency. But not much.
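
    For example, a minimal worker-pool sketch for capping concurrency (the command list and pool size here are made up for illustration; this needs a Thread::Queue recent enough to have end()):

    #! perl -slw
    use strict;
    use threads;
    use Thread::Queue;

    # Hypothetical commands; substitute your own.
    my @commands = map { qq{perl -le"print 'job $_'; sleep 1"} } 1 .. 20;

    my $q = Thread::Queue->new( @commands );
    $q->end;                        # no more work will be added

    my $pool_size = 4;              # at most 4 commands run at once
    my @pool = map {
        threads->create( sub {
            my @results;
            while ( defined( my $cmd = $q->dequeue ) ) {
                push @results, scalar `$cmd`;    # run and capture output
            }
            @results;
        } );
    } 1 .. $pool_size;

    print for map { $_->join } @pool;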


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      This works great and is so unbelievably simple!
Re: running multiple system commands in parallel
by doom (Deacon) on Aug 28, 2009 at 19:58 UTC

    This is one of my old postings, where I talk about the basics of running child processes, and I present a code example where a child communicates back to the parent: Re: Process Management

    You ought to be able to extend this to run multiple children, where the parent continues to watch them all for input.
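
    For example, a bare-bones version of that idea (the command list is made up for illustration) forks each command through a piped open and lets the parent multiplex the children's output with IO::Select:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Select;

    my @commands = ( 'echo one; sleep 2; echo two', 'date', 'hostname' );

    my $sel = IO::Select->new;
    my %cmd_for;                    # read handle => the command it runs

    for my $cmd (@commands) {
        open my $fh, '-|', $cmd or die "can't fork: $!";
        $sel->add($fh);
        $cmd_for{$fh} = $cmd;
    }

    while ( $sel->count ) {
        for my $fh ( $sel->can_read ) {
            my $n = sysread( $fh, my $buf, 4096 );    # select-safe read
            if ($n) {
                print "[$cmd_for{$fh}] $buf";
            }
            else {                  # EOF: this child has finished
                $sel->remove($fh);
                close $fh;
            }
        }
    }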

Re: running multiple system commands in parallel
by shmem (Chancellor) on Aug 28, 2009 at 20:18 UTC
Re: running multiple system commands in parallel
by walkingthecow (Friar) on Aug 28, 2009 at 19:48 UTC
    You could use the code below to call a script many times in parallel, once per line of an input file, passing each line as the argument. Have the script that is being called print its output to STDOUT; the code below will then print that output (or you can modify it to do whatever you need with the output).
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Getopt::Long;
    use Parallel::ForkManager;
    use Benchmark;
    use File::Basename;

    my $thisScript = basename($0);
    my ( $infile, $script );

    sub usage {
        print "USAGE: $thisScript -infile yourFile -script scriptToInvoke\n";
    }

    my %optctl = ();
    usage() if !GetOptions( \%optctl, "infile=s", "script=s", "<>", \&usage );

    if ( exists $optctl{"infile"} && exists $optctl{"script"} ) {
        $infile = $optctl{"infile"};
        $script = $optctl{"script"};
    }
    else {
        exit usage();
    }

    open my $in, '<', $infile or die "Could not open input ($!)\n";
    my @input = <$in>;
    close $in;

    my $start = new Benchmark;

    my $maxProbes = 80;    # set this to 1 to run one at a time
    my $pm = new Parallel::ForkManager($maxProbes);

    foreach my $line (@input) {
        chomp $line;
        $pm->start and next;               # parent: move on to the next line
        my $stdout = `$script "$line"`;    # child: run the script, grab output
        chomp $stdout;
        print "$stdout\n";
        $pm->finish;
    }
    $pm->wait_all_children;

    my $end  = new Benchmark;
    my $diff = timediff( $end, $start );
    print "Time taken was ", timestr( $diff, 'all' ), " seconds\n";
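
    For example (the file and script names here are hypothetical), if cmds.txt holds one argument per line, you would run:

    ./forker.pl -infile cmds.txt -script ./probe.sh

    Each line of cmds.txt is then handed to ./probe.sh as its argument, with up to 80 invocations running at once.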
Re: running multiple system commands in parallel
by markkawika (Monk) on Aug 28, 2009 at 20:50 UTC
Re: running multiple system commands in parallel
by bichonfrise74 (Vicar) on Aug 28, 2009 at 19:22 UTC

      It's a great module, but it's useless here since the OP wants to capture the output of the children. For that, he'd need to use select or threads as part of the solution.

      He might want to take a look at Parallel::ForkManager's code to see how to limit the number of concurrent children, but he won't be able to use the module itself.
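
      To illustrate the point (a sketch only, not Parallel::ForkManager's internals): you can cap the number of concurrent children and still capture their output by draining the oldest piped open whenever the limit is reached. The commands below are made up:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @commands = map { qq{echo "job $_"; sleep 1} } 1 .. 10;
      my $max_kids = 3;                  # concurrency limit

      my @running;                       # [ $fh, $cmd ] pairs, oldest first

      sub drain_one {
          my ( $fh, $cmd ) = @{ shift @running };
          my $out = do { local $/; <$fh> };    # blocks until that child exits
          close $fh;
          print "----- $cmd -----\n$out";
      }

      for my $cmd (@commands) {
          drain_one() if @running >= $max_kids;
          open my $fh, '-|', $cmd or die "can't fork: $!";
          push @running, [ $fh, $cmd ];
      }
      drain_one() while @running;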

Re: running multiple system commands in parallel
by Sewi (Friar) on Aug 29, 2009 at 08:39 UTC
    Hmm, why not make it simple, at least if you're on Linux?
    # We have some jobs which take some time...
    my @Cmds = (
        'curl "http://www.perlmonks.org/?node=Newest%20Nodes"',
        'fsck -n /',
        './calculate_high_accurate_pi',
    );

    # Let's start them...
    my @PIDs;
    my $Tempfile = '/tmp/job_controller.' . $$;
    for ( 0 .. $#Cmds ) {
        # \$! keeps Perl from interpolating its own $! into the backticks;
        # the shell's $! is the PID of the backgrounded job
        $PIDs[$_] = `$Cmds[$_] >$Tempfile.$_ 2>&1 & echo \$!`;
        chomp $PIDs[$_];
    }

    # Ok, they all are running now
    while ( kill 0, @PIDs ) {
        sleep 1;
    }

    for ( 0 .. $#Cmds ) {
        open my $TmpFH, '<', "$Tempfile.$_" or next;
        print "----- $Cmds[$_] returned: -----\n", join( '', <$TmpFH> );
        close $TmpFH;
        unlink "$Tempfile.$_";
    }
    print "Everything done!\n";
    I defined three jobs: one waits for the network, one for the disk, and one for the CPU.
    We need two helpers: an array for storing the PIDs of the running jobs and a prefix for the temporary files we're going to create.
    Each job is started with the help of the shell: the & puts the job into the background, and the shell's $! then contains the PID of that job. (If your jobs finish their output with some kind of end-of-file marker (like </html>), you could check for that later, but beware of crashed jobs - you'd wait forever!) Each job writes its STDOUT and STDERR into a combined file named after our prefix plus the job number.
    Now they're all running. Your script could do something useful here, but we're just going to wait for them. kill sends signal 0 (a kind of no-op; it doesn't actually reach the process) to all our processes and returns the number of processes that still exist. The loop keeps going as long as at least one of our children is running.
    As soon as we know the jobs are finished, we can read the output and do whatever we like with it.

    Improvements/Comments

  • Match the sleep interval to the time your jobs need to finish. If they take approx. 5 seconds each, don't sleep for a whole second;
    select undef, undef, undef, 0.5;
    would do the job. If they run for hours, you may want to sleep 30, because half a minute on top doesn't matter.
  • If you want to work with the output or start follow-up jobs as soon as the output from the first job is complete, kill 0, $PID them one by one, and as soon as one has finished, start the post-processing (see the sketch after this list). The results from the others won't be lost - you'll find them as soon as you're done with the first round of processing.
  • You could even read the outputs while the jobs are still running (don't forget to add $|=1 if your jobs are Perl scripts). Think of a webpage with three progress bars: every one or two seconds, you read the last lines/bytes from each output file and update that job's progress bar.
  • As you may have noticed, I usually prefer the simple way instead of throwing a dozen modules at a simple problem.
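
    A sketch of that one-by-one approach (reusing @Cmds, @PIDs and $Tempfile from the example above):

    my %done;
    while ( keys %done < @PIDs ) {
        for my $i ( 0 .. $#PIDs ) {
            next if $done{$i};
            next if kill 0, $PIDs[$i];    # still running
            $done{$i} = 1;
            open my $TmpFH, '<', "$Tempfile.$i" or next;
            print "----- $Cmds[$i] finished: -----\n", join( '', <$TmpFH> );
            close $TmpFH;
            # ...start any follow-up job for $Cmds[$i] here...
        }
        select undef, undef, undef, 0.5;  # sub-second poll, as suggested above
    }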

Re: running multiple system commands in parallel
by casiano (Pilgrim) on Aug 29, 2009 at 07:42 UTC
    If you have a multiprocessor system (several processors/cores sharing the same memory/address space), you can use the solutions suggested in the other posts.
    If you have SSH access to other machines, then you can evenly distribute the execution of the backticks among the machines using the parallel backtick qx method provided by GRID::Cluster.
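
    GRID::Cluster's real qx interface is in its docs; just to sketch the same idea with plain ssh and piped opens (host names and commands below are hypothetical):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @hosts    = qw( node1 node2 node3 );                # assumed hosts
    my @commands = map { "./calculate_chunk $_" } 1 .. 9;  # hypothetical jobs

    # Start every command on a host, round-robin, each through its own pipe.
    my @kids;
    for my $i ( 0 .. $#commands ) {
        my $host = $hosts[ $i % @hosts ];
        open my $fh, '-|', 'ssh', $host, $commands[$i]
            or die "can't fork ssh: $!";
        push @kids, [ $fh, "$host: $commands[$i]" ];
    }

    # Collect the output in start order.
    for my $kid (@kids) {
        my ( $fh, $label ) = @$kid;
        print "----- $label -----\n", do { local $/; <$fh> };
        close $fh;
    }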