PerlMonks  

Running Parallel Perl Script on Linux Cluster

by monkfan (Curate)
on Mar 31, 2006 at 12:50 UTC ( [id://540445]=perlquestion: print w/replies, xml ) Need Help??

monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Dear all,

I have a Perl script that takes some parameters. I would like to run it with a few different parameter sets on a Linux cluster comprising 40 CPUs, and finally concatenate the results into one single file.

Here are the general steps I would take:
1. The Perl code is executed individually like this:

    $ perl mycode.pl file1.txt param1 param2 > param_set1.out
    $ perl mycode.pl file1.txt param3 param4 > param_set2.out
    $ perl mycode.pl file1.txt param5 param6 > param_set3.out
    $ cat param_set1.out param_set2.out param_set3.out > final.out

Here param_set1 = (param1, param2), and so on. There are many of these param sets. I would run each param set on the Linux cluster via the qsub command.

2. Before running the qsub command, I have this bash script, required by qsub to do its job. Let's call this script "runcode.sh":

    #!/usr/bin/bash
    cd ~/some_dir
    perl mycode.pl $1 $2 > ~/some_out_dir/param_set$1.out
3. Subsequently, I would manually submit each param set to the cluster, as shown in step 1:

    $ qsub runcode.sh param1 param2
    $ qsub runcode.sh param3 param4
    $ qsub runcode.sh param5 param6
My question is: how can I write a Perl script that:
  • Automatically submits all these parameter sets to the Linux cluster.
  • Automatically concatenates the results into one single final file,
    noting that the outputs of all param sets must be complete before concatenating them.
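The whole submit-wait-concatenate flow above could be sketched as follows. This is only a sketch under assumptions: it assumes runcode.sh names its output param_set<first-param>.out as in step 2, and that an output file only appears once its job is complete (the replies discuss why that last assumption can be unsafe); the sub names and the commented driver are illustrative.

```perl
#!/usr/bin/perl
# master.pl -- sketch of a driver: submit every param set, wait for all
# output files, then concatenate them in order.
use strict;
use warnings;

# Submit one cluster job per parameter set via the given command (e.g. "qsub").
sub submit_all {
    my ($submit_cmd, @param_sets) = @_;
    for my $set (@param_sets) {
        system("$submit_cmd runcode.sh @$set") == 0
            or warn "submission failed for @$set\n";
    }
}

# Block until every named file exists, polling every $poll seconds.
sub wait_for_files {
    my ($poll, @files) = @_;
    WAIT: while (1) {
        for my $f (@files) {
            next WAIT unless -e $f;
        }
        last;
    }
    continue { sleep $poll; }
}

# Append the content of each input file, in order, to one output file.
sub concat_files {
    my ($out_name, @files) = @_;
    open my $out, '>', $out_name or die "$out_name: $!";
    for my $f (@files) {
        open my $in, '<', $f or die "$f: $!";
        print {$out} $_ while <$in>;
        close $in;
    }
    close $out or die "$out_name: $!";
}

# On the cluster head node one would then run something like:
#   my @sets = ([qw(param1 param2)], [qw(param3 param4)], [qw(param5 param6)]);
#   submit_all('qsub', @sets);
#   my @outs = map { "param_set$$_[0].out" } @sets;
#   wait_for_files(5, @outs);
#   concat_files('final.out', @outs);
```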

Regards,
Edward

Replies are listed 'Best First'.
Re: Running Parallel Perl Script on Linux Cluster
by monkey_boy (Priest) on Mar 31, 2006 at 14:06 UTC
    In mycode.pl, put an END block that creates a file to show the job has been completed, have your "master" script submit the jobs, then periodically check whether *all* the files are available before concatenating.
    # in mycode.pl
    END { system("touch done$param"); };

    # in master.pl
    SLEEPY: while (1) {
        sleep(5);
        for my $done (@done_list) {
            next SLEEPY if ! -e $done;
        };
        last;
    };
    # cat files now: etc..
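A self-contained version of the worker side of this scheme might look like the sketch below. File names are illustrative; note that an END block also runs when the script dies, so the marker signals that the script exited, not necessarily that it succeeded.

```perl
#!/usr/bin/perl
# mycode.pl sketch: create an empty "done" marker file when the script exits.
use strict;
use warnings;

# Create an empty marker file named after the parameter set.
sub mark_done {
    my ($name) = @_;
    open my $fh, '>', "done$name" or warn "cannot create done$name: $!";
    close $fh if $fh;
}

my $param = @ARGV ? $ARGV[0] : 'demo';

# END runs just before the interpreter exits (normally or via die),
# i.e. after the real processing below has finished.
END { mark_done($param) }

# ... real processing would go here, printing results to STDOUT ...
```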



    This is not a Signature...
      Don't you mean:
      END { open my $fh, ">", "done$param" or die $! }
      instead of END { system("touch done$param"); }; ?


      Hi monkey_boy,

      Thanks a lot for your answer.
      In addition to shltn's question below, which you haven't answered: do you mean creating "done$param" as a dummy file? Not sure why you would "touch" the file..
      I have other following questions:
      • Why do we need END{} instead of a plain, unblocked system call?
      • It is unclear to me where you get @done_list from. I assume you obtain it with glob? Like this:
        # in master.pl
        my @done_list = glob("done*");
        SLEEPY: while (1) {
            sleep(5);
            for my $done (@done_list) {
                next SLEEPY if ! -e $done;
            };
            last;
        };
        # cat files now:
      • In my case, the mere fact that a file has been created may not mean that it is complete. Thus, checking its existence with -e may not reflect completion.
        Does the END block you have guarantee that the file is 100% complete? Please correct me if I'm wrong here.

      Regards,
      Edward
        Hi, this is monkey_boy (not logged in, as I'm at home),
        • The "touch" is to create an empty dummy file, separate from your results file.
          The reasoning is that your results files will be created at the start of the processing, so your master script cannot check for their existence as proof of completion.
        • The END block always gets executed just before a script terminates, so it's as good a place as any to "touch" the file.
        • @done_list is left for you to code; it's simple Perl: you have a list of jobs somewhere, convert them with a regex into a list of "done" files.
        • In reply to shltn's question, his way is probably better, as you'll hopefully get an error on failure (but given my experience with Sun Grid Engine, this is not always the case ;))

        Hope this is helpful. monkey_boy
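For instance, if the list of jobs is the list of expected output files, the corresponding "done" marker names could be derived with a map and a regex; the file names here are illustrative:

```perl
use strict;
use warnings;

my @jobs = qw(param_set1.out param_set2.out param_set3.out);

# Turn each "param_setN.out" into the marker name "doneparam_setN".
my @done_list = map { (my $d = $_) =~ s/\.out$//; "done$d" } @jobs;

print "$_\n" for @done_list;
```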
      A similar result can be attained by changing the shell wrapper instead of the Perl script that runs on the cluster nodes:
      #!/usr/bin/bash
      cd ~/some_dir
      (perl mycode.pl $1 $2 > ~/some_out_dir/param_set$1.out.tmp \
        && mv ~/some_out_dir/param_set$1.out.tmp ~/some_out_dir/param_set$1.out) \
        || touch ~/some_out_dir/param_set$1.fail
      Then, on the master, you will have to poll from time to time to see whether all the result files exist, or whether "fail" files are there, in which case you requeue those jobs.

      There should be better ways to synchronize partial jobs over the cluster, though.
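One polling pass of the scheme just described might be sketched as below. This is an assumption-laden sketch: the file-naming convention (param_set<first-param>.out / .fail) follows the wrapper above, and the submit command is a parameter so the code can be tried without a real queue (on the cluster it would be "qsub").

```perl
use strict;
use warnings;

# One polling pass: requeue failed jobs, return the sets still pending.
sub poll_jobs {
    my ($submit_cmd, @param_sets) = @_;
    my @pending;
    for my $set (@param_sets) {
        my $base = "param_set$$set[0]";
        if (-e "$base.fail") {
            # the wrapper left a ".fail" marker: requeue this job
            unlink "$base.fail";
            system("$submit_cmd runcode.sh @$set");
            push @pending, $set;
        }
        elsif (!-e "$base.out") {
            # neither result nor failure marker yet: still queued or running
            push @pending, $set;
        }
        # else "$base.out" exists: this set is finished
    }
    return @pending;
}

# The master would call poll_jobs() in a loop, sleeping between passes,
# until it returns an empty list.
```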

        ...you will have to poll from time to time to see if all the result files exist or if "fail" files are there, to requeue the jobs.
        Hi salva,

        Thanks for the answer. Roughly, I suppose one would use glob to check the files (fail or not)? But I'm not sure how to 'requeue' the jobs.

        Can you give a simple example how would one do the above step as you suggested?

        Regards,
        Edward
Re: Running Parallel Perl Script on Linux Cluster
by aquarium (Curate) on Mar 31, 2006 at 13:34 UTC
    off the top of my head (untested)
    my @pairs = ([qw(param1 param2)], [qw(param3 param4)]);
    for my $pair (@pairs) {
        my ($first, $second) = @$pair;
        system("qsub runcode.sh $first $second");
    }
    Then... unless your mycode.pl somehow returns control before finishing, test for $2 being the last parameter value (in runcode.sh, just after calling mycode.pl) and concatenate.
    the hardest line to type correctly is: stty erase ^H
Re: Running Parallel Perl Script on Linux Cluster
by zentara (Archbishop) on Mar 31, 2006 at 17:47 UTC
    I wish I had clusters to experiment with. :-) Have you searched Super Search for "cluster"? Maybe the all_exit_ok() method of Proc::Queue?

    I'm not really a human, but I play one on earth. flash japh

Node Type: perlquestion [id://540445]
Approved by Corion
Front-paged by broquaint