Re: Running Parallel Perl Script on Linux Cluster

by monkey_boy (Curate)
on Mar 31, 2006 at 14:06 UTC (#540471)


in reply to Running Parallel Perl Script on Linux Cluster

In mycode.pl, put an END block that creates a file to show that the job has completed. Have your "master" script submit the jobs, then periodically check whether *all* the files exist before concatenating.

    # in mycode.pl
    END { system("touch done$param"); };

    # in master.pl
    SLEEPY: while (1) {
        sleep(5);
        for my $done (@done_list) {
            next SLEEPY if ! -e $done;
        };
        last;
    };
    # cat files now: etc..



This is not a Signature...


Re^2: Running Parallel Perl Script on Linux Cluster
by sh1tn (Priest) on Mar 31, 2006 at 18:34 UTC
    Don't you mean:
    END { open my $fh, ">", "done$param" or die $! }
    instead of END { system("touch done$param"); }; ?
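
One practical difference between the two forms, offered as a hedged aside: inside an END block, $? holds the status the script is about to exit with, and a system() call there overwrites it, so the "touch" variant can change the job's exit code by accident. Below is a minimal sketch of the open-based form. write_marker is a hypothetical helper name, and taking $param from @ARGV is an assumption about how mycode.pl receives its parameter set. Unlike the snippets above, it skips the marker when the script is exiting with a non-zero status.

```perl
use strict;
use warnings;

# Hypothetical helper: create an empty marker file, dying on failure.
sub write_marker {
    my ($path) = @_;
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Assumption: the parameter set arrives as the first command-line argument.
my $param = $ARGV[0];

END {
    # Inside END, $? is the exit status the script is about to return.
    # Write the marker only for a clean exit, and only if a parameter was given.
    write_marker("done$param") if defined $param && $? == 0;
}
```

With this in place, a job that dies partway through leaves no "done" file, so the master keeps waiting instead of concatenating partial output.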


Re^2: Running Parallel Perl Script on Linux Cluster
by monkfan (Curate) on Apr 01, 2006 at 10:30 UTC
    Hi monkey_boy,

    Thanks a lot for your answer.
    In addition to sh1tn's question, which you haven't answered yet: do you mean creating "done$param" as a dummy file? I'm not sure why you would "touch" the file.
    I have the following further questions:
    • Why do we need END{} instead of a plain (unblocked) system call?
    • It is unclear to me where @done_list comes from. I assume you obtain it with glob, like this?
      # in master.pl
      my @done_list = glob("done*");
      SLEEPY: while (1) {
          sleep(5);
          for my $done (@done_list) {
              next SLEEPY if ! -e $done;
          };
          last;
      };
      # cat files now:
    • In my case, the fact that a file has been created does not mean that it is complete, so checking for its existence with -e may not reflect completion.
      Does your END block guarantee that the file is 100% complete? Please correct me if I'm wrong here.


    Regards,
    Edward
      Hi, this is monkey_boy (not logged in, as I'm at home),
      • The "touch" is to create an empty dummy file, separate from your results file.
        The reasoning is that your results files will be created at the start of the processing, so your master script cannot treat their existence as proof of completion.
      • The END block is executed just before a script terminates (on a normal exit or a die, though not if the process is killed by a signal), so it's as good a place as any to "touch" the file.
      • @done_list is left for you to code; it's simple Perl. You have a list of jobs somewhere; convert them with a regex into a list of "done" files.
      • In reply to sh1tn's question: his way is probably better, as you'll hopefully get an error on failure (but given my experience with Sun Grid Engine this is not always the case ;))

      Hope this is helpful. monkey_boy
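
As a rough illustration of the @done_list point, here is a sketch of how the master might derive the marker names from the parameters it submitted and then wait for them. The names done_files and wait_for_all, the directory argument, and the timeout are assumptions added for the example. It uses map rather than a regex, and it returns after a deadline instead of looping forever.

```perl
use strict;
use warnings;

# One marker name per submitted job, following the "done$param" naming.
sub done_files {
    my ($dir, @params) = @_;
    return map { "$dir/done$_" } @params;
}

# Poll until every marker exists.
# Returns 1 when all are present, 0 if $timeout seconds pass first.
sub wait_for_all {
    my ($files, $interval, $timeout) = @_;
    my $deadline = time() + $timeout;
    while (1) {
        my @missing = grep { !-e $_ } @$files;
        return 1 unless @missing;
        return 0 if time() >= $deadline;
        sleep $interval;
    }
}

# Example use in master.pl (parameter values are placeholders):
#   my @done_list = done_files('.', 1 .. 4);
#   wait_for_all(\@done_list, 5, 3600) or die "jobs did not finish in time";
```

Building the list from the submitted parameters, rather than from glob("done*"), matters because glob only returns files that already exist when it is called.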
        @done_list is left for you to code; it's simple Perl. You have a list of jobs somewhere; convert them with a regex into a list of "done" files.
        Hi monkey_boy,

        Just to double-check: @done_list will contain the list of dummy files created in the END block, right? Or is it a list of the actual output files? Sorry, I'm a bit slow here...

        Regards,
        Edward
Re^2: Running Parallel Perl Script on Linux Cluster
by salva (Monsignor) on Apr 01, 2006 at 18:14 UTC
    A similar result can be attained by changing the shell wrapper instead of the Perl script that runs on the cluster nodes:
    #!/usr/bin/bash
    cd ~/some_dir
    (perl mycode.pl $1 $2 > ~/some_out_dir/param_set$1.out.tmp &&
        mv ~/some_out_dir/param_set$1.out.tmp ~/some_out_dir/param_set$1.out) ||
        touch ~/some_out_dir/param_set$1.fail
    Then, on the master you will have to poll from time to time to see if all the result files exist or if "fail" files are there, to requeue the jobs.

    Though there should be better ways to synchronize partial jobs over the cluster.
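
The polling step on the master could look roughly like the sketch below. It assumes the param_set$1.out / param_set$1.fail naming from the wrapper above; the function name job_status and the three status labels are made up for the example.

```perl
use strict;
use warnings;

# Classify each job by which file the wrapper has left behind.
# Returns a hash: job id => 'done', 'failed', or 'pending'.
sub job_status {
    my ($out_dir, @ids) = @_;
    my %status;
    for my $id (@ids) {
        my $base = "$out_dir/param_set$id";
        $status{$id} = (-f "$base.out")  ? 'done'
                     : (-f "$base.fail") ? 'failed'
                     :                     'pending';
    }
    return %status;
}
```

Because the wrapper renames the .out.tmp file only after mycode.pl succeeds, a job whose output is still being written shows up as pending rather than done.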

      ...you will have to poll from time to time to see if all the result files exist or if "fail" files are there, to requeue the jobs.
      Hi salva,

      Thanks for the answer. Roughly, I suppose one would use glob to check for the files (fail or not)? But I'm not sure how to requeue the jobs.

      Can you give a simple example of how one would do the step you suggested?

      Regards,
      Edward
        The best way to check if a file exists is with the -f operator (it is documented in perlfunc).

        To requeue a job, you would need to run the corresponding qsub command again. Make sure that you delete the "fail" file first with unlink.
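
A minimal sketch of that requeue step, under stated assumptions: requeue is a hypothetical helper, and the submit command is passed in as a list because the exact qsub invocation depends on how the jobs were submitted in the first place (the wrapper.sh name in the comment is a placeholder, not something from this thread).

```perl
use strict;
use warnings;

# Remove the "fail" marker, then run the submit command again.
# Dies if either step fails, so a broken requeue is not silently ignored.
sub requeue {
    my ($fail_file, @submit_cmd) = @_;
    unlink $fail_file or die "Cannot remove $fail_file: $!";
    system(@submit_cmd) == 0
        or die "Resubmit failed: @submit_cmd (status $?)";
    return 1;
}

# Example (command and arguments are placeholders for your own submit line):
#   requeue("$out_dir/param_set$id.fail", 'qsub', 'wrapper.sh', $id, $other_arg);
```

Deleting the marker before resubmitting keeps the next poll from seeing the stale "fail" file and queuing the same job twice.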

        Hi monkfan, Can you share how the code worked? I'd like to try out a variation on a 32 proc cluster in a classroom environment. Thanks! vanallp
