PerlMonks  

Grid Engine

by baxy77bax (Chaplain)
on May 17, 2009 at 16:45 UTC (#764529=perlquestion)
baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

One simple question: has anyone started building some kind of grid engine for distributing jobs across the nodes of a cluster? So far I have been using Sun Grid Engine, but I'm not satisfied with its performance, with the control it gives me over my job distribution, and especially with the fair-share system it supports (I don't think it's fair enough).

So I was wondering whether there are any Perl modules that provide an alternative, or any abandoned projects on the subject that I could try to finish, incorporating my views on how job distribution across a cluster should work.

Thank you

Re: Grid Engine
by swampyankee (Parson) on May 17, 2009 at 18:11 UTC
    Have you looked into genericNQS? If I recall correctly, it was used when I temped at Pratt & Whitney a few years back (by the engineering department, as opposed to corporate IT; the latter wouldn't use any solution that didn't cost several tens of thousands of dollars).


    Information about American English usage here and here. Floating point issues? Please read this before posting. — emc

Re: Grid Engine
by SilasTheMonk (Chaplain) on May 17, 2009 at 20:50 UTC

    I think Parallel::PVM is the standard, though I have not used it. I have worked on or seen bits of grid systems, but only proprietary stuff.

    I am also not sure whether there is a clear distinction between "grid computing" and the current buzzword "cloud computing". I suppose the difference may be that you don't own the hardware that clouds run on. Anyway, there are lots of providers of cloud computing: Amazon's S3, flexiscale etc etc.

Re: Grid Engine
by talexb (Canon) on May 18, 2009 at 02:55 UTC

    Yeah, hi, I built a module to do that, and we used it in production for about two years. Unfortunately, I couldn't get the approvals to post the module on CPAN. It was a nice module too, because it also managed to spawn jobs locally if the grid engine wasn't available.

    You could write your own, though -- mine just did some intelligent scraping of the output from the qstat command and used qsub to submit jobs and capture the job ID that was output.
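The approach described above (submit with qsub, capture the job ID, scrape qstat for status, fall back to a local run when no grid engine is present) could be sketched roughly as follows. This is not the module from the post, which was never published. It is a minimal illustration that assumes the usual Sun Grid Engine confirmation line (`Your job 12345 ("name") has been submitted`) and the default `qstat` column layout, with job names that contain no spaces. The function names `parse_job_id`, `parse_qstat` and `submit` are made up for this example.

```python
import re
import shutil
import subprocess

# Assumed format of SGE's qsub confirmation line, e.g.
#   Your job 12345 ("myjob.sh") has been submitted
QSUB_RE = re.compile(r"Your job(?:-array)? (\d+)")


def parse_job_id(qsub_output):
    """Return the job ID found in qsub's output as an int, or None."""
    match = QSUB_RE.search(qsub_output)
    return int(match.group(1)) if match else None


def parse_qstat(qstat_output):
    """Map job ID -> state from plain `qstat` output.

    Assumes the default layout: a header line, a dashed separator,
    then one row per job with columns job-ID, prior, name, user, state, ...
    """
    states = {}
    for line in qstat_output.splitlines():
        fields = line.split()
        # Header and separator rows do not start with a numeric job ID.
        if len(fields) >= 5 and fields[0].isdigit():
            states[int(fields[0])] = fields[4]
    return states


def submit(script):
    """Submit via qsub when it is on PATH, otherwise run the script locally."""
    if shutil.which("qsub") is None:
        # Local fallback: there is no job ID, so return None after running.
        subprocess.run(["sh", script], check=True)
        return None
    result = subprocess.run(
        ["qsub", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

Keeping the parsing separate from the subprocess calls means the scraping logic can be checked against saved qsub and qstat output on a machine that has no grid engine installed.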

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Grid Engine
by gizmo_mathboy (Pilgrim) on May 18, 2009 at 03:35 UTC

    Lots of questions I have before I can really make a good comment.

    Are you talking about a job scheduler, resource manager or parallelization?

    At work I use Torque for a resource manager, Maui for the scheduler (when I get it working) and typically MPI (various forks but I'm probably gonna settle on OpenMPI).

    Are you submitting jobs to a queue?

    Are you writing the software that is the job queue?

    Are you running on a bunch of nodes/cores and want to parallelize things?

      "Are you talking about a job scheduler, resource manager or parallelization?"

      Well, I'm talking about the whole thing. I think it is pretty wasteful of resources, and jobs are poorly distributed. It eats too much of my resources, so sometimes I just log on directly to each node and start my jobs manually.

      The answers to your other 3 questions are:

      Yes

      Yes

      Yes

      My colleagues and I are doing all those things :)

        It might not be Perl, but from what you're mentioning about wasting resources, you might want to look at Alan Sussman's work on P2PGrid ... I saw the 2006 presentation, and he was working on having the programs register what resources they need (so they'd be assigned to appropriate machines ... so you didn't waste the machines w/ 32GB of RAM on something that only needed CPU time, not memory)
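The idea mentioned above, jobs declaring what they need so that large-memory machines are not spent on CPU-only work, can be illustrated with a small best-fit placement sketch. This is not P2PGrid's algorithm, whose details are not given in this thread. It is only a generic example, and the `Node`, `Job` and `assign` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpus: int
    free_mem_gb: int


@dataclass
class Job:
    name: str
    cpus: int
    mem_gb: int


def assign(job, nodes):
    """Place the job on the fitting node that leaves the least spare memory.

    Returns the chosen node's name, or None if no node can hold the job.
    """
    fitting = [
        n for n in nodes
        if n.free_cpus >= job.cpus and n.free_mem_gb >= job.mem_gb
    ]
    if not fitting:
        return None
    # Best fit on memory: keep big-memory nodes free for jobs that need them.
    best = min(fitting, key=lambda n: n.free_mem_gb - job.mem_gb)
    best.free_cpus -= job.cpus
    best.free_mem_gb -= job.mem_gb
    return best.name
```

With one 32GB node and one 4GB node, a job that asks for 1GB lands on the 4GB node, which leaves the 32GB node available for a later job that actually needs the memory.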

Re: Grid Engine
by tsee (Curate) on May 18, 2009 at 17:23 UTC

    We're running a ~200 CPU cluster using GridEngine and are quite satisfied. The queue master simply needs a decent amount of RAM and it can currently handle around 100k queued jobs with 4GB of RAM. Our jobs are typically adjusted to run about 1-12 hours, so any queuing overhead is negligible.

    Now, I'm not aware of a full queuing system that's written in Perl and that beats Grid Engine on this scale. What I can point you at is a CLI tool for managing a Grid Engine installation. Well, not quite. It's not an admin's replacement for qmon: It's mostly useful for users who try to keep track of their jobs, put them on hold, clear their error state, add dependencies and so on.

    However, I have to admit I wrote the aforementioned tool, so I'm biased.

    In the end, the choice of tools really comes down to the scale at which you're running this. If logging in to the nodes manually is still an option, then maybe Grid Engine isn't what you want. If you have very, very short jobs, it's certainly not what you want. Searching CPAN, I came up with these related modules: GRID::Machine, SSH::Batch. I've seen some others like TheSchwartz, but again, I don't know what exactly fits your usage.

    Cheers,
    Steffen

      Yes, I also administer an SGE cluster, and it works fine.

      I read the original question above a few times and there's not enough information to have a starting point.

      How large are the tasks you're scheduling?

      How many tasks per day?

      What do you mean the scheduling is not fair enough?

      I have a feeling you simply haven't read the 3 manuals available for SGE, or don't know what to expect.

Re: Grid Engine
by MadraghRua (Vicar) on May 18, 2009 at 17:38 UTC
    One other point worth considering: the file sharing software. We've a 12TB NAS attached to our cluster via an InfiniBand switch. We're using GlusterFS as the file sharing software. It appears to be quite scalable, both for adding new nodes and for adding more storage space. We're working on the principle of fast disks for short-term, immediately needed data; slower disks for mid-term data that is needed less often but still wanted; and slower disks or tape for rarely needed data in long-term storage. Gluster allows us to manage this setup.

    MadraghRua
    yet another biologist hacking perl....

Node Type: perlquestion [id://764529]
Approved by Corion
Front-paged by Old_Gray_Bear