PerlMonks  

The State of Parallel Computing in perl 2007?

by jettero (Monsignor)
on Jan 21, 2007 at 15:20 UTC (#595771)
jettero has asked for the wisdom of the Perl Monks concerning the following question:

What are people using for parallel packages these days?

I've been daydreaming for some time about a miniature OS that runs in the systray in win32 and as a daemon in linux... You know, that runs perl and has some kind of shared filesystem and/or shared memory.

It occurred to me recently that it might already exist. I looked at Parallel::Pvm a little and it seems to go the right direction perhaps. jcwren talks about parallelism a little here: Parallel Processing, Processes, and Threads. I found one article here that mentioned Parallel::Pvm and some other system that seemed abandoned/forgotten (but it wasn't perl based). I can't find that article now...

I found some offsite things (eg, parawiki and Parallel_computing), but I was looking for perl specific things — all the nodes here that talk about it seem to be several years old — or at least, I don't know how to look for the newer ones.

At one point, I had hoped POE would help — and for all I know, it does; but it seemed woefully single threaded to me.

Again, what are people using for parallel packages these days? I have a sneaking suspicion it hasn't changed all that much since the older posts I've found. That, or everyone just uses threads and/or message passing by hand maybe?

UPDATES and CLARIFICATIONS:

  1. I intentionally didn't say what I meant by parallel because I'm interested in any links people have. Personally, I'm mostly interested in multi-computer scenarios, but multi-processor scenarios would be interesting to read about also.

-Paul

Re: The State of Parallel Computing in perl 2007?
by zentara (Archbishop) on Jan 21, 2007 at 17:04 UTC
    Are you talking about multiple processors working on a single problem, or just processes running in parallel, sharing data somehow? It's a big topic, and you need to narrow down what Parallel means.

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: The State of Parallel Computing in perl 2007?
by diotalevi (Canon) on Jan 21, 2007 at 18:28 UTC

      If your interest in Mozart/Oz is due to its distributed and parallel computing aspects, you might also find Erlang interesting if you haven't already encountered it.

      I find the Erlang cui repl preferable to the Oz emacs-based interface, but if you like emacs that will be less of a consideration.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Tying the language to an editor has almost completely ruled it out for me. What an odd thing to do. It does still sound pretty interesting though. I'm installing erlang presently. I've heard people mention it before.

        I feel like I'm taking away a "perl doesn't really have much of this yet" feeling from the two posts above this though. Is that the case?

        -Paul

        No, my interest in Mozart comes from its integrated constraint solvers. The distributed computation stuff is just gravy.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        An attractive talk was just posted to Lambda the Ultimate about Erlang and concurrency: LCA2007: Concurrency and Erlang.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: The State of Parallel Computing in perl 2007?
by diotalevi (Canon) on Jan 21, 2007 at 23:33 UTC
Re: The State of Parallel Computing in perl 2007?
by toma (Vicar) on Jan 22, 2007 at 07:42 UTC
    I don't know if you would consider it parallel programming or not, but I use memcached to get more than one computer into the act. There are several perl modules that use it.

    Unlike many things in parallel programming, memcached is easy.

    POE::Wheel::Run will allow you to use multiple processes, which should provide parallelism on a multiprocessor machine. To use it in windows, I used Cygwin since POE::Wheel::Run required the more full-featured fork/exec.

    Another easy way to do parallel computing is to use a web server. One program can make requests to multiple servers, or a single server with multiple CPUs, and get them all working on parts of a problem. You can use POE to control the flow of making multiple web requests and synching back up when they return.

    It should work perfectly the first time! - toma

      I saw in the POE docs that there were references to needing a serializer. I didn't get far enough with it to see that you can fork. Does it support load balancing and things? I've got the POE::Wheel::Run docs up presently, and I'm seeing that it forks a child process, but I thought it was for things like spawning 'cat' or 'ls' or whatever...

      That probably isn't what I have in mind, but I do still wonder if POE has some kind of built in shared memory multi-processor and/or multi-computer features. It seems like it should.

      -Paul

        I use POE::Wheel::Run to spawn four Perl programs.
        1. A web server built from HTTP::Daemon. This provides a browser-based GUI. This web server also spawns programs that can get content from other web servers.
        2. A live link to a large CAD program.
        3. A live link to a large circuit simulator.
        4. A terminal that provides user messages and a command line, for development and for cases where the GUI doesn't have deep enough functionality.
        I hadn't thought about this program as parallel processing until I saw your question. I have only recently begun running the application on multi-cpu machines.

        The program uses message passing through several mechanisms:

        • STDIN, STDOUT, and STDERR of child processes.
        • Dropping files. The CAD package uses this for input.
        • Web calls. I recently switched from LWP to curl because of deployment difficulties. I had trouble getting my installer to automate the configuration of LWP.
        • Environment variables are used to send parameters into child programs. I had trouble with platform differences in the handling of command-line arguments. This is possibly due to differences in quoting and escaping.
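A minimal sketch of two of the mechanisms listed above: a parameter goes into a child program through an environment variable, and the result comes back on the child's STDOUT. This uses a plain core-Perl pipe open, not POE::Wheel::Run, and it assumes a Unix-like system. The variable name `SIM_TEMPERATURE` and the inlined child program are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical parameter name; the original post does not name its variables.
$ENV{SIM_TEMPERATURE} = 85;

# Child program, inlined here so the sketch is self-contained.
# Single quotes keep $ENV{...} and \n literal until the child runs them.
my $child_code = 'print "temperature=$ENV{SIM_TEMPERATURE}\n"';

# List-form pipe open: on Unix-like systems no shell is involved, so the
# arguments are passed through without any quoting or escaping step.
open(my $from_child, '-|', $^X, '-e', $child_code)
    or die "cannot start child: $!";
my @lines = <$from_child>;
close($from_child) or die "child failed: status $?";

chomp @lines;
print "child said: $_\n" for @lines;
```

The child inherits the parent's environment, so the value never has to survive a command line. That is one plausible reason the environment-variable route sidesteps the quoting differences mentioned above.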
        It should work perfectly the first time! - toma
      You use memcached as a database? Not a good idea. Use a database for that. Memcached doesn't consider it a problem to drop your data silently if it runs out of free RAM. That's typical for a cache.
        No, I don't think I mentioned using memcached as a database. Memcached allows me to use multiple machines to cache data in a transparent manner. It allows me to use more, cheaper machines rather than one big expensive machine. I have heard people describe this tactic as "build out, not up."

        A common use case is to cache the results of an SQL query. I use the query as the cache key and the value is the result set from the query. I check the cache to see if the dataset is there. If it is, I get it from the cache. If it isn't, I run the SQL and put the results in the cache. This provides me with a huge speedup.

        If the data in the database gets updated, I flush the cache and start again. This is not a problem for parts of my application, so those are the parts where I use memcached. Instead of storing a lot of data in a perl data structure in a mod_perl program and counting on the copy-on-write mechanism to save RAM, I use memcached.
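The check-then-fill pattern and the flush on update described above can be sketched as follows. This is only an illustration under stated assumptions: an in-process hash stands in for memcached, and `run_sql`, `cached_query`, and `flush_cache` are hypothetical helpers, not part of any memcached client.

```perl
use strict;
use warnings;

my %cache;          # stand-in for memcached (assumption: in-process hash)
my $db_calls = 0;   # counts how often the "database" is actually hit

# Hypothetical stand-in for running SQL against a real database.
sub run_sql {
    my ($sql) = @_;
    $db_calls++;
    return [ [ 1, 'alice' ], [ 2, 'bob' ] ];
}

# Cache lookup first; the SQL text itself is the cache key.
sub cached_query {
    my ($sql) = @_;
    return $cache{$sql} if exists $cache{$sql};   # hit
    my $rows = run_sql($sql);                     # miss: ask the database
    $cache{$sql} = $rows;
    return $rows;
}

# After the underlying data changes, throw everything away.
sub flush_cache { %cache = () }

my $sql = 'SELECT id, name FROM users';
cached_query($sql);    # miss, runs the SQL
cached_query($sql);    # hit, served from the cache
flush_cache();         # data was updated
cached_query($sql);    # miss again
print "database was queried $db_calls times\n";
```

With a real memcached server the raw SQL text would probably not work as a key, since memcached keys may not contain whitespace and are limited to 250 bytes. Hashing the query first (for example with Digest::MD5) is one common workaround.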

        It should work perfectly the first time! - toma

        memcached is just about the coolest thing ever btw. It's on the top of my list of things to learn next. UPDATE: oops, wrong parent. Eh.

        -Paul

Re: The State of Parallel Computing in perl 2007?
by markatwork (Initiate) on Jan 22, 2007 at 12:07 UTC
    From the realms of "I saw this once and I thought it looked interesting", rather than being anything I've actually used, WSRF::Lite at http://www.sve.man.ac.uk/Research/AtoZ/ILCT looks interesting.

    Googling for 'perl grid' brings back a few links that seem to concern the areas you're looking at.

    Regards Mark
Re: The State of Parallel Computing in perl 2007?
by moklevat (Priest) on Jan 22, 2007 at 15:35 UTC
    Hi jettero,

    I occasionally work on "trivially" or "embarrassingly" parallelizable problems. For me this most often involves computing a single statistic from a dataset over a large combination of parameters. The most efficient solution for me has been to use R with the Rmpi package to interface with MPI. Depending on the scope of the task, I may also use MySQL for distributing the dataset and collecting the results using RMySQL.

    I could see doing the same thing in perl with PDL, Parallel::MPI, and your favorite database.
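For a single multi-processor machine, the shape of such a parameter sweep can be sketched with nothing but core Perl. This is not MPI, PVM, or PDL: it swaps in plain `fork` and `pipe`, assumes a Unix-like system, and uses a made-up `statistic` function and made-up data purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical statistic: mean of the data scaled by one parameter.
sub statistic {
    my ($param, @data) = @_;
    my $sum = 0;
    $sum += $_ * $param for @data;
    return $sum / @data;
}

my @data   = (1 .. 10);
my @params = (1, 2, 3, 4);
my %reader;    # param => read end of that worker's pipe

for my $param (@params) {
    pipe(my $read, my $write) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {               # child: compute one result, then exit
        close $read;
        print {$write} statistic($param, @data), "\n";
        close $write;
        exit 0;
    }
    close $write;                  # parent keeps only the read end
    $reader{$param} = $read;
}

# Collect one line of output from each worker.
my %result;
for my $param (@params) {
    my $fh = $reader{$param};
    chomp($result{$param} = <$fh>);
    close $fh;
}
1 while wait() != -1;              # reap all workers

printf "param=%d statistic=%s\n", $_, $result{$_} for @params;
```

Each parameter is independent of the others, which is what makes the problem embarrassingly parallel: the workers never need to talk to each other, only to report back.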

      I clicked through to ::MPI a little, but the low version number and update from 1999 kinda scare me off. I have looked at PDL enough to wish I had columns of numbers to process.

      -Paul

        I initially had to choose between PVM and MPI, and I ended up using MPI only because that was the first thing I tried and it happened to work for me. From what I had read at the time, PVM should work just as well as MPI for trivially parallelizable tasks. I would not guess that the MPI module is so trivial that it did not warrant any changes, but it does look like the PVM module has seen more development activity.
Re: The State of Parallel Computing in perl 2007?
by erix (Vicar) on Feb 18, 2007 at 17:11 UTC

    As a multi-computer scenario, Condor might be interesting for you.

    Condor lets you submit a program/batchfile/shellscript to a queue of many machines (nodes). Every one of these nodes needs to have a condor client installed. The condor client advertises the resources that that particular machine has on offer. This information is then used to match your job requirements to any number of machines. Advertised attributes are things like: CPU-type, OS-type, Amount of memory, free disk space, etc. If enough clients are available, your jobs will run simultaneously.
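The matching step can be pictured with a toy sketch. This is not Condor's actual mechanism or syntax; the attribute names (`os`, `memory_mb`, `disk_mb`) and the machine list are invented here to illustrate the idea of comparing advertised resources against job requirements.

```perl
use strict;
use warnings;

# Hypothetical machine advertisements; attribute names are illustrative
# and are not Condor's real attribute names.
my @machines = (
    { name => 'node1', os => 'linux',   memory_mb => 2048, disk_mb => 50_000 },
    { name => 'node2', os => 'windows', memory_mb => 4096, disk_mb => 10_000 },
    { name => 'node3', os => 'linux',   memory_mb =>  512, disk_mb => 80_000 },
);

# What the job needs.
my %job = ( os => 'linux', memory_mb => 1024, disk_mb => 20_000 );

# A machine matches when its OS is the one requested and it offers at
# least the requested memory and disk.
my @matches = grep {
       $_->{os} eq $job{os}
    && $_->{memory_mb} >= $job{memory_mb}
    && $_->{disk_mb}   >= $job{disk_mb}
} @machines;

print "job can run on: ", join(', ', map { $_->{name} } @matches), "\n";
```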

    Condor can use dedicated machines, or take advantage of idle clients: running only at designated times (at night, for instance) or monitoring machine activity, and kicking in after some idle period.

    Obviously, because clients need to be installed on all machines, it needs some organisation (=politics) to get authorization to run your programs on a sizable group of machines.

Re: The State of Parallel Computing in perl 2007?
by casiano (Pilgrim) on May 22, 2008 at 12:35 UTC
    If you have several UNIX platforms with Perl installed and SSH access, then you can use GRID::Machine to have Perl interpreters running in those nodes and make them collaborate. The best thing being that you don't have to ask administrators to install any additional software.

    I have written a tutorial (GRID::Machine::perlparintro) that through a simple example introduces how to use Perl via GRID::Machine to exploit the computing power of idle workstations.

    Hope it Helps

    Casiano
