Perl and autoparallelization

by qhayaal (Beadle)
on Jun 06, 2004 at 09:29 UTC

qhayaal has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
If I have a program that can easily be handled by multiple processors (say by hyperthreading), is there a way of directing Perl to do so?

For instance, I have to crunch thousands of files and generate output for each in a separate file. How can I make Perl use a particular number of processors to do so?

Thanks for any suggestion.
Happy Sunday.

Re: Perl and autoparallelization
by Zaxo (Archbishop) on Jun 06, 2004 at 09:55 UTC

    I think that any answers will be very OS-dependent. Unless the OS provides user-space library functions for processor scheduling, I doubt that perl can provide direct control over processor usage.

    Your best bet, IMO, is to use Parallel::ForkManager to fork off a few child processes at a time, each crunching a file. You will get extra time slices at least, on many OS's, and they may get executed on other processors.
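
    Something like this minimal sketch, for instance (crunch_file() and the *.dat glob are stand-ins for your own processing and file list):

        use strict;
        use warnings;
        use Parallel::ForkManager;

        # crunch_file() and *.dat are placeholders for your own code.
        my $pm = Parallel::ForkManager->new(4);    # at most 4 children at once

        for my $file (glob '*.dat') {
            $pm->start and next;    # parent: child spawned, move to next file
            crunch_file($file);     # child: process one file
            $pm->finish;            # child exits here
        }
        $pm->wait_all_children;

        sub crunch_file {
            my ($file) = @_;
            # read $file, compute, write the output file ...
        }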

    If I understand correctly, in Linux SMP the kernel tries to avoid reallocating memory between processors by setting up affinity of a process for the processor that initially got it. Linux's copy-on-write strategy for the child process's environment may stick the children with the parent's affine processor. Obligatory warning: my understanding of SMP kernels was never perfect, and is a little out of date.

    Your desired optimization may be premature. Most file crunching processes spend most of their time waiting on disk I/O.

    After Compline,
    Zaxo

Re: Perl and autoparallelization
by BrowserUk (Patriarch) on Jun 06, 2004 at 18:10 UTC

    Hyperthreading (assuming you're talking about Intel's Hyperthreading technology) 'just happens' (or not). If the code in the program is conducive to being hyperthreaded it will be; otherwise it won't. You do not control it.

    You might derive some extra benefit from hyperthreading if you compiled Perl using Intel's C compiler (which is huge and very expensive), as they may well have added optimisations to their compiler that will make the compiled code more conducive to hyperthreading, but the differences are likely to be small due to the 'data is code' nature of Perl (and other interpreters).

    As Zaxo pointed out, most file crunching programs are IO-bound, not CPU-bound, so multi-tasking them is often of little benefit. If your task is IO-bound, then you are better off buying a faster disk, or perhaps splitting your files across multiple (real, physical, not virtual) disks.

    In the rare event that your processing is CPU-intensive (eg. each file is small but requires a large amount of processing; some gene work might fit this category), then you probably could benefit from multi-tasking the overall load across several processors. If each read-process-write cycle is entirely independent of the others, then simply splitting the input files into one group per processor (eg. [a-f]*, [g-m]*, [n-t]*, [u-z]* for 4 processors) and starting one copy of the program to handle each group is probably as simple as it gets, and reasonably effective.
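
    A bare-bones driver along those lines might look like this (crunch.pl is a hypothetical script that globs and processes every file matching the pattern it is given):

        use strict;
        use warnings;

        # Hypothetical: crunch.pl globs its argument and processes each match.
        my @groups = ( '[a-f]*', '[g-m]*', '[n-t]*', '[u-z]*' );

        for my $pattern (@groups) {
            defined( my $pid = fork ) or die "fork failed: $!";
            next if $pid;                          # parent: keep spawning
            exec 'perl', 'crunch.pl', $pattern;    # child: become one worker
            die "exec failed: $!";
        }
        wait() for @groups;                        # reap all the workers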

    If your task requires each processing cycle to have some knowledge of other processing cycles, then running one thread per processor may be easier.

    As you can see, determining what, if any, benefit can be derived from multi-tasking a process, and how best to achieve it, requires fairly detailed knowledge of both the processing required and the system on which it is going to run.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      Splitting the files is an attractive and simple solution provided the CPU requirements for processing each file are comparable, or there are enough of them for the differences to average out. Suppose that assumption isn't true? How would you use Perl to manage a job queue which sent the next file for processing to the next available processor?

        I'd use one worker thread per processor and a Thread::Queue of the files to be processed. The main thread sets up the Q with the files to be processed (or feeds it incrementally if the list is very large, eg. >~10,000).

        The threads take the first file off the Q, process it and then loop back and get the next until the Q is empty.

        This is extremely simple to code, and since 5.8.3 it appears to be very stable as far as memory consumption is concerned, though I haven't done any really long runs using Thread::Queue.

        Once the threads are spawned, no new threads or processes need to be created or destroyed, which makes it pretty efficient. All the sharing and locking required is taken care of by the tested and proven Thread::Queue module.

        I might try varying the number of threads up and down to see what gave the optimal throughput.
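
        A bare-bones version of that scheme might look like this (process_file() and the *.dat glob stand in for the real per-file work):

            use strict;
            use warnings;
            use threads;
            use Thread::Queue;

            # process_file() and *.dat are placeholders for the real work.
            my $THREADS = 4;    # one worker per (logical) processor
            my $q       = Thread::Queue->new;

            my @workers = map {
                threads->create( sub {
                    # block on the Q; an undef item means "no more work"
                    while ( defined( my $file = $q->dequeue ) ) {
                        process_file($file);
                    }
                } );
            } 1 .. $THREADS;

            $q->enqueue($_)    for glob '*.dat';    # feed the Q
            $q->enqueue(undef) for @workers;        # one undef per worker
            $_->join           for @workers;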


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        Another choice might be POE. I haven't used it, but it seems like it would be a good choice for this kind of load-balancing thing, especially if there's a chance that it will later exceed the capacity of one machine.

        --
        Spring: Forces, Coiled Again!
Re: Perl and autoparallelization
by ambrus (Abbot) on Jun 06, 2004 at 18:35 UTC

    Forking multiple processes can be a good idea, especially if you don't have to transmit data fast between the processes. To help you, perl has open $FH, "-|"; to fork the process and link it with a pipe, or open $FH, "-|", $scriptname; to popen (fork-exec and open a pipe to) another program.
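
    A minimal sketch of the first form (here the child just prints something back to the parent):

        use strict;
        use warnings;

        # The "-|" open forks; the child's STDOUT becomes the parent's $fh.
        my $pid = open my $fh, '-|';
        die "can't fork: $!" unless defined $pid;

        if ($pid) {                        # parent: read the child's output
            print "got: $_" while <$fh>;
            close $fh;                     # also waits for the child
        }
        else {                             # child: do the work, print results
            print "crunched by pid $$\n";
            exit 0;
        }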

Re: Perl and autoparallelization
by qhayaal (Beadle) on Jun 07, 2004 at 17:58 UTC
    Thanks Zaxo, BrowserUk, ambrus and others for the comments and suggestions. I checked on the Intel (P4) based HT with the default (Redhat 9) Perl, and I found that Perl makes use of both processors. The following is the test program I used to check this, so in a way my problem is virtually solved.

    Thanks again.
    -Qhayaal

    #!/usr/bin/perl -w
    use threads;

    $thr1 = threads->new( \&subr, 1, 10000000 );
    $thr2 = threads->new( \&subr, 1, 20000000 );
    $_->join for ( $thr1, $thr2 );
    print "Threads returned.\n";
    exit;

    sub subr {
        ( $min, $max ) = @_;
        print "In the thread\n";
        for ( $i = $min; $i <= $max; $i++ ) { 1; }
        return 1;
    }
