PerlMonks  

Taking advantage of multi-processor architecture

by bedanta (Novice)
on Feb 11, 2005 at 13:58 UTC (#430095)

bedanta has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a Perl script that reads two files and performs some operations on them. It takes about 35 minutes to finish on a server with one processor, and the same on a server with multiple processors.

How can I make the Perl script take advantage of the multiple processors? Please help.

Regards,
Bedanta
Re: Taking advantage of multi-processor architecture
by dragonchild (Archbishop) on Feb 11, 2005 at 14:19 UTC
    To expand on what others have said, you need to look at what your program is doing. The important thing to figure out is whether the script performs actions that can be done without knowing whether other actions have been completed. In other words, are there actions that can be completed independently of each other? Can you divide and conquer the problem, à la the binary search algorithm?

    A few rules of thumb:

    • If you're modifying files in place, this is generally not a good candidate for SMP*, because you might overwrite changes made in process A with changes made in process B. However, if you can split the files, modify the pieces separately, and then recombine them, you have a good candidate for SMP.
    • If you're loading a datafile into a database, this is generally a good candidate for SMP. Just split the file into N pieces and launch N copies of your loading program where N is generally twice the number of processors you have. So, if you have 4 processors, split the datafile into 8 pieces and launch a copy of your loading program for each piece.
    • If you're building a report from some data, this is generally not a good candidate for hand-rolled SMP. (The database will handle SMP for you.)

    *: SMP = Symmetric Multi Processing - using more than one processor at one time.
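    A minimal sketch of the "split into N pieces, launch N copies" rule of thumb, using core Perl's fork() on a Unix-like system. The load_chunk sub, the 100 in-memory records standing in for the datafile, and the processor count of 4 are all hypothetical placeholders; a real loader would read and insert its own slice of the file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "your loading program". A real version would
# insert the records into the database; this one just counts them.
sub load_chunk {
    my ($chunk_id, @records) = @_;
    return scalar @records;
}

# Split @records into at most $n contiguous pieces of nearly equal size.
sub split_into {
    my ($n, @records) = @_;
    my $size = int((@records + $n - 1) / $n);    # ceiling division
    my @pieces;
    push @pieces, [ splice(@records, 0, $size) ] while @records;
    return @pieces;
}

my $cpus    = 4;            # assumption: set this to your real processor count
my $workers = 2 * $cpus;    # "twice the number of processors" rule of thumb
my @records = map { "record $_" } 1 .. 100;    # hypothetical stand-in for the datafile

my @pieces = split_into($workers, @records);

# Launch one child process per piece.
my @pids;
for my $i (0 .. $#pieces) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        load_chunk($i, @{ $pieces[$i] });
        exit 0;    # the child must not fall through into the parent's loop
    }
    push @pids, $pid;
}

# Wait for every child and note any that exited with an error.
my @failed;
for my $pid (@pids) {
    waitpid($pid, 0);
    push @failed, $pid if $? != 0;
}
printf "%d pieces, %d failures\n", scalar @pieces, scalar @failed;
```

    Each child works on its own slice and shares nothing with the others, which is exactly what makes the loading case safe to parallelize.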

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      I realize that the OP's question undoubtedly deals with SMP, but I've made it a mission to point out that TIMTOWTDI when it comes to multiprocessor machines.

      <pedantic>
      SMP = Symmetric Multi Processing - a particular way of using more than one processor at a time, usually reserved for systems of eight or fewer processors. Compare NUMA.

      NUMA = Non-Uniform Memory Access - a type of multi-processor arrangement in which memory access times differ depending upon which processor is using which part of memory, necessitating that the OS keep track of which parts of memory it uses or assigns to applications. Compare SMP, see The Linux Scalability Effort for some examples.

      Update (for the sake of a more complete list): There's also Asymmetric Multi Processing (AMP), in which certain code must run on certain processors instead of any process being assignable to any processor. NUMA is generally closer to SMP than to AMP.

      Some definitions of SMP only address its differences from AMP, saying that SMP means that any process can run on any processor. This means that most NUMA machines are SMP with additional scheduling concerns.

      As I've heard SMP defined, it is usually clarified that not only can any processor handle any process, but that the machine also allows each processor the same access to all of the system's main memory. This definition makes SMP and NUMA distinct.
      </pedantic>



      Christopher E. Stith
Re: Taking advantage of multi-processor architecture
by inman (Curate) on Feb 11, 2005 at 14:10 UTC
    Perl will run the script in a single thread of execution. To take advantage of multiple processors, you need to re-architect your app to use either threads or multiple processes. Note that assigning threads and processes to processors is the responsibility of the OS; the technique merely gives the OS the ability to place the threads or processes on different processors. On some systems, you can also set the CPU affinity of a process.
Re: Taking advantage of multi-processor architecture
by hardburn (Abbot) on Feb 11, 2005 at 14:10 UTC

    Taking advantage of concurrency ranges from trivial up to Halting-Problem difficult. It all depends on what your goals are.

    You'll need to look at what your code does to the data. Let's say you've got a big loop that processes all the data. Does the loop's next iteration depend on the previous one completing? If not, you can split the data in half, fork() two processes, and give each process half the data. Then put the results back together.

    If the loop does depend on the previous iteration completing, your task may be difficult or impossible.
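    A minimal sketch of the split-in-half, fork(), recombine idea, assuming the loop body is independent per item. The work sub (squaring a number) and the 1..1000 data are hypothetical; each child sends its partial result back to the parent over a pipe, and the parent adds the partials together.

```perl
use strict;
use warnings;

# Hypothetical per-item work. Each result depends only on its own item,
# so no iteration needs the previous one to have finished.
sub work { my ($n) = @_; return $n * $n }

my @data   = (1 .. 1000);
my $mid    = int(@data / 2);
my @halves = ([ @data[0 .. $mid - 1] ], [ @data[$mid .. $#data] ]);

# Fork one child per half; each sends its partial result back over a pipe.
my @children;
for my $half (@halves) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                 # child
        close $reader;
        my $partial = 0;
        $partial += work($_) for @$half;
        print {$writer} "$partial\n";
        close $writer;
        exit 0;
    }
    close $writer;                   # parent keeps only the read end
    push @children, [ $pid, $reader ];
}

# Put the results back together: read each partial result and combine them.
my $total = 0;
for my $child (@children) {
    my ($pid, $reader) = @$child;
    my $line = <$reader>;
    close $reader;
    waitpid($pid, 0);
    die "child $pid sent no result\n" unless defined $line;
    chomp $line;
    $total += $line;
}
print "sum of squares: $total\n";    # 333833500 for 1..1000
```

    If work() needed the result of the previous iteration, neither child could start its half without the other, which is the "difficult or impossible" case above.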

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      Actually, depending on how IO-intensive the work is, you may find that it's advantageous to split into more processes (or threads) than you have physical processors. If one worker becomes blocked waiting on IO, the OS can schedule another, which gets useful work done while the first one waits for its data.

      One application of this trick is GNU make: tell it to run n+1 jobs on an n-processor box (e.g. make -j 3 on a dual-CPU machine). While compiles are usually CPU-bound, there's often enough IO slack that the build finishes a little faster than with -j 2.
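      The same n+1 idea can be applied to a Perl script with a small fork-based worker pool. This is a sketch under assumptions: a dual-CPU box, ten hypothetical jobs, and a 50 ms sleep standing in for IO-bound work. At most $max_jobs children run at once; when the limit is reached, the parent blocks in waitpid(-1, 0) until any child finishes.

```perl
use strict;
use warnings;

my $cpus     = 2;            # assumption: a dual-CPU box, as in the make example
my $max_jobs = $cpus + 1;    # one extra worker to use the CPU while others wait on IO
my @jobs     = (1 .. 10);    # hypothetical job IDs

my %running;    # child pid => job id
my %status;     # job id    => child's exit code

# Reap one finished child and record how it exited.
sub reap_one {
    my $pid = waitpid(-1, 0);
    die "waitpid found no children\n" if $pid <= 0;
    my $job = delete $running{$pid};
    $status{$job} = $? >> 8 if defined $job;
}

for my $job (@jobs) {
    # At the limit: block until some child finishes before starting another.
    reap_one() while scalar(keys %running) >= $max_jobs;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: stand-in for IO-bound work (sleeps 50 ms instead of reading).
        select(undef, undef, undef, 0.05);
        exit 0;
    }
    $running{$pid} = $job;
}

reap_one() while %running;    # drain the children still running
printf "%d jobs finished\n", scalar keys %status;
```

      For real work, the CPAN module Parallel::ForkManager wraps this same pattern.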

Re: Taking advantage of multi-processor architecture
by neilwatson (Priest) on Feb 11, 2005 at 14:19 UTC
      While a good way of managing the actual forking, this doesn't address how to restructure the program so that forking can be used. That, I think, is the crux of the OP's question. Or, if I may rephrase,

      "Why doesn't my Perl script transparently take advantage of multiple processors for a speedup without me having to do anything? It transparently takes advantage of so many other things ..."


Re: Taking advantage of multi-processor architecture
by BrowserUk (Pope) on Feb 11, 2005 at 14:29 UTC

    Not enough information.

    How big are the files?

    What operations is it performing on them?

    Depending upon what the script is doing, you may be able to cut the processing time in half on a dual processor machine, or it might not make any difference at all.

    If you post the code, or a working program that does the same basic steps as your real code, you would stand some chance of getting meaningful advice.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Taking advantage of multi-processor architecture
by bluto (Curate) on Feb 11, 2005 at 18:34 UTC
    Since you've only posted a general question, we can only give you general advice. To optimize a program, you must know two things. First, what resource is it limited by (usually CPU speed or poor code design, real memory, or disk speed)? Second, how can you restructure your code to reduce its dependence on that resource or to parallelize access to it? Others have mentioned forking/threading, but these can be hard to use if you are inexperienced with them and don't have time to learn. Some other things you may want to consider...
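    One rough way to answer the first question is to compare wall-clock time with the CPU time the process actually used. A sketch, assuming the core Time::HiRes module and a placeholder CPU-bound loop where the real work would go: if CPU time is close to wall time, the script is CPU-bound and more processors can help once the work is split; if it is much lower, the script is mostly waiting on disk, network, or swap. Note that times() also counts Perl's own startup, so treat the ratio as a heuristic.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $start = time();

# Placeholder for the real work: a purely CPU-bound loop.
my $x = 0;
$x += sqrt($_) for 1 .. 2_000_000;

my $wall = time() - $start;
my ($user, $system) = times;    # CPU seconds used by this process so far
my $cpu = $user + $system;

printf "wall %.2fs, cpu %.2fs\n", $wall, $cpu;
if ($wall > 0 && $cpu / $wall > 0.9) {
    print "Mostly CPU-bound: more processors may help if the work can be split.\n";
}
else {
    print "Spends much of its time waiting (disk, network, swap): look there first.\n";
}
```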

    If your script performs a lot of IO (reading and writing), consider putting the files on different physical disks, and, importantly, don't access the files through things like NFS mounts. Sometimes this alone can double the throughput, especially if you are reading and writing two files at the same time on the same physical disk.

    You really do not want the system to be swapping while your program runs, since swapping slows things down a lot. It is often caused by manipulating massively large data structures in memory in Perl. If your script uses lots of memory (e.g. it reads two large files completely into memory before processing), consider processing each line as you read it. If you need the lines in arrays, consider using something like Tie::File. One common example is trying to sort a massive array within Perl itself; sometimes an external utility (e.g. GNU sort) can do this for you much more quickly.
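    A sketch of the three memory-saving techniques above, using only core modules plus an external sort(1). The temporary file of numbers 1..5000 is a hypothetical stand-in for one of the large input files; the assumption is that the script's work can be expressed per line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;
use Fcntl qw(O_RDONLY);

# Hypothetical sample input: one number per line.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\n" for 1 .. 5000;
close $out;

# 1. Process each line as it is read; only one line is in memory at a time.
my $sum = 0;
open my $in, '<', $path or die "Cannot open $path: $!";
while (my $line = <$in>) {
    chomp $line;
    $sum += $line;
}
close $in;

# 2. Tie::File gives array-style access without slurping the whole file.
tie my @lines, 'Tie::File', $path, mode => O_RDONLY
    or die "Cannot tie $path: $!";
my $count = @lines;
my $last  = $lines[$count - 1];    # autochomp strips the newline
untie @lines;

# 3. Let an external sort do the heavy lifting and read its output as a stream.
open my $sorted, '-|', 'sort', '-rn', $path or die "Cannot run sort: $!";
my $largest = <$sorted>;
close $sorted;
chomp $largest;

print "sum=$sum count=$count last=$last largest=$largest\n";
```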

    If you give more details, I'm sure someone can help out more.

Node Type: perlquestion [id://430095]
Approved by inman
Front-paged by Tanktalus