WorkCrew threading

by FFRANK (Beadle)
on Jul 24, 2007 at 15:46 UTC ( #628506=perlquestion )
FFRANK has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

From perlthrtut: "In the work crew model, several threads are created that do essentially the same thing to different pieces of data."

I am looking for a minimal example of work-crew threading to build on.

Something like: 1) take an array of numbers, 2) split it equally between 10 threads (each thread would process 1/10 of the array), 3) do something with each number (some sub), 4) return the results into an array at their initial positions, 5) return the index of the initial number that generated the highest result.

    #!/usr/bin/perl -w
    use strict;
    use threads;
    use PDL;

    my @array = qw(9 10 3 2 4 7 8 1 5 6);

    # Separate the data in threads, do the sub (e.g. add one),
    # return values into an array at their original position (index)
    my @results = qw(resThread1 resThread2 ...);   # placeholder for the per-thread results

    my $piddle = pdl(@results);
    my ($min, $max, $min_ind, $max_ind) = minmaximum($piddle);
    print $max_ind, "\n";

Would the time gain be ~10x by using this approach on a single processor (say @array is big)?

Thanks very much,


Re: WorkCrew threading
by BrowserUk (Pope) on Jul 24, 2007 at 16:18 UTC
    Would the time gain be ~ 10 by using this approach on a single processor ...

    No. There would be no gain at all. In fact, there would be a penalty.

    1 thread x 1 work == 1 work.

    10 threads x ( 1/10th work + context switches ) == 1 work + 10 x context switches.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: WorkCrew threading
by Fletch (Chancellor) on Jul 24, 2007 at 16:50 UTC

    If you're going to gain performance from threading (or forking for that matter) on a single CPU it's going to be by separating stuff like IO bound actions (reading or writing to disk) from the processing work. However if you design for it you can have code that probably should scale better when more processors are available (i.e. you're taking a threading overhead hit on single processor boxen in order to get better performance by conditionally spawning more workers on boxen with more than one).
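
    A minimal sketch of the work-crew layout described in the question, with the number of workers as a parameter so it can be raised on boxes with more processors. It assumes a perl built with ithreads; process() and work_crew() are hypothetical names introduced here, "add one" stands in for the real work, and a plain loop replaces PDL's minmaximum so the sketch needs only core modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the real per-element work ("add one").
sub process { my ($n) = @_; return $n + 1 }

# Body of each worker thread: process every element of its slice.
sub worker { return map { process($_) } @_ }

# Split @$data into contiguous slices, process each slice in its own
# thread, and return the results in their original positions.
sub work_crew {
    my ($data, $workers) = @_;
    my $n = @$data;
    return () unless $n;
    $workers = $n if $workers > $n;                    # never more threads than items
    my $chunk = int(($n + $workers - 1) / $workers);   # ceiling division

    my @crew;
    for (my $start = 0; $start < $n; $start += $chunk) {
        my $end = $start + $chunk - 1;
        $end = $n - 1 if $end > $n - 1;
        # Explicit list context so join() returns the whole result list.
        push @crew, threads->create({ 'context' => 'list' },
                                    \&worker, @{$data}[$start .. $end]);
    }
    # Joining in creation order keeps results at their original indices.
    return map { $_->join } @crew;
}

my @array   = qw(9 10 3 2 4 7 8 1 5 6);
my @results = work_crew(\@array, 10);

# Index of the input that produced the highest result.
my $max_ind = 0;
for my $i (1 .. $#results) {
    $max_ind = $i if $results[$i] > $results[$max_ind];
}
print "$max_ind\n";   # prints 1 (input 10 gives the highest result, 11)
```

    Each thread gets its own copy of its slice, so no shared variables or locking are needed here; the price is the copying and thread start-up cost, which is part of the overhead discussed in this thread.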

Re: WorkCrew threading
by mr_mischief (Monsignor) on Jul 24, 2007 at 17:05 UTC
    What BrowserUk said, plus the fact that the programmer, you, has to deal with coordinating ten threads instead of just getting the work done in one.

    On the bright side, if you develop it on one processor and then move it to ten, you should see a nice speedup, assuming the job is high enough priority that it gets time slices on all ten processors, your coordination between threads is lightweight enough, and the context switches don't eat up all the difference.

    On the dark side, something so simple, if it's not doing this with a huge, huge array, would probably just be scheduled all on one or two processors on most SMP machines anyway because other jobs would be given higher priority (unless you have a ten processor box just lying around for testing).

    Really, you should see some speedup, in theory, well before ten processors. The overhead inherent in the thread coordination plus the overhead of the context switches in the system mean you may not necessarily see any improvement on a two-processor box, though. I think I could safely say it should be faster to do it as ten threads than as one by the time you get to four processors, but the ten threads wouldn't necessarily be any faster there than four threads.

    In other words, welcome to the great unknown of multi-threaded programming on multi-processor machines. There are truths to be told, such as what BrowserUk told you about overhead taking away performance on the single-processor box. Beyond that, though, there's theory and practice.

    Take a look at any of the recent dual-core and quad-core CPU benchmarks out on the hardware enthusiast websites with the more recent games. You'll see that the chip companies don't even pretend that twice the cores at the same speed is going to equal twice the work. Then the reviewers and benchmarkers go on to say which benchmarks make more or less use of the multiple cores.

    I don't mean to scare you away from threads, but they are no panacea. They don't, for example, deal well with being split across machines. Multi-process projects that use IP networking do. Threads don't necessarily mean better speed on a single computer, but they can. The best reasons to use threads are that they fit the design of the project well and that they can take advantage of particular architectures when run on those architectures. Of course, using them just to learn to use them better is a good goal, too.
Re: WorkCrew threading
by dsheroh (Parson) on Jul 24, 2007 at 17:07 UTC
    make tends to run fastest when set to use twice as many processes as there are CPUs, so there are cases where more threads/processes than CPUs can provide a performance benefit, generally because one can be working on its available data while another waits for more data to arrive (from disk, network, main memory, wherever). But you won't even get twice the performance out of that unless you're waiting on a very slow data source.

    In this context, forking/threading just provides a way to minimize the CPU's idle time. If your CPU utilization is already at or near 100% (as it probably would be for a simple processor-bound task like this one), then you're not going to gain anything. Threads aren't magic enough to run the CPU at 110%, much less 1000%.
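
    The idle-time point can be illustrated with a rough sketch in which a sleep stands in for waiting on a slow data source. It assumes a perl built with ithreads; fetch() and timed() are hypothetical helpers introduced for illustration, and the 0.2 second delay is arbitrary. Replacing the sleep with pure computation should remove the gain on a single processor.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time sleep);

# Hypothetical task: the sleep simulates waiting on disk or network.
sub fetch {
    my ($id) = @_;
    sleep(0.2);
    return $id * 2;
}

# Run a code ref and return (elapsed seconds, results...).
sub timed {
    my ($code) = @_;
    my $t0  = time();
    my @out = $code->();
    return (time() - $t0, @out);
}

my @ids = (1 .. 4);

# One after another: the waits add up.
my ($serial_s, @serial) = timed(sub { map { fetch($_) } @ids });

# One thread per item: the waits overlap.
my ($threaded_s, @threaded) = timed(sub {
    my @crew = map { threads->create(\&fetch, $_) } @ids;
    return map { $_->join } @crew;
});

printf "serial:   %.2fs (%s)\n", $serial_s,   "@serial";
printf "threaded: %.2fs (%s)\n", $threaded_s, "@threaded";
```

    The threaded run finishes sooner only because the threads spend their time waiting rather than computing, which is the case where the CPU would otherwise sit idle.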
