Threading and join/termination question

cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Good day bros. I am trying to learn how to use threads in Perl so I can apply them in a script that does calculations on a large graph. I know I need to limit the number of threads I spawn, so I've been reading up and have demo code that limits the number of threads with a semaphore and works as expected.

#!/usr/bin/perl -w
use strict;
use threads;
use Thread::Semaphore;

my $sem = Thread::Semaphore->new(15); # max 15 threads
my @threads;
for my $i (0..5) {
    $sem->down;
    my $t = threads->create(\&mySubName, $i);
    push(@threads,$t);
}
foreach my $t (@threads) {
    print $t->join(),"\n";
}

sub mySubName {
    my $foo = shift(@_);
    my $result = $foo * 10;
    # release slot before returning (code placed after return never runs):
    $sem->up;
    return $result;
}

However I have a question about the @threads array. Since that exists until the end of the loop, do the thread objects it contains hold only the values returned by the sub, or do they carry other data that would eat a lot of memory? If so, the approach I have here won't work for processing all pairs of nodes in a 7000-node graph.

Assuming that's the case I tried this

#!/usr/bin/perl -w
use strict;
use threads;
use threads::shared;
use Thread::Semaphore;

my $sem = Thread::Semaphore->new(15); # max 15 threads
my @results :shared = 1;
for my $i (0..5) {
    $sem->down;
    my $t = threads->create(\&mySubName, $i);
}

print join(" ",@results);

sub mySubName {
    threads->detach();
    my $foo = shift(@_);
    push (@results,$foo);
    $sem->up;
}

but the output is 1 0 1 2 3 which is not right.

Guidance appreciated.


Replies are listed 'Best First'.
Re: Threading and join/termination question by oiskuu (Hermit) on Jul 31, 2014 at 21:16 UTC
Technically, you cannot set a thread limit with semaphores this way. It does work in practice, but there's no real guarantee that a thread, or all threads, aren't preempted right after the `$sem->up()`. Second point, do not detach a thread if you need to wait upon it. Grab its end result (return value) with a `join()`, instead. You may not want to create too many threads (especially with fine-grained jobs). An alternative that might be suitable is to arrange for a conveyor belt with N workers. Use Thread::Queue for that. Here's a simple, if overly recursive, example:

use threads;
use Thread::Queue;

my $WQ = Thread::Queue->new('unit001' .. 'unit123');

sub qrun {
    map { $_ ? (worker($_), qrun()) : () } $WQ->dequeue_nb;
}

sub worker {
    warn "Work $_";
    select(undef, undef, undef, rand 3);    # sleep some
    return "$_!";
}

print for map { $_->join } map { threads->new(\&qrun) } 1 .. 15;
Re^2: Threading and join/termination question by sundialsvc4 (Abbot) on Aug 01, 2014 at 02:51 UTC
That’s a very-terse example of a million-dollar idea. (Great example if you’re a serious Perl-head ... not-so-easy if you’re not.) The idea is this: instead of defining “a thread” as corresponding to “a unit of work to be completed,” and then trying to “limit the number of” those threads, create a pool of threads, calling each one of them “workers.” Then, give those workers a production-line queue of things-to-do. Each worker grabs a request off of the production-line, does it, shoves the results downstream somewhere, and then grabs another one ad infinitum, surviving until the production-line finally runs dry. The number-of-workers (thread-pool size) should correspond to the maximum number of such units-of-work that you know this piece of hardware can predictably accomplish. So, whether the production-line queue might be short or long, you remain confident that the work is being carried-out as fast as possible ... no matter how short or long the queue might be, the “completed units-of-work per second” will remain steady. If you are gifted with faster hardware, you simply turn-up the number-of-workers knob a little bit more. (And if you are gifted with a cluster of units, spread the work out among all the CPUs you have, so-many workers apiece.)
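For readers who find the recursive version hard to follow, here is a longer, non-golfed sketch of the same worker-pool idea. It is only an illustration: the pool size of 4, the names `$work_q`, `$result_q` and `worker`, and the `* 10` stand-in calculation are assumptions, and `end` needs a reasonably recent Thread::Queue (3.01 or later).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $n_workers = 4;                      # assumed pool size; tune for your hardware
my $work_q    = Thread::Queue->new;     # units of work waiting to be done
my $result_q  = Thread::Queue->new;     # finished results for the main thread

# Each worker keeps pulling items until the queue is ended and empty.
sub worker {
    while (defined(my $item = $work_q->dequeue)) {
        $result_q->enqueue($item * 10); # stand-in for the real calculation
    }
    return;
}

# Start a fixed number of workers once, instead of one thread per item.
my @pool = map { threads->create(\&worker) } 1 .. $n_workers;

$work_q->enqueue(0 .. 5);   # same inputs as the demo in the question
$work_q->end;               # no more work: dequeue returns undef once drained

$_->join for @pool;         # wait for every worker to finish

# Drain the results; arrival order depends on scheduling, so sort them.
my @results;
while (defined(my $r = $result_q->dequeue_nb)) {
    push @results, $r;
}
@results = sort { $a <=> $b } @results;
print "@results\n";
```

The number of live threads never exceeds `$n_workers`, however long the work queue gets, so no semaphore is needed.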
Re^3: Threading and join/termination question by SimonPratt (Friar) on Aug 01, 2014 at 10:27 UTC
I really have to agree with sundialsvc4. Threads are so expensive to create that you are almost always much better off creating a set number of threads (usually no more than the physical number of cores you have, for optimal results), then using a queuing system, such as Thread::Queue, to distribute work until all the work is complete. Also, unless you are creating a thread that is performing a never-ending task (such as listening to events from an external source), it's generally not a great idea to detach them, especially if you are interested in when they complete or what they return.
Re^3: Threading and join/termination question by cormanaz (Deacon) on Aug 01, 2014 at 12:54 UTC
Actually I was wondering about that... are there rules of thumb for how many threads a given platform can deal with? Equal to the number of cores? Limited by memory?
Re^4: Threading and join/termination question by SuicideJunkie (Vicar) on Aug 01, 2014 at 15:16 UTC
Re^2: Threading and join/termination question by cormanaz (Deacon) on Aug 01, 2014 at 12:52 UTC
I believe you are right because I ran my code and it seemed to run no faster than the unthreaded code. I will try to work out the queued version, though sundialsvc4 is correct that it's a bit intimidating. As for the original question, what is in the thread object? Just the result of the sub operation, or a bunch of other data that consumes memory? Docs say a thread is a separate instance of Perl, so I'm imagining the object contains all kinds of state information.
Re^3: Threading and join/termination question by sundialsvc4 (Abbot) on Aug 01, 2014 at 18:30 UTC
I suggest that “the queued version,” as presented, is “intimidating” merely because it is very terse. (It is, so to speak, “Perl golf.”) The essential idea is actually simple: a worker-thread survives for a long time, retrieving units-of-work from some thread-safe queue and executing them ... of course within an `eval {..}` block so that the thread will survive even if the unit-of-work does not. The number of workers, of course, determines the number of units-of-work that this system will attempt to carry out at any one time. The number of workers (just like the number of people who are scheduled for a shift at McDonald’s ...) should be “a knob” that is easy to set. Perl has an interesting implementation of threading ... you need to look at the various docs (and here in PerlMonks) on that subject. Now, before you proceed, you should also step back and be sure that you have reasonable cause to believe that “multiple threads” will, in fact, “be likely to make this operation run faster.” Previous entries in this thread talk a great deal about this. One “cheap and dirty” way to explore it is to launch multiple copies of this Perl program ... as it is right now (single-threaded) ... in separate terminal-windows. Just how many windows can you have open-and-working at the same time, and still find that each of them finishes at about the same time? (The `time commandname` command in Unix/Linux can give you hard numbers.) If you find that “two windows equals twice-as-long,” then you can immediately conclude that multithreading would probably be a waste of time. On the other hand, if you find that multiple windows do let you get the work done (across all windows) in about the same amount of time, then, well-l-l-ll... guess you have two choices! One is to dive into multithreading as described. Another is to ... “just run it as-is in multiple terminal windows!”
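The `eval {..}` point can be sketched roughly as follows. This is an illustration under assumptions, not anyone's actual code: the division stands in for a real unit of work, and the queue contents, the two-worker pool and the `"n:ok"` / `"n:failed"` result strings are all made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical work items; the 0 makes one unit of work die.
my $work_q   = Thread::Queue->new(1, 2, 0, 4);
my $result_q = Thread::Queue->new;
$work_q->end;                       # nothing more will be added

sub worker {
    while (defined(my $n = $work_q->dequeue)) {
        my $value;
        # eval traps the die, so a failed item does not kill the worker.
        my $ok = eval { $value = 100 / $n; 1 };
        $result_q->enqueue($ok ? "$n:ok" : "$n:failed");
    }
    return;
}

my @pool = map { threads->create(\&worker) } 1 .. 2;   # assumed pool size
$_->join for @pool;

my @report;
while (defined(my $line = $result_q->dequeue_nb)) {
    push @report, $line;
}
@report = sort @report;             # arrival order depends on scheduling
print "@report\n";
```

Every item gets a result line even though one of them raised an error, and both workers live until the queue runs dry.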
Re: Threading and join/termination question by zentara (Archbishop) on Jul 31, 2014 at 18:40 UTC
I'm not sure what you are trying to do, but your second example has `my @results :shared = 1;` which adds an unnecessary element to the array. It should be: `my @results :shared;` This seems to work right for me. Remember, it's up to the system which thread gets run first, unless you prioritize them. So run the code below multiple times, and see how the system sometimes gets the order right, but sometimes runs the threads in a different order. Also, you should lock @results.

#!/usr/bin/perl -w
use strict;
use threads;
use threads::shared;
use Thread::Semaphore;

my $sem = Thread::Semaphore->new(15); # max 15 threads
my @results:shared;
for my $i (0..5) {
    $sem->down;
    my $t = threads->create(\&mySubName, $i);
}
<>;
print join(" ",@results);
<>;

sub mySubName {
    threads->detach();
    my $foo = shift(@_);
    print "$foo\n";
    lock @results;
    push (@results,$foo);
    $sem->up;
}

I'm not really a human, but I play one on earth.
Re^2: Threading and join/termination question by cormanaz (Deacon) on Jul 31, 2014 at 18:57 UTC
Thanks for the reply. Yes I discovered that mistake. Thought the "= 1" was a flag or something. Anyway when I ran this, I still only got 0 1 2 3 in the last line of output, so it was missing 4 and 5. Thinking maybe the script was terminating before the threads finished, I tried adding `while (threads->list()) { sleep 1; }` after the for loop. That took care of it. Interestingly, I first tried .1 for the sleep argument and only got the 4. I don't understand that. Doesn't seem like a push should take that long.
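An alternative to polling `threads->list()`, along the lines oiskuu and SimonPratt suggest, is to leave the threads joinable and `join` each one before printing. The sketch below is a reworking of the second demo under that assumption; it drops the semaphore and the `detach` call, and sorts the output because the order in which threads push is not guaranteed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my @results :shared;

sub mySubName {
    my $foo = shift;
    lock(@results);          # serialize access to the shared array
    push @results, $foo;
    return;
}

# Keep the thread objects so the main thread can wait on them.
my @threads = map { threads->create(\&mySubName, $_) } 0 .. 5;

# join blocks until that thread has finished, so no sleep loop is needed.
$_->join for @threads;

print join(" ", sort { $a <=> $b } @results), "\n";
```

Because every thread has been joined before the `print`, all six values are present by the time the main thread reads `@results`.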
Re^3: Threading and join/termination question by choroba (Cardinal) on Aug 01, 2014 at 13:11 UTC
sleep doesn't handle real numbers; it takes only the int part of its argument.
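If a sub-second pause is wanted, one option is the core Time::HiRes module, whose `sleep` accepts fractional seconds; the four-argument `select` used in oiskuu's example is another. A small sketch (the variable names are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();

# The built-in sleep truncates its argument, so sleep(.1) acts like sleep(0).
my $truncated = int(0.1);

# Time::HiRes::sleep accepts fractional seconds.
my $start = Time::HiRes::time();
Time::HiRes::sleep(0.1);
my $elapsed = Time::HiRes::time() - $start;

printf "built-in sleep(.1) waits %d s; Time::HiRes::sleep(.1) took about %.2f s\n",
    $truncated, $elapsed;
```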