PerlMonks  

Threading and join/termination question

by cormanaz (Deacon)
on Jul 31, 2014 at 17:22 UTC ( [id://1095780]=perlquestion )

cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Good day bros. I am trying to learn how to use threads in Perl so I can apply them in a script that does calculations on a large graph. I know I need to limit the number of threads I spawn, so I've been reading up and have demo code working that limits the number of threads with a semaphore and works as expected.
#!/usr/bin/perl -w
use strict;
use threads;
use Thread::Semaphore;

my $sem = Thread::Semaphore->new(15);    # max 15 threads
my @threads;

for my $i (0..5) {
    $sem->down;
    my $t = threads->create(\&mySubName, $i);
    push(@threads,$t);
}

foreach my $t (@threads) {
    print $t->join(),"\n";
}

sub mySubName {
    my $foo = shift(@_);
    return $foo * 10;
    # release slot:
    $sem->up;
}
However, I have a question about the @threads array. Since it exists until the end of the loop, do the thread objects it contains hold only the values returned by the sub, or do they carry other data that would eat a lot of memory? If so, the approach I have here won't work for processing all pairs of nodes in a 7000-node graph.

Assuming that's the case I tried this

#!/usr/bin/perl -w
use strict;
use threads;
use threads::shared;
use Thread::Semaphore;

my $sem = Thread::Semaphore->new(15);    # max 15 threads
my @results :shared = 1;

for my $i (0..5) {
    $sem->down;
    my $t = threads->create(\&mySubName, $i);
}

print join(" ",@results);

sub mySubName {
    threads->detach();
    my $foo = shift(@_);
    push (@results,$foo);
    $sem->up;
}
but the output is 1 0 1 2 3 which is not right.

Guidance appreciated.

Replies are listed 'Best First'.
Re: Threading and join/termination question
by oiskuu (Hermit) on Jul 31, 2014 at 21:16 UTC

    Technically, you cannot set a thread limit with semaphores this way. It does work in practice, but there's no real guarantee that a thread, or all threads, aren't preempted right after the $sem->up().

    Second point, do not detach a thread if you need to wait upon it. Grab its end result (return value) with a join(), instead.
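A minimal sketch of the join-based approach, assuming a hypothetical times_ten sub in place of the real calculation. Each thread handle is kept so that join() can both wait for the thread and collect its return value:

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the real per-item calculation.
sub times_ten {
    my ($n) = @_;
    return $n * 10;
}

# Create the threads, keeping each handle so it can be joined later.
my @threads;
for my $i (0 .. 5) {
    push @threads, threads->create(\&times_ten, $i);
}

# join() blocks until the thread finishes and hands back its return value.
# Joining in creation order keeps the results in input order.
my @values = map { $_->join } @threads;

print "@values\n";    # 0 10 20 30 40 50
```

Because nothing is detached, the program cannot reach the print before every thread has finished.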

    You may not want to create too many threads (especially with fine-grained jobs). An alternative that might be suitable is to arrange for a conveyor belt with N workers. Use Thread::Queue for that. Here's a simple, if overly recursive, example:

    use threads;
    use Thread::Queue;

    my $WQ = Thread::Queue->new('unit001' .. 'unit123');

    sub qrun {
        map { $_ ? (worker($_), qrun()) : () } $WQ->dequeue_nb;
    }

    sub worker {
        warn "Work $_";
        select(undef, undef, undef, rand 3);    # sleep some
        return "$_!";
    }

    print for map { $_->join } map { threads->new(\&qrun) } 1 .. 15;

      That’s a very-terse example of a million-dollar idea.   (Great example if you’re a serious Perl-head ... not-so-easy if you’re not.)

      The idea is this:   instead of defining “a thread” as corresponding to “a unit of work to be completed,” and then trying to “limit the number of” those threads, create a pool of threads, calling each one of them “workers.”   Then, give those workers a production-line queue of things-to-do.   Each worker grabs a request off of the production-line, does it, shoves the results downstream somewhere, and then grabs another one ad infinitum, surviving until the production-line finally runs dry.   The number-of-workers (thread-pool size) should correspond to the maximum number of such units-of-work that you know this piece of hardware can predictably accomplish.   So, whether the production-line queue might be short or long, you remain confident that the work is being carried-out as fast as possible ... no matter how short or long the queue might be, the “completed units-of-work per second” will remain steady.   If you are gifted with faster hardware, you simply turn-up the number-of-workers knob a little bit more.   (And if you are gifted with a cluster of units, spread the work out among all the CPUs you have, so-many workers apiece.)
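The worker-pool idea described above could be spelled out less tersely along these lines. This is only a sketch: the pool size of 4, the 0..5 work items, and the "multiply by ten" job are placeholders, not anything from the original script.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $n_workers = 4;                               # assumed pool size; tune per machine
my $work      = Thread::Queue->new(0 .. 5);      # the production line, filled up front
my $results   = Thread::Queue->new;              # where finished work is sent

# Each worker keeps taking items until the work queue is empty.
sub worker {
    while (defined(my $item = $work->dequeue_nb)) {
        $results->enqueue($item * 10);           # placeholder for the real job
    }
    return;
}

# Start a fixed number of workers, then wait for all of them.
my @pool = map { threads->create(\&worker) } 1 .. $n_workers;
$_->join for @pool;

# Drain the results; arrival order depends on thread scheduling, so sort.
my @done;
while (defined(my $r = $results->dequeue_nb)) {
    push @done, $r;
}
my @sorted = sort { $a <=> $b } @done;

print "@sorted\n";    # 0 10 20 30 40 50
```

Only $n_workers threads ever exist, however long the work queue is, which is what makes the pool size the "knob" mentioned above. The non-blocking dequeue_nb is sufficient here only because the queue is fully loaded before the workers start.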

        I really have to agree with sundialsvc4. Threads are so expensive to create that you are almost always much better off creating a set number of threads (usually no more than the number of physical cores you have, for optimal results), then using a queuing system, such as Thread::Queue, to distribute work until all the work is complete.

        Also, unless you are creating a thread that performs a never-ending task (such as listening to events from an external source), it's generally not a great idea to detach threads, especially if you are interested in when they complete or what they return.

        Actually I was wondering about that...are there rules of thumb for how many threads a given platform can deal with? Equal to the number of cores? Limited by memory?
      I believe you are right because I ran my code and it seemed to run no faster than the unthreaded code. I will try to work out the queued version. Though sundialsvc4 is correct that it's a bit intimidating.

      As for the original question, what is in the thread object? Just the result of the sub operation, or a bunch of other data that consumes memory? Docs say a thread is a separate instance of Perl, so I'm imagining the object contains all kinds of state information.

        I suggest that “the queued version,” as presented, is “intimidating” merely because it is very terse.   (It is, so to speak, “Perl golf.”)   The essential idea is actually simple:   a worker-thread survives for a long time, retrieving units-of-work from some thread-safe queue and executing them ... of course within an eval {..} block so that the thread will survive even if the unit-of-work does not.   The number of workers, of course, determines the number of units-of-work that this system will attempt to carry out at any one time.   The number of workers (just like the number of people who are scheduled for a shift at McDonald’s ...) should be “a knob” that is easy to set.
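To illustrate the eval {..} point, here is a hedged sketch in which one unit of work fails on purpose. The risky_job sub, the item values, and the pool size of 2 are all hypothetical; the point is only that the worker loop carries on after a failure.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work    = Thread::Queue->new(1, 2, 0, 4);    # the 0 makes risky_job() die
my $results = Thread::Queue->new;

# Hypothetical unit of work that can fail (division by zero for 0).
sub risky_job {
    my ($n) = @_;
    return 100 / $n;
}

sub worker {
    while (defined(my $n = $work->dequeue_nb)) {
        my $value;
        # eval traps the die, so a bad item does not kill the worker thread.
        my $ok = eval { $value = risky_job($n); 1 };
        $results->enqueue($ok ? "ok $n: $value" : "failed $n");
    }
    return;
}

my @pool = map { threads->create(\&worker) } 1 .. 2;
$_->join for @pool;

my @report;
while (defined(my $line = $results->dequeue_nb)) {
    push @report, $line;
}
@report = sort @report;

print "$_\n" for @report;
```

All four items are reported: three succeed and the failing one is recorded rather than silently ending its thread.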

        Perl has an interesting implementation of threading ... you need to look at the various docs (and here in PerlMonks) on that subject.

        Now, before you proceed, you should also step back and be sure that you have reasonable cause to believe that “multiple threads” will, in fact, “be likely to make this operation run faster.”   Previous entries in this thread talk a great deal about this.   One “cheap and dirty” way to explore it is to launch multiple copies of this Perl program ... as it is right now (single-threaded) ... in separate terminal-windows.   Just how many windows can you have open-and-working at the same time, and still find that each of them finishes at about the same time?   (Prefixing the command with time, as in “time commandname,” on Unix/Linux can give you hard numbers.)   If you find that “two windows equals twice-as-long,” then you can immediately conclude that multithreading would probably be a waste of time.   On the other hand, if you find that multiple windows do let you get the work done (across all windows) in about the same amount of time, then, well-l-l-ll... guess you have two choices!   One is to dive into multithreading as described.   Another is to ... “just run it as-is in multiple terminal windows!”

Re: Threading and join/termination question
by zentara (Archbishop) on Jul 31, 2014 at 18:40 UTC
    I'm not sure what you are trying to do, but your second example has
    my @results :shared = 1;
    which adds an unnecessary element to the array. It should be:
    my @results :shared;

    This seems to work right for me. Remember, it's up to the system which thread gets run first, unless you prioritize them. So run the code below multiple times, and see how the system sometimes gets the order right but sometimes runs the threads in a different order. Also, you should lock @results.

    #!/usr/bin/perl -w
    use strict;
    use threads;
    use threads::shared;
    use Thread::Semaphore;

    my $sem = Thread::Semaphore->new(15);    # max 15 threads
    my @results:shared;

    for my $i (0..5) {
        $sem->down;
        my $t = threads->create(\&mySubName, $i);
    }

    <>;
    print join(" ",@results);
    <>;

    sub mySubName {
        threads->detach();
        my $foo = shift(@_);
        print "$foo\n";
        lock @results;
        push (@results,$foo);
        $sem->up;
    }
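As a variation on the code above (a sketch, not the poster's version), the threads can be joined instead of detached, so no pause on <> is needed before printing. The record sub name is made up for this example; the lock on @results is the same idea as above.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @results :shared;

# Hypothetical worker: append its argument to the shared array.
sub record {
    my ($n) = @_;
    lock(@results);          # lock is released when this sub's scope ends
    push @results, $n;
    return;
}

my @threads = map { threads->create(\&record, $_) } 0 .. 5;
$_->join for @threads;       # every push has completed once all joins return

# Push order depends on scheduling, so sort before comparing or printing.
my @sorted = sort { $a <=> $b } @results;
print "@sorted\n";           # 0 1 2 3 4 5
```

The unsorted @results may come out in a different order from run to run, but all six values are always present by the time the joins have returned.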

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
      Thanks for the reply. Yes I discovered that mistake. Thought the "= 1" was a flag or something.

      Anyway when I ran this, I still only got 0 1 2 3 in the last line of output, so it was missing 4 and 5. Thinking maybe the script was terminating before the threads finished, I tried adding

      while (threads->list()) { sleep 1; }
      after the for loop. That took care of it. Interestingly, I first tried .1 for the sleep argument and only got the 4. I don't understand that. Doesn't seem like a push should take that long.
        sleep doesn't handle real numbers, it takes only the int part of its argument.
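For sub-second pauses, one option is the sleep provided by the core Time::HiRes module; the four-argument select used earlier in this thread is another. A small sketch, with the 0.1 second interval chosen only to match the question:

```perl
use strict;
use warnings;
use Time::HiRes ();

# The built-in sleep only honours whole seconds, so 0.1 is treated as 0.
my $whole = int(0.1);

# Time::HiRes::sleep accepts fractional seconds.
my $start = Time::HiRes::time();
Time::HiRes::sleep(0.1);
my $elapsed = Time::HiRes::time() - $start;

printf "int(0.1) = %d, slept about %.2f s\n", $whole, $elapsed;
```

With the built-in sleep, a loop such as while (threads->list()) { sleep .1 } therefore spins without pausing instead of waiting a tenth of a second per pass.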
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ