Perl Threads Boss/Worker Example

by joemaniaci (Sexton)
on Apr 24, 2012 at 20:50 UTC ( #966939=perlmeditation )

I spent the last couple of days multi-threading a little tool. I didn't need shared data or anything of the sort; I just needed one thread per file to be parsed. I am writing this up because the documentation can be a little misleading, and the one mistake I kept seeing while googling involves detach(). From the documentation: "Once a thread is detached, it'll run until it's finished; then Perl will clean up after it automatically." When I started on this tool, I thought this was just what I needed. What the documentation doesn't say is that if the main Perl process exits before a detached thread finishes, all detached threads are terminated as well, whether or not they have completed. Looking back, that makes sense, but at the time the documentation gave me the impression that detached threads were not reaped along with the main thread. I also read somewhere that the resources of detached threads might not be freed.

So my problem was that all my threads were created properly, but the main Perl process exited and took everything down with it before my worker threads could do their job. Here is my solution.

    use threads;

    $num_threads;
    $thread_limit = xxx;

    foreach $file_to_execute (@array_of_files) {
        $thread = new Thread \&do_stuff, param1, $param2;

        # Throttle: wait while the number of running threads is at the limit
        @threadlist = threads->list(threads::running);
        $num_threads = $#threadlist;
        while ($num_threads >= $thread_limit) {
            sleep(30);
            @threadlist = threads->list(threads::running);
            $num_threads = $#threadlist;
        }
    }

    # Keep the main thread alive until no worker is still running
    while ($num_threads != -1) {
        sleep(1);
        @threadlist = threads->list(threads::running);
        $num_threads = $#threadlist;
    }

    sub do_stuff {
        ...
    }

As you can see, I create a thread in a foreach loop for each item to be processed. The first while loop limits how many threads run at a time, which is useful on a weaker computer. I ran it without that loop on my work computer, which has two quad-core CPUs and 16 GB of RAM, and it didn't break a sweat, but the limit might be useful for some people. The second while loop keeps the main Perl process from exiting while the list of running threads is not empty. Once the list is empty, the program quits. And yes, I should have used strict; I was being lazy.

I just remembered another reason I wanted to write this up. I was having an issue where

 threads->list(threads::running)

would increment to the max number of threads, but never decremented as threads finished. But now I can't think of how I solved it, or if there was ever truly something wrong in the first place besides my own fingers.
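
For comparison, here is a minimal sketch of the same throttle-and-wait idea written against the current threads API. It assumes that threads->list(threads::running) and threads->list(threads::joinable) return a count in scalar context, as the threads documentation describes. The work list, the limit of 4, and the body of do_stuff are placeholders invented for illustration, not the original tool.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(sleep);          # fractional sleep for the polling loop

my $thread_limit = 4;               # hypothetical cap on concurrent workers
my @work         = (1 .. 10);       # hypothetical stand-in for the file list

# Placeholder worker: the real one would parse a file.
sub do_stuff {
    my ($n) = @_;
    return $n * 2;
}

my @results;
for my $item (@work) {
    # While the pool is full, reap finished workers and wait briefly.
    while (threads->list(threads::running) >= $thread_limit) {
        push @results, $_->join() for threads->list(threads::joinable);
        sleep(0.05);
    }
    # Ask for scalar context explicitly so join() returns the worker's value.
    threads->create({ context => 'scalar' }, \&do_stuff, $item);
}

# join() blocks until each remaining worker has finished.
push @results, $_->join() for threads->list();

print scalar(@results), " results collected\n";
```

Joining workers as they become joinable is what lets Perl release them. A thread that is neither joined nor detached stays around until the program exits.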

Either way, this was a fun little exercise. I went from processing 200+ files (22.5 GB total, with 200+ threads!) in an hour and a half to 23 minutes. The only reason it still takes 23 minutes is that I have three 4 GB files that have to be read from top to bottom. So near the end I have four threads: the main thread plus one for each of the three files being read and split apart. I don't think I can do anything about that in terms of threading. I am going to try 80+ GB and see if I can break my computer.

Re: Perl Threads Boss/Worker Example
by arpad.szasz (Monk) on Apr 24, 2012 at 21:56 UTC

    It seems you are mixing the old, deprecated threads model (the Thread module) with the new ithreads model (the threads module). Threads should be created like this:

    $thread = threads->create( \&do_stuff, param1, $param2 );

    Please also note that indirect object notation like new Class is discouraged in Modern Perl in favor of Class->new.

    If you care about the worker threads finishing their jobs, you should use join() instead of detach(); then you don't need the second while loop. Please see perlthrtut for a more thorough explanation.

    Please give the Perl version and OS you are running this on, so the monks here can suggest better alternatives to your current code.

    You can also find valuable information about threads programming in Perl from fellow monks BrowserUk, ikegami, and zentara.

    Looking forward to your reply!

      Windows 7, with whatever the latest Strawberry Perl is.

      The reason I didn't use join() was that I wasn't returning anything, and it behaved in a non-parallel manner, since I had to wait for the joined thread to return before I could start another one. Unless, of course, I did something very wrong. My goal was parallelization, and so far so good. I just tried 450 files with 450 threads to tax my work computer. Maybe I can request an SSD at work, since I think the read/write syscalls are now the slowest part of my program.

      I will, however, look into the more modern code and see how it behaves.
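
A minimal sketch of why join() need not serialize the work: if every thread is created before any of them is joined, the workers run side by side and join() only collects them afterwards. The one-second sleep stands in for real work, and the worker count of 4 is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);           # sub-second wall-clock timestamps

# Placeholder worker: pretend each file takes about one second.
sub do_stuff {
    my ($id) = @_;
    sleep 1;
    return;
}

my $start = time();

# Start all workers first ...
my @workers = map { threads->create(\&do_stuff, $_) } 1 .. 4;

# ... and only then wait for them.
$_->join() for @workers;

my $elapsed = time() - $start;
printf "4 workers finished in %.1f s\n", $elapsed;
```

On a machine with a threaded perl, the elapsed time should be close to one second rather than four. Calling join() right after each create() is what would make the loop behave sequentially.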

Re: Perl Threads Boss/Worker Example
by Tanktalus (Canon) on Apr 25, 2012 at 19:16 UTC

    Rather than creating and destroying a worker thread for each file, maybe try using a queue? Each worker thread looks for items in the queue and works on them, which avoids your sleep()s (I hate sleep in my code; it's usually a sign that I did something wrong). You can then join all the threads at the end, which is simpler.

    See the docs for Thread::Queue. You'll end up with something like this:

        my $q = Thread::Queue->new(); # A new empty queue

        # Worker threads
        my @thr = map {
            threads->create(sub {
                while (my $item = $q->dequeue()) {
                    # assuming undef isn't a valid value and so can be a marker.
                    return unless defined $item;
                    do_stuff($item, @_);
                }
            }, $param1, $param2)->detach();
        } 1..$thread_limit;

        # Send work to the threads
        $q->enqueue($_) for @array_of_files;
        # send markers.
        $q->enqueue(undef) for 1..$thread_limit;

        # terminate.
        $_->join() for @thr;

    In this case, you can pretty much leave your main thread out of the count: after it finishes enqueueing all the files, it will only wake up once per worker thread. So if you have a quad-core, you really can have a thread limit of 4 instead of 3, which can speed things up a bit as well. You may also be able to tweak it further: depending on how much time each thread spends on I/O versus CPU, you may be able to use 5 or even more threads. Of course, you may saturate your disk at that point, and find it running at full tilt while your CPU usage still doesn't reach 100%. At that point, yes, an SSD is probably your next best bet for improving the speed.
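
A hedged sketch of the same worker-pool idea that keeps the threads joinable and lets the queue signal completion itself. It assumes a Thread::Queue recent enough to provide end() (version 3.01 or later, as far as I know), after which dequeue() returns undef once the queue is empty. The file names, the pool size of 4, and the body of do_stuff are placeholders for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $thread_limit   = 4;                                # hypothetical pool size
my @array_of_files = map { "file$_.txt" } 1 .. 10;     # hypothetical work list

# Placeholder worker body: the real one would parse the file.
sub do_stuff {
    my ($file) = @_;
    return length $file;
}

my $q = Thread::Queue->new();

# Start a fixed pool. The workers are not detached, so they can be joined.
my @thr = map {
    threads->create({ context => 'scalar' }, sub {
        my $count = 0;
        # defined() keeps false-but-valid items such as "0" from ending the loop.
        while (defined(my $item = $q->dequeue())) {
            do_stuff($item);
            $count++;
        }
        return $count;              # number of items this worker handled
    });
} 1 .. $thread_limit;

# Hand out the work, then declare that nothing more is coming.
$q->enqueue(@array_of_files);
$q->end();

my $total = 0;
$total += $_->join() for @thr;
print "processed $total items\n";
```

With end() there is no need to enqueue one undef marker per worker, and the blocking dequeue() means no worker spins or sleeps while it waits for work.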

      Although I’m very glad for your Meditation and for your thought in posting it, I do think that Tanktalus has a point worthy of its own meditation. The number of work units to be done usually ought not to be the same as the number of threads launched to do that work. In the general case, that strategy does not scale well, and when it fails to scale, it tends to fail rather badly. A to-do-list based approach, on the other hand, much more closely matches what we tend to do in real life (except when senior management is in a panic...). The number of workers is set according to the system’s ability to do work efficiently in parallel, without the workers interfering with one another by their mere presence. The size of the to-do-list queue is both unpredictable and unrelated. When 100 new customers pile into a fast-food restaurant, the establishment does not clone 100 new workers.

      Thanks for posting.

      I realize that this is an old thread but I ran across this trying to solve a similar problem: run a specific number of threads to work on a set of issues. I thought it might be useful for others to know the correct implementation of Tanktalus' suggestion.

      Basically, there were two issues that took me a while to sort out: firstly, the newly created threads are detached, which makes them impossible to join later; secondly (much less important), adding undef as a queue item isn't necessary when using dequeue_nb instead of dequeue.

      The code that actually works for me looks like this:

      #! /usr/bin/perl
      use threads;
      use Thread::Queue;

      my $q = Thread::Queue->new(); # A new empty queue

      # Send work to the threads
      $q->enqueue($_) for @ARGV;

      # Worker threads
      my $thread_limit = 8;
      my @thr = map {
          threads->create(sub {
              while (defined (my $item = $q->dequeue_nb())) {
                  doStuff($item);
              }
          });
      } 1..$thread_limit;

      # terminate.
      $_->join() for @thr;
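
A small sketch of the dequeue_nb() behaviour this version relies on: it returns the next item without blocking, and undef once the queue is empty. That is why the loop can end without undef markers, but only because all the work is enqueued before the workers start; a worker that found the queue momentarily empty would otherwise exit early. The file names are placeholders.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Queue pre-filled with two hypothetical work items.
my $q = Thread::Queue->new(qw(a.txt b.txt));

my @seen;
# dequeue_nb() never blocks; it returns undef when nothing is left.
while (defined(my $item = $q->dequeue_nb())) {
    push @seen, $item;
}

my $left = $q->pending();           # items still waiting in the queue
print "drained: @seen; pending: $left\n";
```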
