perlmeditation
Preceptor
<P>This post is aimed at people who have heard of Perl threading and find it intriguing, but haven't really gotten to grips with how it's done. I'm going to put together a template, if you like, for a very basic style of threading. </P>
<P>Parallel code is somewhat hazardous for the unwary. Because you're 'splitting' your program and letting different parts run at different speeds, you can end up with some incredibly frustrating, hard-to-track bugs. Every thread is a race condition waiting to happen.
So all the bad habits you've picked up when coding in Perl may well come back to bite you if you 'thread it'. </P>
<P>The simplest thing to thread is what's known as an '<A HREF="http://en.wikipedia.org/wiki/Embarrassingly_parallel">Embarrassingly Parallel</A>' problem.
It's a type of problem where there are multiple tasks, but no dependencies, communication or synchronisation needed. </P>
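<P>As a minimal sketch of what 'no dependencies, communication or synchronisation' looks like in Perl, here each task gets its own thread and the only coordination is the final join. Squaring a number is a hypothetical stand-in for real work, and the sketch assumes a perl built with thread support.</P>
<CODE>
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Each task depends only on its own input, so each can run in its
# own thread without talking to any of the others.
my @inputs = ( 1 .. 5 );

# Start one thread per task. create() is called in scalar context,
# so each thread runs in scalar context and join() returns a scalar.
my @threads;
foreach my $n (@inputs) {
    my $thr = threads->create( sub { return $n * $n } );
    push @threads, $thr;
}

# The only 'synchronisation' is waiting for every thread to finish.
# join() is called in the same order the threads were created.
my @results = map { $_->join() } @threads;
print "@results\n";    # prints "1 4 9 16 25"
</CODE>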
<P>When 'doing' parallel code, you start to think in terms of scalability and efficiency: every thread you start has an overhead, and so does every communication between threads.
However, the most 'expensive' operation is usually synchronising your threads, because they all have to wait until the slowest thread 'catches up'.
</P>
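<P>To get a rough feel for the start-up overhead on your own system, something like the following starts and joins a batch of threads that do no work at all. The count of 20 is arbitrary and the figure you get will vary: each new Perl thread gets its own copy of the interpreter's data, so the cost tends to grow with how much your program is holding when the thread is created.</P>
<CODE>
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);

# Start (and immediately join) threads that do no work, so the
# elapsed time is almost entirely thread start-up and tear-down cost.
my $count = 20;
my $start = time();
for ( 1 .. $count ) {
    threads->create( sub { return } )->join();
}
my $elapsed = time() - $start;

printf "%d empty threads took %.3f s (about %.2f ms each)\n",
    $count, $elapsed, 1000 * $elapsed / $count;
</CODE>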
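<P>
Thankfully, an 'embarrassingly parallel' problem needs almost none of this: beyond starting the threads and waiting for them to finish, the tasks don't have to talk to or wait for each other. </P>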
<P>An example I might use is pinging 1000 servers. You want to ping each of them, but you don't need to do so in any particular order. However, if a server is offline, then a 'ping' will wait for a timeout, making the process a lot slower. </P>
<P>The only thing you have to worry about is that if you ping them all at once, you might end up sending a lot of data across the network. </P>
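<P>A fixed pool of worker threads pulling from a queue is one way to avoid that, because only as many pings can be in flight as there are workers. This sketch checks the cap with a shared counter. The 0.2 second sleep is a hypothetical stand-in for a ping, the counter names are made up for illustration, and it assumes a Thread::Queue recent enough to provide end().</P>
<CODE>
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use Time::HiRes qw(sleep);

my $nthreads = 3;
my $active :shared     = 0;    # tasks currently being worked on
my $max_active :shared = 0;    # highest value $active ever reached

my $q = Thread::Queue->new( 1 .. 12 );    # 12 stand-in 'servers'
$q->end();                                # no more items will be added

sub worker {
    while ( defined( my $item = $q->dequeue() ) ) {
        {
            lock($active);    # lock is released at the end of this block
            $active++;
            $max_active = $active if $active > $max_active;
        }
        sleep(0.2);           # stand-in for a slow ping
        {
            lock($active);
            $active--;
        }
    }
}

my @workers = map { threads->create( \&worker ) } 1 .. $nthreads;
$_->join() for @workers;
print "At most $max_active of 12 tasks ran at once\n";
</CODE>
<P>However many items are queued, the reported maximum should never exceed the number of workers.</P>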
<P>This is a near-perfect example of a type of problem I encounter regularly, so I'm using it as the example code. </P>
<P>Perl actually has quite a good way of 'spotting' embarrassingly parallel work: a 'foreach' loop is often a good sign.</P>
<P>If you're doing the same thing to every item in a list, there's a good chance the loop is suitable for parallelisation. You may not gain a large advantage from doing it, though: the real advantage of threading is in making use of multiple system resources (processors, network sockets, etc.). Threading isn't the only way of achieving that, and a threaded program will 'hog' more of a system's resources while it runs (but hopefully for less time). </P>
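<P>To see where the gain comes from, this sketch times the same loop run serially and then through a small queue of worker threads. slow_task is a hypothetical stand-in that just sleeps, much like a ping waiting for a timeout; because the threads spend their time waiting rather than computing, their waits overlap. With 4 items and 4 workers you might expect roughly 2 seconds serially and a little over half a second threaded, but the exact figures will vary.</P>
<CODE>
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes qw(time sleep);

# Hypothetical stand-in for a slow task, e.g. a ping waiting on a timeout.
sub slow_task { my ($item) = @_; sleep(0.5); return }

my @items = ( 1 .. 4 );

# Serial version: the 'foreach' loop that hints at parallelism.
my $t0 = time();
foreach my $item (@items) {
    slow_task($item);
}
my $serial = time() - $t0;

# Threaded version of the same loop: 4 workers share one queue.
$t0 = time();
my $q = Thread::Queue->new(@items);
$q->end();
my @workers = map {
    threads->create( sub {
        while ( defined( my $item = $q->dequeue() ) ) { slow_task($item) }
    } )
} 1 .. 4;
$_->join() for @workers;
my $threaded = time() - $t0;

printf "serial: %.2f s, threaded: %.2f s\n", $serial, $threaded;
</CODE>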
<P>
To break down the task:
<UL>
<LI>We create some 'worker' threads, that do a very simple 'run a command' operation (in this case, ping).</LI>
<LI>We define a list of servers (read from a file), and use the Thread::Queue module to handle queuing.</LI>
<LI>We wait for thread completion, and collate errors. </LI>
</UL>
</P>
<P>Which looks a bit like this:</P>
<CODE>
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
my $nthreads = 5;
my $process_q = Thread::Queue -> new();
my $failed_q = Thread::Queue -> new();
#this is a subroutine, but one that runs 'as a thread'.
#when it starts, it inherits the program state 'as is'. E.g.
#the variable declarations above all apply - but changes to
#values within the program are 'thread local' unless the
#variable is declared as 'shared'.
#Behind the scenes, Thread::Queue objects are 'shared' arrays.
sub worker
{
#NB - this will sit in a loop indefinitely, until you close the queue
#using $process_q -> end
#we do this once we've queued all the things we want to process,
#and the sub then completes and exits neatly.
#however if you _don't_ end it, this will sit waiting forever.
#test 'defined' so an item that happens to be false (e.g. '0')
#doesn't end the loop early.
while ( defined ( my $server = $process_q -> dequeue() ) )
{
chomp ( $server );
#skip blank lines in the server list
next unless length $server;
print threads -> self() -> tid(). ": pinging $server\n";
#-c 1 sends a single ping
my $result = `/bin/ping -c 1 $server`;
#ping exits non-zero if it gets no reply
if ( $? ) { $failed_q -> enqueue ( $server ) }
print $result;
}
}
#insert tasks into thread queue.
open ( my $input_fh, "<", "server_list" ) or die "Can't open server_list: $!";
$process_q -> enqueue ( <$input_fh> );
close ( $input_fh );
#we 'end' process_q - when we do, no more items may be inserted,
#and 'dequeue' returns undef once the queue is empty.
#this means our worker threads (in their 'while' loop) will then exit.
$process_q -> end();
#start some threads
for ( 1..$nthreads )
{
threads -> create ( \&worker );
}
#Wait for threads to all finish processing.
foreach my $thr ( threads -> list() )
{
$thr -> join();
}
#collate results. ('synchronise' operation)
while ( defined ( my $server = $failed_q -> dequeue_nb() ) )
{
print "$server failed to ping\n";
}
</CODE>
<P>Now, this _is_ a very simple model of a 'threaded' task, and it will only suit situations where the tasks don't depend on each other's results. </P>
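<P>If later code does need the results, one option is to finish the 'synchronise' step first and only use them afterwards. As a variation on the failed-ping queue, this sketch has each worker keep its own list of failures and hand it back through join(). check_server is a hypothetical stand-in for the ping call (names starting with 'down' count as failures), so the sketch runs without a network.</P>
<CODE>
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for running ping: returns true for 'reachable'.
sub check_server { my ($server) = @_; return $server !~ /^down/ }

my @servers  = qw(alpha down-1 beta gamma down-2 delta);
my $nthreads = 3;

my $process_q = Thread::Queue->new(@servers);
$process_q->end();

# Each worker keeps its own (thread-local) list of failures and
# returns it when the queue runs dry.
sub worker {
    my @failed;
    while ( defined( my $server = $process_q->dequeue() ) ) {
        push @failed, $server unless check_server($server);
    }
    return @failed;
}

# Ask for list context explicitly, so join() returns the whole list.
my @workers = map { threads->create( { 'context' => 'list' }, \&worker ) }
    1 .. $nthreads;

# Only after every worker has been joined are the results complete,
# so anything that depends on them has to come after this point.
my @all    = map { $_->join() } @workers;
my @failed = sort @all;
print "$_ failed to ping\n" for @failed;
</CODE>
<P>Because each worker's list is thread-local, there's no shared state to lock; the trade-off is that nothing is available until all the threads have been joined.</P>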