|Problems? Is your data what you think it is?|
A basic 'worker' threading example.by Preceptor (Deacon)
|on Dec 29, 2013 at 12:15 UTC||Need Help??|
This post is aimed at the people who have heard of Perl threading, and think it intriguing - but haven't really gotten to grips with how it's done. I'm going to put together a ... template, if you like, for a very basic style of threading.
Parallel code is somewhat hazardous for the unwary - because you're 'splitting' your program, and making different parts run at different speeds, you can end up with some incredibly frustrating and hard to track bugs. Every thread is a race condition waiting to happen. So all the bad habits you've picked up when coding in Perl, may well come and bite you if you 'thread it'.
The simplest thing to thread is what's known as an 'Embarassingly Parallel' problem. It's a type of problem where there are multiple tasks, but no dependencies, communication or synchronisation needed.
When 'doing' parallel code, you start to think in terms of scalability and efficiency - every thread start has an overhead. So does every communication between threads. However the most 'expensive' task is synchronising all your threads - they all have to wait until the slowest thread 'catches up'.
Thankfully - an 'embarassingly parallel' problem has none of these things.
An example I might use is pinging 1000 servers. You want to ping each of them, but you don't need to do so in any particular order. However, if a server is offline, then a 'ping' will wait for a timeout, making the process a lot slower.
The only thing you have to worry about is if you ping them 'all at once' you might end up sending a lot of data across the network.
This is a near perfect example of a type of problem I encounter regularly, and so I give it as example code.
Perl actually has quite a good way of 'spotting' embarassingly parallel stuff - the 'foreach' loop is often a good sign.
If you're doing the same thing on every item in a list, then there's a good chance that they might be suitable for parallelisation. You may not gain a large advantage from doing it though - the real advantage of threading is in making use of multiple system resources - processors, network sockets, etc. It's not the only way of achieving that result though, and it will - as a result - 'hog' more of a system's resource when it runs. (but hopefully for less time)
To break down the task:
Which looks a bit like this:
Now, this _is_ a very simple model of a 'threaded' task - and it will only suit situations where there are no dependencies on the results.