http://www.perlmonks.org?node_id=1081689


in reply to [Thread::Queue] How to handle unexpected termination and time out in threads

First off - I'd suggest that an anonymous sub is a bad plan if it's more than a couple of lines long.

There's one simple reason: if you write a 'standard' named sub, you can test your program non-threaded, and make all the 'concurrency issues' go away entirely while you debug the logic.
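As a minimal sketch (the names are made up for illustration, not from your code), the same named sub can be tested with no threads in sight, then handed to threads -> create() unchanged:

    use strict;
    use warnings;
    use threads;

    sub process_item {
        my ($item) = @_;
        return uc $item;    # stand-in for the real per-item work
    }

    # Single-threaded test - no concurrency involved:
    print process_item('hello'), "\n";

    # The identical sub later becomes the thread body:
    my $thr = threads->create( sub { print process_item('world'), "\n" } );
    $thr->join();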

Thread::Queue supports 'end()'. This is a better way of handling the problem than enqueuing a load of 'undef's.

Once a queue has been 'end()'ed, items already on it can still be dequeued; but once it's empty, any (blocking) dequeue() call returns 'undef' immediately rather than waiting forever. And a while loop construct like this:

while ( my $item = $my_q -> dequeue() )

will get back 'undef', so your while loop will exit.
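A minimal sketch of the pattern (queue and worker are made-up names):

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $my_q = Thread::Queue->new();

    my $worker = threads->create( sub {
        # dequeue() blocks until an item arrives; after end(),
        # it returns undef once the queue is drained, ending the loop.
        while ( defined( my $item = $my_q->dequeue() ) ) {
            print "got: $item\n";
        }
    } );

    $my_q->enqueue( 1 .. 5 );
    $my_q->end();      # no undef sentinels needed
    $worker->join();

Note the defined() test in the loop condition - a bare truth test would also end the loop early on a legitimate 0 or empty-string item.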

Handling a loop that's prone to crashing out is typically accomplished with 'eval': run the risky code inside 'eval', then check $@ to trap any error without killing the thread.
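In sketch form, inside the worker loop (do_one_item is a stand-in for your real per-item work):

    while ( defined( my $item = $my_q->dequeue() ) ) {
        eval {
            do_one_item($item);    # may die
            1;
        } or warn "item '$item' failed: $@";    # trap, log, keep looping
    }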

Handling 'hanging' conditions is a bit harder - especially within a thread. I would normally suggest using an 'alarm' call and a $SIG{'ALRM'} handler to trap it. However, you need to be slightly careful in doing so, simply because a blocking thread -> join() will block this signal.
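The classic single-threaded shape (as per 'perldoc -f alarm') is below; given the caveat above, treat it as a sketch rather than a guaranteed fix inside a thread. do_slow_work and the 10-second budget are placeholders:

    my $result;
    eval {
        local $SIG{ALRM} = sub { die "timed out\n" };    # NB: the \n matters
        alarm 10;
        $result = do_slow_work($item);
        alarm 0;    # cancel the pending alarm on success
    };
    if ($@) {
        die $@ unless $@ eq "timed out\n";    # propagate unexpected errors
        warn "work timed out - skipping this item\n";
    }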

A better approach would be to fix the underlying error whereby your code hangs. It may be that you're just waiting forever on a dequeue() call or something similar - that sort of hang goes away if you end() the queue, because the blocked dequeue() then returns 'undef'.

I would also note - you 'end' your workers by queuing undefs, and then go single-threaded to parse your results. Why not sync your threads _before_ processing them, so you can just 'end()' your results queue? Alternatively, have a single separate 'result collator' thread, and stuff each of your 'worker' threads into an array of thread references which you can then 'join()' - that way you always know when your workers have terminated (one way or another) before you close off your results processing:

for ( 1 .. $num_workers ) {
    my $thr = threads -> create( \&worker_thread );
    push( @worker_threads, $thr );
}
foreach my $thr ( @worker_threads ) {
    $thr -> join();
}
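Putting the two together - a sketch (all names illustrative) that join()s the workers and only then end()s the results queue, so a single collator knows exactly when it's done:

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $work_q   = Thread::Queue->new();
    my $result_q = Thread::Queue->new();

    my @worker_threads = map {
        threads->create( sub {
            while ( defined( my $item = $work_q->dequeue() ) ) {
                $result_q->enqueue("processed: $item");
            }
        } );
    } 1 .. 4;

    my $collator = threads->create( sub {
        while ( defined( my $res = $result_q->dequeue() ) ) {
            print "$res\n";
        }
    } );

    $work_q->enqueue( 1 .. 20 );
    $work_q->end();                    # workers drain the queue and exit
    $_->join() for @worker_threads;    # all workers definitely finished...
    $result_q->end();                  # ...so no more results can arrive
    $collator->join();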

However, as already observed - your process looks to be IO-bound, in that the most expensive thing you're doing is _probably_ reading/writing the files. More threads will not help much, and may be actively counterproductive - the more random your IO pattern, the less effectively your storage subsystem can cache, prefetch and avoid disk seek contention.