http://www.perlmonks.org?node_id=802767

r1n0 has asked for the wisdom of the Perl Monks concerning the following question:

Hello again monks,
I have been toying with threads lately and wanted to know a good method for monitoring them. If they die, I want to restart them. Ideally, I also want to figure out why a thread dies.

I have done some research on perlmonks and just haven't found the information I am seeking for this topic.

Basically, the code below (StartJobRequest subroutine) is the thread being launched within my code that dies. I don't have all the code here because there are other threads, but none of them seem to be having a problem. I have all kinds of logging happening in my real code to find any problems, but nothing is being identified. So, I am putting the basic thread routine here and asking for help. I am pretty new to threads, so any pointers would be most appreciated. I think it would be great to start a thread that monitors the other threads, and when one dies, start it back up, and hopefully logging will help me find what the issue is.
========================================================
UPDATE: Since posting this message, I have been able to relaunch a thread after it completes. I don't know whether this works the same way for a thread that dies. I was able to create a monitoring thread, but I wonder what prevents the monitor thread itself from dying. I have also been able to start a thread back up from the main program, but to do that I had to wait for all threads to complete. Is this the way it is supposed to work? I read the CPAN page for threads before posting, and it has good examples, but some items (as listed in this message) still aren't clear to this thread novice. Thanks.
========================================================
#!/usr/bin/perl -w
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use IO::Socket;

our $alock:shared;
our @uri_list:shared;
my $Server_IP = "172.168.0.1";
my $Server_IP_tcp_request_port = 40000;

#Set up the queue for playing
my $Q = Thread::Queue->new();

#Start the thread that keeps the queue full of jobs
my $thr1 = threads->create(\&StartJobRequest,1);

#Start a thread that checks the queue for jobs
#This is only a filler for this code on perlmonks (no provided code for StartPicker is provided, in order to reduce code space)
#This code takes queued jobs and performs them, as it completes, it pulls from queue for next job.
#The actual code has many different Job pickers
my $thr2 = threads->create(\&StartPicker, 2);

#Join the threads
$thr1->join();
$thr2->join();

sub StartPicker{
    #Cool stuff goes on here. ;-)
}

sub StartJobRequest{
    my $desired_q_level = 25;
    my $thid = threads->tid();
    #let's start a never ending process
    for(;;){
        my $pending = $Q->pending();
        #Check to see if we need to get a task from server.
        #Only to be done if queue has less than 25 jobs
        if ( $pending <= $desired_q_level ){
            my $collect = $desired_q_level - $pending;
            for(my $a=0; $a<$pending; $a++){
                my $socket = IO::Socket::INET->new(
                    PeerAddr => $Server_IP,
                    PeerPort => $Server_IP_tcp_request_port,
                    Proto => 'tcp',
                    Reuse => 1,
                ) or warn "Can not open socket for client to $Server_IP on port $Server_IP_tcp_request_port\nReason: $!\n\n";
                if ($@ ){
                    #Set $a to exit loop after this
                    $a=$collect;
                    my $warn_sleep = int rand 10;
                    print "Can not connect to server $Server_IP on port $Server_IP_tcp_request_port\nSleeping for $warn_sleep seconds\n";
                }
                else{
                    my $payload = "Need_Job";
                    $socket->autoflush(1);
                    print $socket $payload;
                    my $line = <$socket>;
                    if ($line =~ /^JOB:/){
                        $line=~ s/\n//;
                        write_log_entry("Job received: $line\n");
                        lock($alock);
                        $Q->enqueue($line);
                    }
                    elsif( $line=~ /=EMPTY=/){
                        #set $a to exit loop after this
                        $a=$collect;
                        sleep(3);
                    }
                    else{
                        print "Unknown returned info from Server: $line\n";
                        #set $a to end loop
                        $a=$collect;
                    }
                }
                close($socket);
            }
        }
    }
}


Thank you in advance for your help.

Replies are listed 'Best First'.
Re: Monitoring Threads and keeping them alive/reviving them
by bot403 (Beadle) on Oct 22, 2009 at 19:50 UTC

    If no other threads are dying, then it's certainly something code-specific in the subroutine for that thread. Have you tried running your "picker" code without threads, i.e. in single-threaded mode? Instead of pushing something onto the Queue after accepting a socket connection, just call the picker subroutine. If it dies in the main thread, you should have an easier time debugging it.
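The single-threaded approach can be sketched as follows. This is only an illustration under stated assumptions: `worker_body` and `run_unthreaded` are hypothetical names, and the undefined socket is a simulated failure, not a diagnosis of the code in the question. The routine is called directly in the main thread inside an `eval`, so the reason for a die is captured as a string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the body of a thread routine.
sub worker_body {
    my ($socket) = @_;
    # Calling a method on an undefined value is a fatal error in Perl.
    $socket->autoflush(1);
    return 'done';
}

# Call the routine directly (no threads) and trap any die.
sub run_unthreaded {
    my ($code, @args) = @_;
    my $result;
    my $ok = eval { $result = $code->(@args); 1 };
    return $ok ? "ok: $result" : "died: $@";
}

# Simulate a failed connection by passing undef instead of a socket.
print run_unthreaded(\&worker_body, undef), "\n";
```

If the routine only fails when run as a thread, that in itself narrows the search to shared data or thread-specific state.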

    Also, some code to catch __DIE__ could be really useful. Try this:

    $SIG{__DIE__} = \&sig_die;

    sub sig_die{
        my ($msg) = @_;
        # Do nothing if called from inside an eval block.
        die @_ if $^S;
        stack_trace();
    }

    sub stack_trace(){
        use File::Basename;
        print STDERR "\n-------Begin Stack Trace-----------\n";
        my $i = 0;
        while(my ($package, $filename, $line,$sub ) = caller($i++)){
            my $file = basename($filename);
            print STDERR "($i) $sub ${file}::${package} line $line\n";
        }
        print STDERR "\n-------End Stack Trace-------------\n";
    }
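On the original question of reviving threads, here is a minimal sketch of one possible approach, not a definitive implementation. It assumes the supervision is done from the main thread by polling `threads->list(threads::joinable)`, so `join()` never blocks while other threads are still running. The `%workers` table, `spawn`, and the `$max_restarts` cap are hypothetical names introduced for illustration. Each worker body runs inside an `eval`, so the reason for a die comes back through `join()`.

```perl
use strict;
use warnings;
use threads;

# Hypothetical workers keyed by name; stand-ins for real thread routines.
my %workers = (
    flaky  => sub { die "simulated failure\n" },
    steady => sub { return 1 },
);
my $max_restarts = 2;   # assumed cap so this sketch terminates
my %restarts;           # name => restarts performed so far
my %last_status;        # name => most recent exit status

sub spawn {
    my ($name) = @_;
    my $code = $workers{$name};
    # Explicit list context so join() returns both values.
    return threads->create({ context => 'list' }, sub {
        my $ok = eval { $code->(); 1 };
        return ($name, $ok ? 'finished' : "died: $@");
    });
}

spawn($_) for sort keys %workers;

# In scalar context, threads->list() counts unjoined threads.
while (threads->list()) {
    for my $thr (threads->list(threads::joinable)) {
        my ($name, $status) = $thr->join();
        chomp $status;
        $last_status{$name} = $status;
        print STDERR "thread '$name' $status\n";
        # Restart only workers that died, up to the cap.
        if ($status =~ /^died/ && ($restarts{$name} // 0) < $max_restarts) {
            $restarts{$name}++;
            spawn($name);
        }
    }
    sleep 1;
}
```

Because the polling loop lives in the main thread, there is no separate monitor thread that could itself die, and a finished thread can be joined and relaunched while the others keep running.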