http://www.perlmonks.org?node_id=186063

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I apologize for not posting a totally 100% Perl-related question, but I think it could be helpful to many Perl programmers.

I have a Perl script which is always running, forking child processes when needed. The child processes do their work, and exit.

The problem is that sometimes a child process hangs forever. I have no idea which part of my code causes this eternal loop. The process seems to hang after a failed database connect, but I cannot find the loop in my code.

I previously used a Unix command which showed all I/O from a certain process by running CMD PID on the command line. Unfortunately, I don't know which command it was. Can anyone help me remember, or are there better ways to determine what this process is doing so I can fix the eternal loop in my code?

FYI, it occurs only once or twice every week, and I cannot reproduce this loop.

Any help is very much appreciated.

Regards, Marcel

Re: Hanging process activity
by robobunny (Friar) on Jul 29, 2002 at 20:09 UTC
    You are probably thinking of truss, trace, or strace, depending on which OS you are using.
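    As a rough sketch of how the "CMD PID" invocation might look, the script below picks a tracer based on Perl's $^O and attaches it to a running process. The mapping table and the tracer_command helper are illustrative assumptions, not part of any module; only strace -p (Linux) and truss -p (Solaris, FreeBSD) are covered, and attaching usually requires owning the process or being root.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping from Perl's $^O to a system-call tracer that can
# attach to an already running process by PID.
my %tracer_for = (
    linux   => [ 'strace', '-p' ],
    solaris => [ 'truss',  '-p' ],
    freebsd => [ 'truss',  '-p' ],
);

# Return the command list for tracing $pid on $os, or nothing if unknown.
sub tracer_command {
    my ($os, $pid) = @_;
    my $cmd = $tracer_for{$os} or return;
    return (@$cmd, $pid);
}

# When run as a script with a PID argument, replace this process with the tracer.
if (!caller && @ARGV) {
    my @cmd = tracer_command($^O, $ARGV[0])
        or die "No known tracer for OS '$^O'\n";
    exec @cmd or die "Cannot exec $cmd[0]: $!\n";
}
```

    Saved under a name of your choosing and run with the PID of the hung child as its only argument, this should print the system calls the child is making, which usually shows whether it is spinning or blocked in a read or connect.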
Re: Hanging process activity
by sauoq (Abbot) on Jul 29, 2002 at 20:16 UTC
    I'm guessing it was lsof that you used.

    You can find more info on its freshmeat page.

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Hanging process activity
by traveler (Parson) on Jul 29, 2002 at 21:49 UTC
    Here is pseudo-code of what I do in a similar situation:
        sub my_kill {
            my $pid = shift;
            kill 'TERM', $pid;
            print "I killed $pid\n";
        }
        ...
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            do_child_stuff();
            exit 0;
        }
        else {
            # Set a timeout: if $pid does not die within 60 seconds,
            # call my_kill with $pid as the argument.
            wait_for_pid($pid, 60, \&my_kill);
        }
    wait_for_pid stores the pid, the function to call, and the time at which to call it in a hash. Each time a child dies, the parent receives a SIGCHLD signal; I then look through the list of children and remove the dead ones from the hash. I have a timer function running each second (I use gtk's scheduler for this, as it is a gtk app) that, among other tasks, looks for processes that have not died in time and kills them. This has worked reliably for over a year. There are several variations you could use that might be more efficient for your environment.
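    The bookkeeping described above could be sketched roughly as follows. This is not the gtk application's actual code: it swaps the SIGCHLD handler for non-blocking waitpid polling and the gtk scheduler for a plain sleep loop. The helpers reap_children, check_timeouts, spawn, and run_demo are names made up for this illustration, and the 2-second timeout is only there to keep the demo short.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

$| = 1;        # unbuffered output, so forked children never inherit pending text

my %watched;   # pid => { deadline => epoch seconds, on_timeout => coderef, fired => bool }

# Remember a child; $on_timeout->($pid) is called if it outlives $secs seconds.
sub wait_for_pid {
    my ($pid, $secs, $on_timeout) = @_;
    $watched{$pid} = { deadline => time + $secs, on_timeout => $on_timeout, fired => 0 };
}

sub my_kill {
    my $pid = shift;
    kill 'TERM', $pid;   # a child that ignores TERM would need 'KILL' instead
    print "I killed $pid\n";
}

# Collect every child that has already exited, without blocking, and forget it.
sub reap_children {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        delete $watched{$pid};
    }
}

# Fire the timeout handler once for each child that is past its deadline.
sub check_timeouts {
    my $now = time;
    for my $pid (keys %watched) {
        my $entry = $watched{$pid};
        next if $entry->{fired} || $now < $entry->{deadline};
        $entry->{fired} = 1;
        $entry->{on_timeout}->($pid);
    }
}

# Fork a child that "works" for $work_secs seconds and watch it for $timeout seconds.
sub spawn {
    my ($work_secs, $timeout) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep $work_secs;   # stand-in for do_child_stuff
        exit 0;
    }
    wait_for_pid($pid, $timeout, \&my_kill);
    return $pid;
}

sub run_demo {
    spawn(1, 2);     # finishes before its deadline
    spawn(30, 2);    # simulates a hung child and gets killed
    while (%watched) {
        reap_children();
        check_timeouts();
        sleep 1;     # stand-in for the once-per-second timer
    }
}

run_demo() unless caller;
```

    Polling once per second keeps all of the work out of signal handlers, at the cost of noticing a dead child up to a second late. In a long-running parent, the two calls inside the loop would go wherever the periodic timer already runs.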

    If you are using threads, you could also use that mechanism to implement the timeout.

    HTH, --traveler