PerlMonks  

Re: fork()ing a large process

by traveler (Parson)
on Nov 29, 2001 at 22:39 UTC


in reply to fork()ing a large process

I'm not sure what reap_children is supposed to do. What it does is reap any children that are already dead. Your args to waitpid say "Check to see if any processes are dead, but don't hang this process waiting." You also don't seem to be doing anything with $kid, since if more than one process dies, you ignore the pid of the first one. Follow maverick's advice if you care when they are all dead. If you are just trying to avoid zombies, do $SIG{CHLD} = sub {wait}; It really depends on what you want to do.

Now, why does your process appear to wait for some children to die? Well, one possibility is that you have exceeded the maximum number of children your OS allows. That is, if you have a lot of files to process, you may have forked as many children as you are allowed to fork. To figure that out you can keep track of the number of live children and see if that number gets big. The limit on my system is 511.
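One way to keep track of live children is sketched below. This is only an illustration, not the original poster's code: spawn, %live and $MAX_KIDS are hypothetical names, and the cap of 10 is arbitrary. It also checks fork's return value, since fork returns undef when the OS refuses to create another process.

```perl
use strict;
use warnings;

my $MAX_KIDS = 10;   # hypothetical cap, well under any OS limit
my %live;            # pid => 1 for every child not yet reaped

# Fork a child to run $work, blocking first if we are already at the cap.
sub spawn {
    my ($work) = @_;
    while (keys(%live) >= $MAX_KIDS) {
        my $kid = waitpid(-1, 0);              # blocking wait for any child
        if ($kid <= 0) { %live = (); last }    # no children left to wait for
        delete $live{$kid};
    }
    my $pid = fork();
    die "fork failed: $!" unless defined $pid; # undef means fork was refused
    if ($pid == 0) { $work->(); exit 0 }       # child: do the work and leave
    $live{$pid} = 1;                           # parent: remember the child
    return $pid;
}
```

Printing scalar(keys %live) after each spawn shows whether the count ever gets near the limit.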

BTW you can avoid the magic numbers in waitpid if you use POSIX ":sys_wait_h"; (note the leading colon), which exports constants such as WNOHANG.
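To make the non-blocking waitpid call concrete, here is a small sketch using the WNOHANG constant. poll_children is a hypothetical helper written for this illustration; it just reports which of the three possible results a single call produced.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";   # exports WNOHANG

# Describe what one non-blocking waitpid call found.
sub poll_children {
    my $kid = waitpid(-1, WNOHANG);
    return "no children exist"             if $kid == -1;
    return "children running, none exited" if $kid == 0;
    return "reaped $kid, exit status " . ($? >> 8);
}
```

A return of 0 is the "don't hang this process waiting" case: children exist, but none of them has died yet.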

HTH, --traveler

Update: I'd never heard about the 2% chance of corruption. I've always followed the Camel's advice about using the signal handler; I guess I've been lucky. Maybe one choice is to do a periodic wait for children using code similar to reap_children. You might also try setting $SIG{CHLD} to 'IGNORE' (SIG_IGN in C).

(tye)Re2: fork()ing a large process
by tye (Sage) on Nov 29, 2001 at 22:54 UTC

    Note that $SIG{CHLD}= sub {wait}; has another problem. If two children exit at nearly the same time, the two SIGCHLD signals can arrive close enough together that the signal handler only gets called once. Each time this happens, you'd get one more zombie hanging around.

    Back to the original problem, I'd add some debug print statements so that you can figure out exactly where the process is hanging.

    Also, $SIG{CHLD}= 'IGNORE'; only works on some operating systems (SysV-based ones, as I recall).

            - tye (but my friends call me "Tye")
      You are correct that the wait should be in a loop such as the reap_children function has. I was clearly not thinking clearly... Also, on SYS V systems, IIRC, CHLD or CLD signals are regenerated if you do not wait on the child, so the race condition you mention may not hold true there and wait by itself should work. I suppose I've been using SYS V-based systems too much to think of some of these issues. My bad.
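A handler with the wait in a loop might look like the sketch below. It is an illustration rather than anyone's posted code; %status is a hypothetical hash added only so the results can be inspected. Because the loop keeps calling waitpid until nothing more is ready, one handler call reaps every child that has exited, even if two SIGCHLDs were merged into one.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";   # exports WNOHANG

my %status;                # pid => exit status of each reaped child

$SIG{CHLD} = sub {
    local ($!, $?);        # don't clobber the main program's error state
    # Reap every child that has already exited, never blocking.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $status{$kid} = $? >> 8;
    }
};
```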

      tye is correct. To be portable you'll need to find another solution. I have used non-blocking waits in a timer loop to clean up children and that may work here. Of course, if you don't care about portability, use what works on your system and document that it isn't portable.
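The non-blocking waits in a timer loop could be sketched roughly as follows. The names maybe_reap, $REAP_INTERVAL and $last_reap are hypothetical, and the five-second interval is arbitrary. No signal handler is involved, so it does not depend on how a given OS delivers SIGCHLD.

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";   # exports WNOHANG

my $REAP_INTERVAL = 5;     # seconds between clean-ups; arbitrary choice
my $last_reap     = time;

# Call once per pass through the main work loop. Reaps dead children
# if the interval has elapsed and returns how many were reaped.
sub maybe_reap {
    return 0 if time - $last_reap < $REAP_INTERVAL;
    $last_reap = time;
    my $n = 0;
    $n++ while waitpid(-1, WNOHANG) > 0;
    return $n;
}
```

Zombies can linger for up to one interval, which is usually harmless as long as the interval is short compared with the rate at which children are forked.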
