PerlMonks  

Could not catch all children after fork, some of them never end

by krabbl (Initiate)
on Mar 13, 2013 at 11:17 UTC ( #1023172=perlquestion )
krabbl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a problem reaping child processes after a SIGINT has been caught by my signal handler, which only sets a global variable (don't blame me for using global vars). My script forks a certain number of children, and the parent is supposed to wait until all of them have finished gracefully. Usually this works fine, but if I press CTRL-C the script behaves strangely.

As mentioned, my signal handler only sets a global variable to 1. Each child checks this variable at certain points; if it is set to 1, the child is supposed to write some data to a BerkeleyDB and then untie and undef some hashes/vars (used with BerkeleyDB). Once that is done, the child should exit so that the parent can reap it. For some children this happens very quickly, but for others it does not: they seem to be frozen. After some time a few more children finish, but mostly I have to kill all processes manually, and so I lose the data that has not yet been written to the BerkeleyDB.

I tried different approaches, such as Parallel::ForkManager, and the same behaviour occurs. If I use waitpid with WNOHANG, my CPU goes to 100% and my script no longer seems to make any progress. Sometimes the parent goes ahead and does not wait until the children have finished (is that normal behaviour?). If it is, how can it be that sometimes my CPU is at 100% with nothing going on, and sometimes my parent goes ahead without waiting? If I use wait() or waitpid($_,0), everything works fine, except that not all children can be reaped (obviously because they have not completed), and it seems that they never finish at all. Some of them do after some time, but others do not.

This shit really drives me crazy, because I have to implement a way to end my program properly with the current "status" saved (that's what the children are doing when they write data to BerkeleyDB and some files), so that it can be resumed later by starting the whole script with the related argument.

For legal reasons I'm not allowed to post the whole script. I'm sorry for that, but I hope you can still help me.

Three dots stand for removed, unimportant code such as vars, checks and system-call commands (bash commands only).

$SIG{INT} = \&ctrlc;
#$SIG{CHLD} = \&ripZombie;

foreach(@ARGV){
    my $arg=$_;
    unless(defined($arg)){
        die("$!\n") if(help());
        exit 0;
    }
    if($arg eq '-e' || $arg eq '--erase'){
        die("$!\n") if(erase());
    }
    elsif($arg eq '-i' || $arg eq '--info'){
        die("$!\n") if(versionInfos());
    }
    elsif($arg eq '-c' || $arg eq '--create'){
        die("$!\n") if(createData());
    }
    elsif($arg eq '-p' || $arg eq '--proceed'){
        die("$!\n") if(continueDataCreation());
    }
    elsif($arg eq '-v' || $arg eq '--verify'){
        die("$!\n") if(verifyData());
    }
    elsif($arg eq '-s' || $arg eq '--search'){
        die("$!\n") if(searchExtInfos());
    }
    elsif($arg eq '-h' || $arg eq '--help'){
        die("$!\n") if(help());
    }
    elsif($arg eq '-x' || $arg eq '--extract'){
        die("$!\n") if(fallbackExtractResults());
    }
    else{
        die("$!\n") if(help());
    }
}

sub createData{
    ### Set default sig-handler for ctrl+c ###
    $SIG{INT} = "DEFAULT";
    ...
    my @a_childs=();
    ...
    ### Set custom sig handler for ctrl+c ###
    $SIG{INT} = \&ctrlc;

    ### Begin forking regarding amount of instances ###
    for(my $i=0; $i<$AMOUNT_INSTANCES; $i++){
        my $child=fork();
        if($child){
            ### parent ###
            push(@a_childs, $child);
        }
        elsif($child == 0){
            ### child ###
            ...

            ### Build BerkeleyDB Environment ###
            my $envChild = new BerkeleyDB::Env(
                -Home  => "$WORKDIR" ,
                -Flags => DB_CREATE| DB_INIT_CDB | DB_INIT_MPOOL)
                or die "FAILURE: Cannot open environment: $BerkeleyDB::Error\n";

            ### Tie Hash ###
            our $DB_RESULT = tie %H_RESULT, 'BerkeleyDB::Hash',
                -Filename => "$FILE_RESULT_DB",
                -Flags    => DB_CREATE,
                -Env      => $envChild
                or die "FAILURE: Cannot open database: $BerkeleyDB::Error\n";

            ### Tie Hash for IP-pref-pairs ###
            my $db_prefChild = tie %H_PREF, 'BerkeleyDB::Hash',
                -Filename => "$FILE_PREF_DB",
                -Flags    => DB_CREATE,
                -Env      => $envChild
                or die "FAILURE: Cannot open database: $BerkeleyDB::Error\n";

            ### only copies some files to certain positions ###
            system("...");

            ### Query responsible RIR and extract information ###
            while(keys(%H_PREF) > 0){
                ### Lock hash ###
                my $lock=$db_prefChild->cds_lock();

                ### extract random key and mark it ###
                my $key="";
                my $value="";
                do{
                    $key=(keys %H_PREF)[rand keys %H_PREF];
                    $value=$H_PREF{$key};
                }while($value =~ m/^inuse\_/);
                $H_PREF{$key}="inuse\_$value";

                ### Unlock hash ###
                $lock->cds_unlock();

                ### this funktion does only some regex work and system-calls, at begin and end the var set by sig handler is checked ###
                handlePref($key,$value);

                ### Check if SIG{INT} (CTRL+C) was pushed ###
                if($CHECK_CTRL == 1 ){
                    $lock=$db_prefChild->cds_lock();
                    $H_PREF{$key}=$value;
                    $lock->cds_unlock();
                    last;
                }
                delete($H_PREF{$key});
            }
            warn("$!\n") if(cleanUpChild($childPid));

            ### Clean Up BerkeleyDB Env and untie ##
            undef $DB_RESULT;
            untie %H_RESULT;
            undef $db_prefChild;
            untie %H_PREF;
            exit 0;
        }
        else{
            return 1, warn("FAILURE: Could not fork! $!\n");
        }
    }

    ### Wait/catch all childs ###
    foreach(@a_childs){
        while(){
            my $pid=waitpid(-1, WNOHANG);
            last if($pid <= 0);
        }
        print "Child ended successfully (PID: $_)!\n";
    }
    #foreach(@a_childs){
    #    wait();
    #    waitpid($_,0);
    #}
    ...
    return 0;
}

### sig handler for ctrl-c ###
sub ctrlc{
    $SIG{INT} = \&ctrlc;
    ### Set global check variable to 1 ###
    $CHECK_CTRL=1;
    print("! Please be patient, programm is existing but will end all current queries and this could take some minutes !\n");
    return 0;
}

### reaper but not in use ###
sub ripZombie{
    my $pid;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        ### If you want to do sth with your reaped child pids ###
        ;
    }
    $SIG{CHLD} = \&ripZombie;
}
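As far as can be seen from the posted code, the wait loop leaves its inner while() as soon as waitpid(-1, WNOHANG) returns 0, which is what waitpid returns while children are still running, so the parent can move on without waiting. A WNOHANG loop that simply retries without pausing would instead spin at full CPU. The following is only a minimal sketch of a polling reaper that avoids both, under the assumption that the parent knows all child PIDs; reap_children and the demo children are hypothetical names, not part of the original script.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: reap every PID in @pids by polling, without spinning.
# Returns a hash of pid => exit code.
sub reap_children {
    my @pids    = @_;
    my %pending = map { $_ => 1 } @pids;
    my %exit_code;
    while (%pending) {
        my $reaped = 0;
        for my $pid (keys %pending) {
            my $res = waitpid($pid, WNOHANG);
            if ($res == $pid) {          # this child has finished
                $exit_code{$pid} = $? >> 8;
                delete $pending{$pid};
                $reaped++;
            }
            elsif ($res == -1) {         # no such child (e.g. already reaped)
                delete $pending{$pid};
            }
            # $res == 0 means the child is still running: check again later
        }
        # Pause briefly instead of busy-looping; this keeps CPU usage low.
        select(undef, undef, undef, 0.1) if %pending && !$reaped;
    }
    return %exit_code;
}

# Demo: three children that exit with codes 1, 2 and 3 after a short delay.
my @pids;
for my $code (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        select(undef, undef, undef, 0.1 * $code);
        exit $code;
    }
    push @pids, $pid;
}
my %codes = reap_children(@pids);
print "PID $_ exited with code $codes{$_}\n" for sort { $a <=> $b } keys %codes;
```

A blocking waitpid($pid, 0) per child is simpler when the parent has nothing else to do; the polling variant is only useful if the parent needs to do other work between checks.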

Re: Could not catch all children after fork, some of them never end
by educated_foo (Vicar) on Mar 13, 2013 at 11:51 UTC
    This var is checked by each child
    No it's not. Memory is not shared between parent and child processes. You want to have the parent notice the signal, signal the children, and wait for them.
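The "notice the signal, signal the children, and wait for them" pattern could be sketched roughly as follows. This is an illustration under assumptions, not the poster's script: worker and stop_and_wait are hypothetical names, the 0.05 s pause stands in for real work, and the handler is installed before the fork so that children inherit it and each sets its own copy of $stop.

```perl
use strict;
use warnings;

my $stop = 0;
# Installed before fork, so children inherit it; each process has its own $stop.
$SIG{INT} = sub { $stop = 1 };

# Hypothetical worker: does small units of work until its own $stop is set.
sub worker {
    until ($stop) {
        select(undef, undef, undef, 0.05);   # stand-in for one unit of work
    }
    # Saving state (e.g. writing to the database) would happen here.
    exit 0;
}

# Parent side: forward SIGINT to every child, then block until each one exits.
# Returns pid => raw wait status ($?), which is 0 for a clean exit.
sub stop_and_wait {
    my @pids = @_;
    kill 'INT', @pids;
    my %status;
    for my $pid (@pids) {
        my $res;
        # Retry if the blocking wait is interrupted by a signal.
        do { $res = waitpid($pid, 0) } while ($res == -1 && $!{EINTR});
        $status{$pid} = $? if $res == $pid;
    }
    return %status;
}

my @children;
for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    worker() if $pid == 0;       # never returns in the child
    push @children, $pid;
}

# In a real script this would run once the parent's own handler has set $stop.
my %status = stop_and_wait(@children);
printf "child %d finished with status %d\n", $_, $status{$_}
    for sort { $a <=> $b } keys %status;
```

The blocking waitpid uses no CPU while waiting, but it also means the parent waits for as long as the slowest child takes to finish its cleanup.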

      I thought it's a fork, so everything should be cloned up to this point, and every child should get the SIGINT (the signal handler is defined before the fork), or am I wrong? The text message (the print in the signal handler) shows up for every single child, and some children even finish as expected after pressing CTRL-C.

        I think I may have oversimplified in my answer. I know that when you press control-c, the signal is delivered to the process currently attached to the terminal. Maybe the children take over from the parent when it exits if you didn't close STDIN and STDOUT before forking, but I would have to look it up and play around a bit to figure out exactly what happens. Sorry.
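On POSIX systems, a Ctrl-C at the terminal sends SIGINT to every process in the foreground process group, which normally includes forked children, and a handler installed before fork() is inherited by the child. Each process still has its own copy of the flag. A small experiment along these lines may help to see both points; the variable names and the exit code 42 are arbitrary choices for the illustration, and the signal is sent with kill to the child only, so the parent's copy stays untouched.

```perl
use strict;
use warnings;

my $flag = 0;
$SIG{INT} = sub { $flag = 1 };   # installed before fork, so the child inherits it

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: wait until its own copy of $flag is set by the inherited handler.
    select(undef, undef, undef, 0.05) until $flag;
    exit 42;                     # this exit code shows the child saw $flag == 1
}

kill 'INT', $pid;                # signal only the child, not the parent
waitpid($pid, 0);
my $child_code = $? >> 8;

print "child exit code: $child_code\n";   # 42
print "parent flag:     $flag\n";         # 0, the parent's copy is unchanged
```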
Re: Could not catch all children after fork, some of them never end
by sundialsvc4 (Abbot) on Mar 13, 2013 at 14:37 UTC

    No, the implementation of fork in Perl is somewhat of a pseudo-implementation with quite a few caveats. Do a super-search on signal to read some threads which wound down just a few days ago.

      No, the implementation of fork in Perl is somewhat of a pseudo-implementation with quite a few caveats. Do a super-search on signal to read some threads which wound down just a few days ago.

      That answer is too lazy to be useful, and it's wrong to boot -- typical

Re: Could not catch all children after fork, some of them never end
by soonix (Curate) on Mar 13, 2013 at 23:10 UTC
    Wild guess: Are the children writing to the same db? (B)locking issues?

      All children are writing to the same BerkeleyDB, but they are using cds_lock/unlock.

      I figured out that some loops became endless after pressing CTRL+C, so I added some additional checks. But there could still be a problem at one point, so I would like to check manually whether a lock currently exists, but I could not figure out how this works in BerkeleyDB... does anybody know?
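One loop in the posted code that can become endless is the do/while that picks a random key until it finds one not marked "inuse_": if every remaining key is marked (for example because other children restored or still hold them), it never terminates, and it runs while the CDS lock is held, so the other children block too. A possible way around this, sketched here on a plain hash rather than the tied BerkeleyDB hash, is to select only from the free keys and give up when there are none. claim_free_key is a hypothetical helper, and the sketch does not answer how to query BerkeleyDB for an existing lock.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a random key whose value is not marked "inuse_",
# mark it, and return (key, original value). Returns an empty list if every
# remaining key is already in use, instead of looping forever.
sub claim_free_key {
    my ($h) = @_;
    my @free = grep { $h->{$_} !~ /^inuse_/ } keys %$h;
    return unless @free;
    my $key   = $free[ rand @free ];
    my $value = $h->{$key};
    $h->{$key} = "inuse_$value";
    return ($key, $value);
}

# Plain hash standing in for the tied %H_PREF from the post (an assumption).
my %prefs = (a => 'inuse_1', b => '2', c => 'inuse_3');

# In the posted script, cds_lock() would be taken before this call ...
my ($key, $value) = claim_free_key(\%prefs);
# ... and cds_unlock() called right after it, before any slow work starts.
print "claimed $key => $value\n";         # only "b" is free

my @none = claim_free_key(\%prefs);
print "nothing left to claim\n" unless @none;
```

With this approach the outer while loop would also need to stop, or sleep and retry, when nothing can be claimed, since keys(%H_PREF) > 0 stays true as long as other children hold marked keys.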
