http://www.perlmonks.org?node_id=874580

donfox1 has asked for the wisdom of the Perl Monks concerning the following question:

I'm attempting to run several separate forked SQL queries from a function in a program. I'd like to avoid zombies and run each query in the background with nohup. The final result will be a report on data counts.

do_queries() will do the work by calling do_query() many times, passing a new argument each time. Eventually I'll pass in an array of all the arguments for the queries.

The question is: how do I background each query? When I run just one query, the program (parent) waits for the child to return, i.e. it freezes until the query is done. It should not block while the queries are being forked; it should only wait once all of them have been forked, until they are done.

Also, if this is not the right approach, what should I do?

sub do_queries {
    $rc = get_query('t_ub92_hdr_ext_key', 'D', '20101001', '20101031');
    print "-------Query Returned is $rc\n\n\n";
    defined(my $pid = fork) or die "Cannot fork: $!";
    unless($pid) {
        print "Child process running\n\n\n";
        $rc = do_query($rc);
        if ($rc == 0) {
            print "Query Done!\n";
            print $time = localtime();
        }
        exit 0;
        print "This is the parent process(waiting!) and child ID is $pid\n\n\n";
        while (wait() != -1) {}
        do {
            my $kid = waitpid(-1, WNOHANG);
        } while $kid > 0;
        print "Parent Process is Exiting\n";
    }
}

Re: Background forked processes from a function.
by aquarium (Curate) on Dec 01, 2010 at 01:41 UTC
    Hi, I hope you don't have to work with a database whose tables hold timestamped transactions, especially if any columns are indexed: beyond several thousand such records, both index maintenance during insert/update and views/queries bog down badly. I've seen several systems and businesses end up struggling with this at some stage, as it doesn't scale well. At the other end of the spectrum, text-based transaction logging scales linearly, and run times for batch processing of such logs are predictable. In any case, would it work better for you if, instead of trying to control several long-running background queries, the queries became suitably controlled batch processes with less direct control by a foreground controller? In other words, either almost entirely decoupled, or at least using a queue/execute/review mechanism that doesn't need so much direct control of the queries.
    The reason I give this sort of advice, instead of just code that answers exactly what you asked, is that there are possible consequences to having the query processes left behind as orphans or zombies if the controller dies. That's my take on it anyway, and you may not be in a position to restructure things at will.
Re: Background forked processes from a function.
by chrestomanci (Priest) on Dec 01, 2010 at 11:09 UTC

    It sounds to me like you need something like Parallel::ForkManager to handle your many sub processes, each doing a query. That module will keep track of all the children for you, and make it easy to write the code to merge the query results from all the queries in those children.

    Something like this perhaps:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $max_procs = 5;    # Choose a number for best performance.
    my $fork_mgr  = Parallel::ForkManager->new($max_procs);

    my @big_list = (
        {
            query     => q{SELECT foo FROM tblBar WHERE baz LIKE '%frob%'},
            started   => 0,
            complete  => 0,
            exit_code => undef,
            results   => [],
        },
        # etc
    );

    $fork_mgr->run_on_finish(
        sub {
            my ($child_pid, $exit_code, $child_id, $exit_signal, $core_dump, $ret_data) = @_;
            # Code to store the results from each child goes here, e.g.:
            $big_list[$child_id]{exit_code} = $exit_code;
            $big_list[$child_id]{complete}  = 1;
            $big_list[$child_id]{results}   = $ret_data;
        }
    );

    for my $itemNum (0 .. $#big_list) {
        $fork_mgr->start($itemNum) and next;    # parent: go on to the next query

        # Code from here to finish() runs in the child, in parallel.
        my $thisQuery = $big_list[$itemNum];
        my $result    = do_query($thisQuery->{query});    # do_query() is your own function
        my $rows      = $result->ran_OK() ? $result->get_results() : [];

        # Store the final result. The exit code and the data structure
        # reference passed to finish() are received (in the parent)
        # by the sub you defined in run_on_finish.
        $fork_mgr->finish($result->ran_OK() ? 0 : 1, $rows);
    }
    $fork_mgr->wait_all_children();

    # Now all child processes have finished,
    # and your results are available in @big_list.
Re: Background forked processes from a function.
by JavaFan (Canon) on Nov 30, 2010 at 23:02 UTC
    You are forking, and you are doing your query in the child. The child then exits, so the wait is never reached: it's in the child's code path, but after the exit. The parent never waits.

    I don't see where your parent is waiting. In fact, after the fork, the parent does nothing; at least, not in the code you are showing.
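    A minimal sketch of the corrected structure for a single query: the child branch ends with exit, and the parent's waiting code sits after that branch, so only the parent runs it. The do_query() here is a hypothetical stub (it sleeps and returns 0) so the sketch runs without a database; the argument value is just the table name from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real do_query(); it just sleeps briefly
# and returns 0 ("success"), so the sketch runs without a database.
sub do_query {
    my ($arg) = @_;
    sleep 1;
    return 0;
}

sub do_queries {
    my ($arg) = @_;
    defined(my $pid = fork) or die "Cannot fork: $!";

    unless ($pid) {
        # Child: run the query, then exit with its status (0 = OK).
        my $rc = do_query($arg);
        exit($rc == 0 ? 0 : 1);
    }

    # Parent: only reached when $pid is non-zero, i.e. outside the child branch.
    print "This is the parent process and child ID is $pid\n";
    waitpid($pid, 0);        # block until this particular child is done
    my $status = $? >> 8;    # high byte of $? holds the child's exit code
    print "Child $pid exited with status $status\n";
    return $status;
}

my $status = do_queries('t_ub92_hdr_ext_key');
print "Parent process is exiting\n";
```

    With a single child this still blocks the parent until the query finishes; the point is only that the wait now actually happens in the parent.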

    What you want is a loop that forks off all the children; then, when that is done, you wait for your children to finish. Or, if you don't really care about their exit status, you can set $SIG{CHLD} to 'IGNORE', or just have the parent exit and let init reap the children.
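    A sketch of "fork them all first, then wait", using only core fork/waitpid. The do_query() stub and the table names table_a, table_b and table_c are hypothetical placeholders for the real queries and arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the real query; returns 0 on success.
sub do_query {
    my ($table) = @_;
    sleep 1;    # pretend to do some work
    return 0;
}

# Fork one child per argument without waiting in between, then reap them all.
sub do_queries {
    my @args = @_;
    my %arg_for;    # pid => argument, so results can be matched up later

    for my $arg (@args) {
        defined(my $pid = fork) or die "Cannot fork: $!";
        if ($pid == 0) {
            # Child: run one query and exit; it never returns into the loop.
            exit(do_query($arg) == 0 ? 0 : 1);
        }
        $arg_for{$pid} = $arg;    # parent: remember the child and keep forking
    }

    # All children are now running in parallel; reap each one as it finishes.
    # waitpid(-1, 0) returns -1 once there are no children left.
    my %status_for;
    while ((my $pid = waitpid(-1, 0)) > 0) {
        $status_for{ $arg_for{$pid} } = $? >> 8;
    }
    return %status_for;
}

# Hypothetical table names, one query (child) per name.
my %status = do_queries(qw(table_a table_b table_c));
printf "%s exited with %d\n", $_, $status{$_} for sort keys %status;
```

    If the exit statuses don't matter, setting `$SIG{CHLD} = 'IGNORE';` before the loop lets the system reap the children automatically (on POSIX systems), and the wait loop can be dropped; waitpid then no longer reports their statuses. Setting `$SIG{HUP} = 'IGNORE';` in each child covers roughly the signal half of what nohup does, without its output redirection.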