Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re: How to find all open STDERR and STDOUT dups?

by tlm (Prior)
on Mar 31, 2009 at 20:06 UTC ( #754519=note: print w/ replies, xml ) Need Help??


in reply to How to find all open STDERR and STDOUT dups?

I see from the replies I've gotten so far that I did not explain the problem well enough, so here's a second attempt. The application is a CGI script that is meant to perform a lengthy calculation. In its normal operation, when first accessed, it forks a child (call it C) that will perform the calculation and cache the result. The parent (call it P) just returns a short response that includes a job id, and exits immediately. In pseudo-perl, the logic looks like this:

my $job_id = new_job_id(); $SIG{ CHLD } = 'IGNORE'; die "fork failed: $!" unless defined( my $pid = fork ); if ( $pid ) { # the parent branch (process P) print_response( $job_id ); } else { # the child branch (process C) # NEED: close_all_dups_to_stdout_and_stderr(); close STDOUT; close STDERR; compute_and_cache_results( $job_id ); } exit;

Upon receiving the initial response, the client can then use the included job id to periodically check with the server for information on the job's percent completion, and eventually to retrieve the finished results. This allows the client to provide some feedback to the user.

I noticed recently that the client was freezing after sending the request, and not displaying any indication of progress. Instead, after some time of apparent inactivity, it would display the finished results all at once.

The immediate reason for this was that the parent (P) was lingering around as a zombie after exiting (with Apache as its parent), which caused the connection to remain alive until the child (C) finished.

After a lot of trial and error, I narrowed the problem down to a few open() statements in Parse::RecDescent. If I comment out these statements, the code once again works fine: P's process terminates immediately after it exits, and the client receives the job id right away, soon enough to be useful.

I want to avoid another lengthy debugging ordeal in the future, if I ever decide to use a module that somehow leads to a similar case of leftover filehandles.

What I need is a way to implement close_all_dups_to_stdout_and_stderr. Without it, the defunct P lingers around as a zombie until C terminates, which defeats the purpose of forking the child in the first place. It is this lingering P that causes the HTTP connection to remain open far too long.

Now, L-R, the docs for fileno do in fact suggest that it would come in handy here, but, to my surprise, it does not work as advertised. Below is the line in the original module's code, followed immediately by two debugging lines that I've added:

open (ERROR, ">&STDERR"); printf STDERR "fileno( STDERR ): %d\n", fileno( STDERR ); printf STDERR "fileno( ERROR ): %d\n", fileno( ERROR );
The output from the last two lines is:
fileno( STDERR ): 2 fileno( ERROR ): 6
I'm not sure how to reconcile this with the docs for fileno.

BTW, if anyone cares to verify all of this, the sticking points are in Parse::RecDescent, v. 1.94, lines 2847, 2865, and 2876.

But even if fileno behaved as advertised, to implement close_all_dups_to_stdout_and_stderr in a general way I need a way to find all the open filehandles, so that I can test them with fileno against STDERR and STDOUT. This is what I'd like to figure out how to do cleanly.

almut, I had the same idea of using lsof, but, here again, the results surprised me. I tried the following (somewhat brutal) experiment:

use Parse::RecDescent; close_all_fds(); sub close_all_fds { my @lsof = `/usr/bin/lsof -p $$ 2>/dev/null`; for my $line ( @lsof ) { my @flds = split ' ', $line; next unless $flds[ 3 ] =~ /^(\d+)/; my $fd = $1; my $fh; unless ( open $fh, '<&=', $fd ) { print "no luck with $line"; next; } # per recommendation of The Perl Cookbook, 2nd Ed, p. 262 print "closing $fd\n"; close $fh or die $!; } print Parse::RecDescent::ERROR "Nya, nya! I'm still open!\n"; } __END__ output: closing 0 closing 1 closing 2 closing 4 closing 5 closing 6 no luck with lsoftest 14113 yt 7r FIFO 0,6 891817 +71 pipe Nya, nya! I'm still open!

Bottom line: the problematic handle remains open even after this. And so does STDOUT, for that matter. I'm still scratching my head about this one as well. Cluebricks welcome.

BTW, moving the loading of PRD to after the fork did not help (and would be a very inconvenient solution in any case).

I hope this clarifies the situation.

the lowliest monk


Comment on Re: How to find all open STDERR and STDOUT dups?
Select or Download Code
Re^2: How to find all open STDERR and STDOUT dups?
by almut (Canon) on Mar 31, 2009 at 20:56 UTC

    It's not necessarily the file descriptor number that matters, but rather the OS-internal info (data structure) it refers to. If you dup(2), (">&" in Perl), you get another file descriptor (new number) holding the same info.

    Consider the following simple sample CGI, which hangs (for essentially the same reason that you've described):

    #!/usr/bin/perl if (my $pid = fork) { print "Content-Type: text/plain\n\n"; print `lsof -p $$ -a -d 0-20`; # for parent print `lsof -p $pid -a -d 0-20`; # for child } elsif (defined $pid) { # this creates a dup of file descriptor 2, as descriptor 3 open STDERR2, ">&STDERR" or die "Can't dup STDERR: $!"; close STDOUT; close STDERR; # this makes the parent process hang for 5 sec, because # Apache waits for the pipes associated with stdout/stderr # to be closed CGI-side sleep 5; exit; } else { die "Couldn't fork: $!"; }

    The output you get is something like

    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME hang.pl 25839 apache 0r FIFO 0,6 429306451 pipe hang.pl 25839 apache 1w FIFO 0,6 429306452 pipe hang.pl 25839 apache 2w FIFO 0,6 429306453 pipe hang.pl 25839 apache 3r FIFO 0,6 429306456 pipe hang.pl 25839 apache 9w FIFO 0,6 428685687 pipe COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME hang.pl 25840 apache 0r FIFO 0,6 429306451 pipe hang.pl 25840 apache 3w FIFO 0,6 429306453 pipe hang.pl 25840 apache 9w FIFO 0,6 428685687 pipe

    As you can see in the NODE column, the unclosed (dup'ed) FD 3 in the child (the second lsof output) is the same node (i.e. 429306453) as FD 2 (stderr) in the parent. This is why Apache is still waiting, despite FD 1/2 already having been closed.

Re^2: How to find all open STDERR and STDOUT dups?
by tlm (Prior) on Mar 31, 2009 at 21:24 UTC

    OK, I figured out why my original lsof experiment failed. I was using the recipe from the Perl Cookbook incorrectly. The following works as expected:

    use warnings FATAL => 'all'; no warnings 'once'; use strict; use Parse::RecDescent; close_all_fds(); sub close_all_fds { my @lsof = `/usr/bin/lsof -p $$ 2>/dev/null`; for my $line ( @lsof ) { my @flds = split ' ', $line; next unless $flds[ 3 ] =~ /^(\d+)/; my $fd = $1; my $fh; print "closing $fd\n"; closefd( $fd ); } printf Parse::RecDescent::ERROR "Nya, nya! I'm still open! (BTW, I'm fileno %d)\n", fileno( Parse::RecDescent::ERROR ); } use Inline C => <<EOC; #include <unistd.h> void closefd( int fd ) { if ( close( fd ) ) Perl_croak( aTHX_ "closefd( %d ) failed", fd ); return; } EOC __END__

    I still need to figure out how to selectively close those handles that correspond to STDERR and STDOUT, but with the clues++ I got from almut, I think I should be able to do it.

    the lowliest monk

      I still need to figure out how to selectively close those handles that correspond to STDERR and STDOUT,

      Would it not be sufficient to simple close all open files in your child?

      use POSIX (); ... } else { ## child POSIX::close( $_ ) for 0 .. 255; ## Or higher if that's a possibil +ity ''' }

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://754519]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (11)
As of 2014-09-19 10:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (135 votes), past polls