PerlMonks
How to find all open STDERR and STDOUT dups?

by tlm (Prior)
on Mar 31, 2009 at 17:55 UTC
tlm has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone. I just spent a very long time tracking down a bug that boiled down to some lingering open file handles, all duplicates of STDERR, that were causing an HTTP connection to remain open longer than it should have.

These "rogue" filehandles had been opened by a module that my code did not load explicitly. In fact, I was not even aware that my code was using this module at all, since it comes in at the end of a long dependency chain...

To avoid this sort of bug in the future, I'm looking for a way for my code to find all the open non-lexical file handles that are duplicates of either STDOUT or STDERR.

Before I go off on a clueless hacking of the symbol table, or some other equally harebrained scheme, I thought I'd ask here if anyone knew of a good way to do what I'm trying to do.

Many thanks in advance!

the lowliest monk

Re: How to find all open STDERR and STDOUT dups?
by Limbic~Region (Chancellor) on Mar 31, 2009 at 18:02 UTC
    tlm,
    As I mentioned in /msg, the standard way would be fileno. Are these other filehandles available to you in your main code though? If not, it seems like symbol table hackery may in fact be the way to go.

    Cheers - L~R

      You actually want to stat stuff and compare the resulting device, inode, etc. fileno returns the descriptor, but dup creates a new file descriptor pointing to the same underlying file.
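      A minimal sketch of that stat-based comparison (the helper name `is_dup_of_stderr` is made up for illustration): two descriptors are duplicates of the same underlying file when their device and inode numbers from stat match, even though their fileno values differ.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;

      # Cache STDERR's (device, inode) pair once; stat works on pipes
      # and ttys as well as regular files.
      my @stderr_stat = stat(*STDERR);

      # Hypothetical helper: true if $fh refers to the same underlying
      # open file as STDERR, regardless of its descriptor number.
      sub is_dup_of_stderr {
          my ($fh) = @_;
          my @s = stat($fh) or return 0;
          return $s[0] == $stderr_stat[0] && $s[1] == $stderr_stat[1];
      }

      # Demonstrate: a dup gets a new fileno but the same dev/inode.
      open my $dup, '>&', \*STDERR or die "dup failed: $!";
      print is_dup_of_stderr($dup) ? "dup detected\n" : "not a dup\n";
      ```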
Re: How to find all open STDERR and STDOUT dups?
by almut (Canon) on Mar 31, 2009 at 18:16 UTC

    I would use lsof(8) (on Linux) for such problems, because if some XS code has done the dup(2), you may not have much luck finding anything related in the symbol table...

      It seems to me that lsof can confirm that something, somewhere, has STDERR open when it shouldn't. But does that really help in tracking down that something, somewhere?

      While symbol table hackery is imperfect, it is a reasonable next step after that point.

        Both approaches are imperfect. But ultimately, I consider the info you can retrieve via lsof more useful than what you might find when digging around in the symbol table. The two main issues with the symbol table are:

        • the file descriptor number in question may not be the one you're looking for (see my other reply below)
        • the dup(2) may not have left any traces in Perl's symbol table at all (as I tried to point out above), in case the dup happened via some C-level code in an XS module.
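        For what the symbol-table route can find, here is a hedged sketch (helper names invented for illustration): recursively walk the stashes under main:: and report every package filehandle whose stat dev/inode matches STDERR's. As noted above, this can only see handles that exist as Perl globs, not raw C-level dups.

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;

        my @target = stat(*STDERR);

        # Hypothetical helper: depth-first walk of the symbol table,
        # collecting fully-qualified names of open dups of STDERR.
        sub scan_package {
            my ($pkg, $seen, $found) = @_;
            return if $seen->{$pkg}++;
            no strict 'refs';
            no warnings 'unopened';
            for my $name (keys %{"${pkg}::"}) {
                if ($name =~ s/::\z//) {          # nested package
                    next if $name eq 'main';      # avoid main::main:: loop
                    scan_package("${pkg}::$name", $seen, $found);
                    next;
                }
                my $glob = \*{"${pkg}::$name"};
                next unless defined fileno($$glob);   # only open handles
                my @s = stat($$glob) or next;
                push @$found, "${pkg}::$name"
                    if @target && $s[0] == $target[0] && $s[1] == $target[1];
            }
        }

        my (%seen, @found);
        scan_package('main', \%seen, \@found);
        print "$_\n" for @found;    # includes main::STDERR itself
        ```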
Re: How to find all open STDERR and STDOUT dups?
by boblawblah (Scribe) on Mar 31, 2009 at 18:17 UTC
    File a bug report with the maintainer of the module. Or better yet, fix the bug and send him (or her) a patch.
Re: How to find all open STDERR and STDOUT dups?
by ikegami (Pope) on Mar 31, 2009 at 19:03 UTC

    I'd be interested in hearing more details.

Re: How to find all open STDERR and STDOUT dups?
by tlm (Prior) on Mar 31, 2009 at 20:06 UTC

    I see from the replies I've gotten so far that I did not explain the problem well enough, so here's a second attempt. The application is a CGI script that is meant to perform a lengthy calculation. In its normal operation, when first accessed, it forks a child (call it C) that will perform the calculation and cache the result. The parent (call it P) just returns a short response that includes a job id, and exits immediately. In pseudo-perl, the logic looks like this:

    my $job_id = new_job_id();
    $SIG{ CHLD } = 'IGNORE';

    die "fork failed: $!" unless defined( my $pid = fork );

    if ( $pid ) {    # the parent branch (process P)
        print_response( $job_id );
    }
    else {           # the child branch (process C)
        # NEED: close_all_dups_to_stdout_and_stderr();
        close STDOUT;
        close STDERR;
        compute_and_cache_results( $job_id );
    }
    exit;

    Upon receiving the initial response, the client can then use the included job id to periodically check with the server for information on the job's percent completion, and eventually to retrieve the finished results. This allows the client to provide some feedback to the user.

    I noticed recently that the client was freezing after sending the request, and not displaying any indication of progress. Instead, after some time of apparent inactivity, it would display the finished results all at once.

    The immediate reason for this was that the parent (P) was lingering around as a zombie after exiting (with Apache as its parent), which caused the connection to remain alive until the child (C) finished.

    After a lot of trial and error, I narrowed the problem down to a few open() statements in Parse::RecDescent. If I comment out these statements, the code once again works fine: P's process terminates immediately after it exits, and the client receives the job id right away, soon enough to be useful.

    I want to avoid another lengthy debugging ordeal in the future, if I ever decide to use a module that somehow leads to a similar case of leftover filehandles.

    What I need is a way to implement close_all_dups_to_stdout_and_stderr. Without it, the defunct P lingers around as a zombie until C terminates, which defeats the purpose of forking the child in the first place. It is this lingering P that causes the HTTP connection to remain open far too long.

    Now, L-R, the docs for fileno do in fact suggest that it would come in handy here, but, to my surprise, it does not work as advertised. Below is the line in the original module's code, followed immediately by two debugging lines that I've added:

    open (ERROR, ">&STDERR");
    printf STDERR "fileno( STDERR ): %d\n", fileno( STDERR );
    printf STDERR "fileno( ERROR ):  %d\n", fileno( ERROR );
    The output from the last two lines is:
    fileno( STDERR ): 2
    fileno( ERROR ):  6
    I'm not sure how to reconcile this with the docs for fileno.

    BTW, if anyone cares to verify all of this, the sticking points are in Parse::RecDescent, v. 1.94, lines 2847, 2865, and 2876.

    But even if fileno behaved as advertised, to implement close_all_dups_to_stdout_and_stderr in a general way I need a way to find all the open filehandles, so that I can test them with fileno against STDERR and STDOUT. This is what I'd like to figure out how to do cleanly.
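    One possible, Linux-specific sketch of close_all_dups_to_stdout_and_stderr (the function name comes from the pseudo-code above; the reliance on /proc/self/fd is my assumption): enumerate the process's open descriptors via /proc, compare each one's dev/inode against STDOUT's and STDERR's, and POSIX::close the matches. Using stat on the /proc path avoids having to adopt each descriptor with '<&=', which would close it when the probe handle went out of scope.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use POSIX ();

    # Close every descriptor above 2 that is a dup of STDOUT or STDERR.
    # Linux-specific: requires /proc/self/fd.
    sub close_all_dups_to_stdout_and_stderr {
        my @out = stat(*STDOUT);
        my @err = stat(*STDERR);

        opendir my $dh, '/proc/self/fd'
            or die "cannot read /proc/self/fd: $!";
        my @fds = grep { /^\d+\z/ && $_ > 2 } readdir $dh;
        closedir $dh;    # so the dirhandle's own fd just fails the stat below

        for my $fd ( @fds ) {
            my @s = stat("/proc/self/fd/$fd") or next;   # already closed
            my $is_dup = (@out && $s[0] == $out[0] && $s[1] == $out[1])
                      || (@err && $s[0] == $err[0] && $s[1] == $err[1]);
            POSIX::close($fd) if $is_dup;
        }
    }
    ```

    After calling this in the child, close STDOUT and STDERR themselves, as in the pseudo-code.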

    almut, I had the same idea of using lsof, but, here again, the results surprised me. I tried the following (somewhat brutal) experiment:

    use Parse::RecDescent;

    close_all_fds();

    sub close_all_fds {
        my @lsof = `/usr/bin/lsof -p $$ 2>/dev/null`;
        for my $line ( @lsof ) {
            my @flds = split ' ', $line;
            next unless $flds[ 3 ] =~ /^(\d+)/;
            my $fd = $1;
            my $fh;
            # per recommendation of The Perl Cookbook, 2nd Ed, p. 262
            unless ( open $fh, '<&=', $fd ) {
                print "no luck with $line";
                next;
            }
            print "closing $fd\n";
            close $fh or die $!;
        }
        print Parse::RecDescent::ERROR "Nya, nya! I'm still open!\n";
    }

    __END__
    output:
    closing 0
    closing 1
    closing 2
    closing 4
    closing 5
    closing 6
    no luck with lsoftest 14113 yt 7r FIFO 0,6 89181771 pipe
    Nya, nya! I'm still open!

    Bottom line: the problematic handle remains open even after this. And so does STDOUT, for that matter. I'm still scratching my head about this one as well. Cluebricks welcome.

    BTW, moving the loading of PRD to after the fork did not help (and would be a very inconvenient solution in any case).

    I hope this clarifies the situation.

    the lowliest monk

      It's not necessarily the file descriptor number that matters, but rather the OS-internal info (data structure) it refers to. If you dup(2), (">&" in Perl), you get another file descriptor (new number) holding the same info.

      Consider the following simple sample CGI, which hangs (for essentially the same reason that you've described):

      #!/usr/bin/perl

      if (my $pid = fork) {
          print "Content-Type: text/plain\n\n";
          print `lsof -p $$ -a -d 0-20`;     # for parent
          print `lsof -p $pid -a -d 0-20`;   # for child
      }
      elsif (defined $pid) {
          # this creates a dup of file descriptor 2, as descriptor 3
          open STDERR2, ">&STDERR" or die "Can't dup STDERR: $!";

          close STDOUT;
          close STDERR;

          # this makes the parent process hang for 5 sec, because
          # Apache waits for the pipes associated with stdout/stderr
          # to be closed CGI-side
          sleep 5;
          exit;
      }
      else {
          die "Couldn't fork: $!";
      }

      The output you get is something like

      COMMAND   PID   USER  FD TYPE DEVICE SIZE      NODE NAME
      hang.pl 25839 apache  0r FIFO    0,6      429306451 pipe
      hang.pl 25839 apache  1w FIFO    0,6      429306452 pipe
      hang.pl 25839 apache  2w FIFO    0,6      429306453 pipe
      hang.pl 25839 apache  3r FIFO    0,6      429306456 pipe
      hang.pl 25839 apache  9w FIFO    0,6      428685687 pipe
      COMMAND   PID   USER  FD TYPE DEVICE SIZE      NODE NAME
      hang.pl 25840 apache  0r FIFO    0,6      429306451 pipe
      hang.pl 25840 apache  3w FIFO    0,6      429306453 pipe
      hang.pl 25840 apache  9w FIFO    0,6      428685687 pipe

      As you can see in the NODE column, the unclosed (dup'ed) FD 3 in the child (the second lsof output) is the same node (i.e. 429306453) as FD 2 (stderr) in the parent. This is why Apache is still waiting, despite FD 1/2 already having been closed.

      OK, I figured out why my original lsof experiment failed. I was using the recipe from the Perl Cookbook incorrectly. The following works as expected:

      use warnings FATAL => 'all';
      no warnings 'once';
      use strict;

      use Parse::RecDescent;

      close_all_fds();

      sub close_all_fds {
          my @lsof = `/usr/bin/lsof -p $$ 2>/dev/null`;
          for my $line ( @lsof ) {
              my @flds = split ' ', $line;
              next unless $flds[ 3 ] =~ /^(\d+)/;
              my $fd = $1;
              print "closing $fd\n";
              closefd( $fd );
          }
          printf Parse::RecDescent::ERROR
              "Nya, nya! I'm still open! (BTW, I'm fileno %d)\n",
              fileno( Parse::RecDescent::ERROR );
      }

      use Inline C => <<EOC;
      #include <unistd.h>
      void closefd( int fd ) {
          if ( close( fd ) )
              Perl_croak( aTHX_ "closefd( %d ) failed", fd );
          return;
      }
      EOC

      __END__

      I still need to figure out how to selectively close those handles that correspond to STDERR and STDOUT, but with the clues++ I got from almut, I think I should be able to do it.

      the lowliest monk

        I still need to figure out how to selectively close those handles that correspond to STDERR and STDOUT,

        Would it not be sufficient to simply close all open files in your child?

        use POSIX ();
        ...
        } else { ## child
            POSIX::close( $_ ) for 0 .. 255;  ## Or higher if that's a possibility
            ...
        }

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Node Type: perlquestion [id://754484]
Approved by kyle
Front-paged by almut