STDOUT redirects and IPC::Open3

by kennethk (Abbot)
on Oct 17, 2011 at 22:56 UTC ( [id://932037] )

kennethk has asked for the wisdom of the Perl Monks concerning the following question:

So I've been messing around with this a bit too much over the last few days, and I'd like to understand why it's failing. Corion attempted to provide some insight over CB a few days ago, but his explanation didn't leave me much the wiser. When I execute the following:
#!/usr/bin/perl -w
use strict;
use IPC::Open3;
use Symbol 'gensym';

my @queue = ("Line 1\nThis is the first line\n",
             "Line 2\nThis is the second line\n",
             "Line 3\nThis is the third line\n",
            );

single_thread(@queue);

sub single_thread {
    open my $oldout, ">&", \*STDOUT or die "Can't dup STDOUT: $!";
    for (@_) {
        local *STDOUT;
        open STDOUT, '>', \my $stdout or die "Can't redirect STDOUT: $!";
        test($_);
        print $oldout "STDOUT: $stdout\n" if defined $stdout;
        close STDOUT;
    }
}

sub test {
    my $line = shift;
    my $pid = open3(my $wtr, my $rdr, undef, 'cat', '-v');
    print $wtr $line;
    close $wtr;
    my $content = do { local $/; <$rdr> };
    local $\;
    print STDOUT $content;
    waitpid($pid, 0);
}
I get the output
Line 1
This is the first line
Line 2
This is the second line
Line 3
This is the third line
This means that when I do the redirect, it mucks up the capture for some reason. If I change sub test to
sub test_mod {
    my $line = shift;
    my $content = `echo '$line'`;
    local $\;
    print STDOUT $content;
}
I get the expected
STDOUT: Line 1
This is the first line
STDOUT: Line 2
This is the second line
STDOUT: Line 3
This is the third line
and likewise if I swap sub single_thread to
sub single_thread {
    for my $line (@_) {
        my $pid = open3(my $wtr, my $rdr, undef, 'cat', '-v');
        print $wtr $line;
        close $wtr;
        my $content = do { local $/; <$rdr> };
        local $\;
        print "STDOUT: $content\n";
    }
}
I get
STDOUT: Line 1
This is the first line
STDOUT: Line 2
This is the second line
STDOUT: Line 3
This is the third line
Any suggestions, other than don't do that? As the subroutine name implies, this is ultimately intended for a multithreaded environment. Essentially, I would like to wrap a subroutine that runs an external command so that it transparently supports threads while keeping each thread's output contiguous. I've had this working with backticks for a while, but now I need to separate the streams, and I just don't understand why a core module would show such strange behavior.
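
For concreteness, here's a rough sketch of the shape I'm aiming for (run_contiguous and $stdout_lock are illustrative names only; this version sidesteps the STDOUT redirect by capturing the command's whole output first, then printing it under a shared lock so threads can't interleave it):

use strict;
use warnings;
use threads;
use threads::shared;
use IPC::Open3;

my $stdout_lock :shared;

sub run_contiguous {
    my @cmd = @_;
    my $pid = open3(my $wtr, my $rdr, undef, @cmd);
    close $wtr;
    my $out = do { local $/; <$rdr> };   # slurp everything the command wrote
    waitpid $pid, 0;
    lock $stdout_lock;                   # one thread prints at a time
    print STDOUT $out;
}

# e.g. threads->create(\&run_contiguous, 'echo', 'hello')->join;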

Replies are listed 'Best First'.
Re: STDOUT redirects and IPC::Open3
by Eliya (Vicar) on Oct 18, 2011 at 15:26 UTC

    AFAICT, IPC::Open3 is buggy in that it doesn't (under the circumstances outlined below) properly wire the child's ends of the pipes to the standard file descriptors 0-2.

    After having set up the pipes, it does (in the child process, after the fork):

    xclose $dad_rdr;
    xopen \*STDOUT, ">&=" . fileno $kid_wtr;

    where $dad_rdr is the parent's, and $kid_wtr is the child's end of the respective pipe (and xopen/xclose are just error-handling wrappers around the normal open/close builtins).

    The problem with this code is that it doesn't manage to "connect" (dup) file descriptor 1 to the pipe, unless STDOUT already is associated with file descriptor 1. If you look at fileno(STDOUT) immediately before the exec (somewhat further down in the module's code), you'll see that in your test case it isn't 1, as it's supposed to be, but rather fileno($kid_wtr), e.g. 9. Now, no normal exec'ed child program, such as cat, is going to send its standard output to file descriptor 9... Rather, it will write to file descriptor 1 as usual, which in your case is the one inherited from the parent, typically still connected to the terminal.

    In other words, the problem is the "&=" in the above open statement, because that makes (as documented) STDOUT the same file descriptor as what is specified after the &=.
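
    The distinction is easy to check in isolation (a standalone snippet, not the module's code): ">&" performs a dup(2) onto a fresh descriptor, whereas ">&=" is an fdopen(3)-style alias of the existing one.

    open my $dup,   '>&',  \*STDOUT or die $!;        # ">&"  dups: allocates a new descriptor
    open my $alias, '>&=', fileno(STDOUT) or die $!;  # ">&=" aliases: reuses the given descriptor
    printf STDERR "STDOUT=%d dup=%d alias=%d\n",
        fileno(STDOUT), fileno($dup), fileno($alias); # typically: STDOUT=1 dup=3 alias=1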

    You may wonder why the module works under some (or most) circumstances. The reason is that the behavior of &= is special if STDOUT already is associated with file descriptor 1. In that case, open STDOUT, ">&=9" does not make STDOUT's file descriptor become 9; rather, it will still be 1. Due to this (undocumented, AFAICT) peculiarity, the child's end of the pipe is wired correctly under most circumstances, and things work as expected...

    In your case, however, you've redirected STDOUT to the variable $stdout, so it's no longer associated with file descriptor 1. Actually, in this particular case, it's no longer associated with any file descriptor at all, because printing to a string happens internally in Perl, without any system file descriptors involved (fileno() reports -1 here). For this reason, open STDOUT, ">&=9" works as documented, so the file descriptor behind STDOUT (where the exec'ed child should write to) actually does become 9, which cat of course knows nothing about.
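
    You can verify that in isolation:

    open my $mem, '>', \my $buf or die $!;  # in-memory handle: pure Perl, no OS descriptor
    print STDERR fileno($mem), "\n";        # prints -1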

    The fix would be to make sure the child's end of the pipe is actually wired to where stdout would normally go, i.e. file descriptor 1 (and the same holds for stdin and stderr respectively, of course).

    So, based on some testing, I would suggest replacing IPC::Open3's

    xclose $dad_rdr;
    xopen \*STDOUT, ">&=" . fileno $kid_wtr;

    with

    xclose $dad_rdr;
    open STDOUT, ">&=1";
    xopen \*STDOUT, ">&" . fileno $kid_wtr;

    Note that the "=" has been removed from the second open statement. Also note that open STDOUT, ">&=1" deliberately ignores errors, which would occur if there is no valid file descriptor 1. In that case, however, the dup behind the subsequent open statement should pick 1 anyway (because it's the lowest available), unless you've closed file descriptor 0 as well (in which case you have a more serious problem...).
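
    That "lowest available descriptor" rule is the standard dup(2) guarantee. A standalone illustration (not part of the patch), assuming descriptors 0 and 2 are open:

    use POSIX ();
    close STDOUT;                          # frees descriptor 1
    my $fd = POSIX::dup(fileno(STDERR));   # dup(2) returns the lowest free descriptor
    print STDERR "dup returned fd $fd\n";  # prints: dup returned fd 1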

    This should handle the cases where the original (parent's) STDOUT file descriptor is still the default (1), or when it's been redirected (i.e. > 1, or none/-1 for memory handles). Also, it shouldn't matter whether there still is a valid file descriptor 1 (accessible via $oldout in your case), or whether it's been closed.

    I've tested the fix with 5.12.3 on Linux, and it seems to work fine.  Suggestions for improvements welcome.

      Thank you. This is outside my expertise, so I appreciate the detailed response. Your patch seems to work, though I haven't run it against the test suite. I've also added a parallel line for STDERR, replacing
      xclose $dad_err;
      xopen \*STDERR, ">&=" . fileno $kid_err;
      with
      xclose $dad_err;
      open STDERR, ">&=2";
      xopen \*STDERR, ">&" . fileno $kid_err;
      which seems to behave for STDERR redirects. Now I have the moral quandary of using local modules, or using the distributed version with a workaround.
      open STDOUT, ">&=1";
      xopen \*STDOUT, ">&" . fileno $kid_wtr;

      That is also quite buggy:

      File descriptor 1 may be closed, which would cause open STDOUT, ">&=1" to fail (not uncommon; mod_perl2, for instance, does that).

      If file descriptor 1 is not closed and it is not STDOUT, then it is probably attached to some other unrelated file handle, say FOO. The xclose call will affect both STDOUT and FOO, as they share the same file descriptor, breaking any code using FOO in the parent process.

      IMO, the right solution would be to change the system 1, $cmd hack to attach to the child process whatever file handles are at STDIN, STDOUT and STDERR (or NUL: when closed), irrespective of their file descriptor numbers. I think that can be done on Windows using the CreateProcess function, passing the handles inside the STARTUPINFO structure argument.

        File descriptor 1 may be closed, which would cause open STDOUT, ">&=1" to fail

        I'm aware of that — which is why I mentioned that you shouldn't check for errors here, but let the call just fail silently. I also mentioned that if file descriptor 1 is closed, the dup behind the subsequent open will pick the then free file descriptor 1 anyway, because it's the lowest available (this is the way dup works — and this is also why you need "&" and not "&=" in that open statement).

        The idea behind the open STDOUT, ">&=1" statement is simply to make sure STDOUT is associated with file descriptor 1 (to trigger the "special" behavior of open I mentioned, which results in dup'ing the descriptor of the child's side of the pipe to descriptor 1).  This happens either way, whether the call succeeds or fails.

        If file descriptor 1 is not closed and it is not STDOUT, then it is probably attached to some other unrelated file handle, say FOO. The xclose call will affect both STDOUT and FOO, as they share the same file descriptor, breaking any code using FOO in the parent process.

        Not sure what xclose you're referring to, and why you're worried about breaking a file descriptor in the parent.  Closing a file descriptor in the child does not render the parent's descriptor dysfunctional (actually, it's a pretty common and healthy practice to close unneeded dups of file descriptors after a fork).

        Try this and you'll see what I mean:

        #!/usr/bin/perl -w
        use strict;

        close STDOUT;
        open FOO, ">", "/dev/tty" or die $!;
        printf STDERR "fileno(FOO): %d\n", fileno(FOO);

        open STDOUT, ">", "dummyfile" or die $!;

        pipe my $rdr, my $wtr;
        printf STDERR "fileno(pipe-r): %d\n", fileno($rdr);
        printf STDERR "fileno(pipe-w): %d\n", fileno($wtr);

        if (fork) {
            close $wtr;
            my $r = <$rdr>;
            chomp($r);
            print STDERR "r = <<$r>>\n";
            print FOO "FOO still working\n";
        }
        else {  # child
            close $rdr;
            printf STDERR "[child] fileno(STDOUT) initially: %d\n", fileno(STDOUT);

            # comment this line out (and edit "&=" below), and you'll see echo
            # will no longer write to the pipe
            open STDOUT, ">&=1";
            printf STDERR "[child] fileno(STDOUT) after &=1: %d\n", fileno(STDOUT);

            open STDOUT, ">&" . fileno($wtr) or die $!;
            printf STDERR "[child] fileno(STDOUT) finally: %d\n", fileno(STDOUT);

            exec "/bin/echo", "foobar";
        }

        __END__
        fileno(FOO): 1
        fileno(pipe-r): 4
        fileno(pipe-w): 6
        [child] fileno(STDOUT) initially: 3
        [child] fileno(STDOUT) after &=1: 1
        [child] fileno(STDOUT) finally: 1
        r = <<foobar>>
        FOO still working

        The general issue is that the child's side of the pipe must be accessible via file descriptor 1 before the exec, otherwise no normal exec'ed program (echo here) will send its standard output to it.
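
        Written out directly with dup2(2), the same principle looks like this (a standalone sketch, not IPC::Open3's actual code):

        use POSIX ();
        pipe my $rdr, my $wtr or die "pipe: $!";
        defined(my $pid = fork) or die "fork: $!";
        if ($pid) {                       # parent: read what the child wrote
            close $wtr;
            print STDERR "got: ", scalar <$rdr>;
            waitpid $pid, 0;
        }
        else {                            # child: force the pipe onto descriptor 1
            close $rdr;
            defined POSIX::dup2(fileno($wtr), 1) or die "dup2: $!";
            exec "/bin/echo", "foobar" or die "exec: $!";
        }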

        I've strace'd the system calls Perl issues under the hood in the various cases, and I can't see any problem with what's happening due to the extra open STDOUT, ">&=1" statement.

        (Note that I'm addressing the Unix side of the issue only.)

      This fails under ActiveState perl 5.10.1, MSWin32-x86-multi-thread. The if (!DO_SPAWN) conditional in _open3 takes the alternate branch.

        Yes, that's what I would've expected.  Windows doesn't use fork/exec to run subprocesses, so the module takes a different branch here, presumably better suited than relying on Perl's fork emulation.

        Alas, pipes and processes on Windows (and dup'ing file descriptors in particular) are not my field of expertise, so I hope someone else will debug the Windows side of the issue...

Re: STDOUT redirects and IPC::Open3
by ikegami (Patriarch) on Oct 18, 2011 at 06:31 UTC
    The child cannot possibly write to the variable in the parent process. The child has no access to the parent's memory space, even if it had any understanding of Perl variables. That's why open3 creates pipes for you.
      I don't follow how this is relevant to the question at hand. open3 sets up pipes to the child process, and those pipes are read in the test sub. test in turn prints to the parent STDOUT, which has been redirected to a scalar. Why would a redirect in the parent have any impact on the child's I/O?

      Update: Mucking about, I discovered this seems to be associated with localizing the file handle.

      #!/usr/bin/perl -w
      use strict;
      use IPC::Open3;
      use Symbol 'gensym';

      my @queue = ("Line 1\nThis is the first line\n",
                   "Line 2\nThis is the second line\n",
                   "Line 3\nThis is the third line\n",
                  );

      for my $line (@queue) {
          local *STDOUT;
          open LOG, '>', \my $stdout or die "Can't catch LOG: $!";
          my $pid = open3(my $wtr, my $rdr, undef, 'cat', '-v');
          print $wtr $line;
          close $wtr;
          my $content = do { local $/; <$rdr> };
          waitpid($pid, 0);
          close LOG;
      }
      leaks the output, but commenting out the local *STDOUT; line makes the capture operate properly.
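
      A quick standalone check of what local *STDOUT does to the underlying descriptor (and hence to what the child inherits):

      printf STDERR "before: fileno(STDOUT) = %s\n", fileno(STDOUT);  # 1
      {
          local *STDOUT;   # empties the glob: STDOUT no longer refers to any handle
          printf STDERR "inside: fileno(STDOUT) = %s\n",
              defined fileno(STDOUT) ? fileno(STDOUT) : "undef";      # undef
      }
      printf STDERR "after: fileno(STDOUT) = %s\n", fileno(STDOUT);   # 1 again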
