STDOUT redirects and IPC::Open3

by kennethk (Abbot)
on Oct 17, 2011 at 22:56 UTC ( [id://932037] )

kennethk has asked for the wisdom of the Perl Monks concerning the following question:

So I've been messing around with this a bit too much over the last few days, and I'd like to understand why it's failing. Corion attempted to provide some insight over CB a few days ago, but his explanation didn't leave me much the wiser. When I execute the following:
#!/usr/bin/perl -w
use strict;
use IPC::Open3;
use Symbol 'gensym';

my @queue = ("Line 1\nThis is the first line\n",
             "Line 2\nThis is the second line\n",
             "Line 3\nThis is the third line\n",
            );

single_thread(@queue);

sub single_thread {
    open my $oldout, ">&", \*STDOUT or die "Can't dup STDOUT: $!";
    for (@_) {
        local *STDOUT;
        open STDOUT, '>', \my $stdout or die "Can't redirect STDOUT: $!";
        test($_);
        print $oldout "STDOUT: $stdout\n" if defined $stdout;
        close STDOUT;
    }
}

sub test {
    my $line = shift;
    my $pid = open3(my $wtr, my $rdr, undef, 'cat', '-v');
    print $wtr $line;
    close $wtr;
    my $content = do { local $/; <$rdr> };
    local $\;
    print STDOUT $content;
    waitpid($pid, 0);
}
I get the output
Line 1
This is the first line
Line 2
This is the second line
Line 3
This is the third line
This means that when I do the redirect, it mucks up the capture for some reason. If I change sub test to
sub test_mod {
    my $line = shift;
    my $content = `echo '$line'`;
    local $\;
    print STDOUT $content;
}
I get the expected
STDOUT: Line 1
This is the first line
STDOUT: Line 2
This is the second line
STDOUT: Line 3
This is the third line
and likewise if I swap sub single_thread to
sub single_thread {
    for my $line (@_) {
        my $pid = open3(my $wtr, my $rdr, undef, 'cat', '-v');
        print $wtr $line;
        close $wtr;
        my $content = do { local $/; <$rdr> };
        local $\;
        print "STDOUT: $content\n";
    }
}
I get
STDOUT: Line 1
This is the first line
STDOUT: Line 2
This is the second line
STDOUT: Line 3
This is the third line
Any suggestions, other than don't do that? As the subroutine name implies, this is ultimately intended for a multithreaded environment. Essentially, I would like to wrap a subroutine that runs an external command so that it transparently supports threads while keeping each thread's output contiguous. I've had this working with backticks for a while, but now I need to separate the streams, and I just don't understand why a core module would show such strange behavior.
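
For concreteness, here's a rough sketch of the shape I'm aiming for (run_contiguous and $stdout_lock are illustrative names only; this version sidesteps the STDOUT redirect by capturing the command's whole output first, then printing it under a shared lock so threads can't interleave it):

use strict;
use warnings;
use threads;
use threads::shared;
use IPC::Open3;

my $stdout_lock :shared;

sub run_contiguous {
    my @cmd = @_;
    my $pid = open3(my $wtr, my $rdr, undef, @cmd);
    close $wtr;
    my $out = do { local $/; <$rdr> };   # slurp everything the command wrote
    waitpid $pid, 0;
    lock $stdout_lock;                   # one thread prints at a time
    print STDOUT $out;
}

# e.g. threads->create(\&run_contiguous, 'echo', 'hello')->join;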

Replies are listed 'Best First'.
Re: STDOUT redirects and IPC::Open3
by Eliya (Vicar) on Oct 18, 2011 at 15:26 UTC

    AFAICT, IPC::Open3 is buggy in that it doesn't (under the circumstances outlined below) properly wire the child's ends of the pipes to the standard file descriptors 0-2.

    After having set up the pipes, it does (in the child process, after the fork):

    xclose $dad_rdr;
    xopen \*STDOUT, ">&=" . fileno $kid_wtr;

    where $dad_rdr is the parent's, and $kid_wtr is the child's end of the respective pipe (and xopen/xclose are just error-handling wrappers around the normal open/close builtins).

    The problem with this code is that it doesn't manage to "connect" (dup) file descriptor 1 to the pipe, unless STDOUT already is associated with file descriptor 1. If you look at fileno(STDOUT) immediately before the exec (somewhat further down in the module's code), you'll see that in your test case it isn't 1, as it's supposed to be, but rather fileno($kid_wtr), e.g. 9. Now, no normal exec'ed child program, such as cat, is going to send its standard output to file descriptor 9... Rather, it will write to file descriptor 1 as usual, which in your case is the one inherited from the parent, typically still connected to the terminal.

    In other words, the problem is the "&=" in the above open statement, because that makes (as documented) STDOUT the same file descriptor as what is specified after the &=.
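
    The distinction is easy to check in isolation (a standalone snippet, not the module's code): ">&" performs a dup(2) onto a fresh descriptor, whereas ">&=" is an fdopen(3)-style alias of the existing one.

    open my $dup,   '>&',  \*STDOUT or die $!;        # ">&"  dups: allocates a new descriptor
    open my $alias, '>&=', fileno(STDOUT) or die $!;  # ">&=" aliases: reuses the given descriptor
    printf STDERR "STDOUT=%d dup=%d alias=%d\n",
        fileno(STDOUT), fileno($dup), fileno($alias); # typically: STDOUT=1 dup=3 alias=1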

    You may wonder why the module works under some (or most) circumstances. The reason is that the behavior of &= is special if STDOUT already is associated with file descriptor 1. In that case, open STDOUT, ">&=9" does not make STDOUT's file descriptor become 9; rather, it will still be 1. Due to this (undocumented, AFAICT) peculiarity, the child's end of the pipe is wired correctly under most circumstances, and things work as expected...

    In your case, however, you've redirected STDOUT to the variable $stdout, so it's no longer associated with file descriptor 1. Actually, in this particular case, it's no longer associated with any file descriptor at all, because printing to a string happens internally in Perl, without any system file descriptors involved (fileno() reports -1 here). For this reason, open STDOUT, ">&=9" works as documented, so the file descriptor behind STDOUT (where the exec'ed child should write to) actually does become 9, which cat of course knows nothing about.
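
    You can verify that in isolation:

    open my $mem, '>', \my $buf or die $!;  # in-memory handle: pure Perl, no OS descriptor
    print STDERR fileno($mem), "\n";        # prints -1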

    The fix would be to make sure the child's end of the pipe is actually wired to where stdout would normally go, i.e. file descriptor 1 (and the same holds for stdin and stderr respectively, of course).

    So, based on some testing, I would suggest replacing IPC::Open3's

    xclose $dad_rdr;
    xopen \*STDOUT, ">&=" . fileno $kid_wtr;

    with

    xclose $dad_rdr;
    open STDOUT, ">&=1";
    xopen \*STDOUT, ">&" . fileno $kid_wtr;

    Note that the "=" has been removed from the second open statement. Also note that open STDOUT, ">&=1" deliberately ignores errors, which would occur if there is no valid file descriptor 1. In that case, however, the dup behind the subsequent open statement should pick 1 anyway (because it's the lowest available), unless you've closed file descriptor 0 as well (in which case you have a more serious problem...).
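
    That "lowest available descriptor" rule is the standard dup(2) guarantee. A standalone illustration (not part of the patch), assuming descriptors 0 and 2 are open:

    use POSIX ();
    close STDOUT;                          # frees descriptor 1
    my $fd = POSIX::dup(fileno(STDERR));   # dup(2) returns the lowest free descriptor
    print STDERR "dup returned fd $fd\n";  # prints: dup returned fd 1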

    This should handle the cases where the original (parent's) STDOUT file descriptor is still the default (1), or when it's been redirected (i.e. > 1, or none/-1 for memory handles). Also, it shouldn't matter whether there still is a valid file descriptor 1 (accessible via $oldout in your case), or whether it's been closed.

    I've tested the fix with 5.12.3 on Linux, and it seems to work fine.  Suggestions for improvements welcome.

      Thank you. This is outside my expertise, so I appreciate the detailed response. Your patch seems to work, though I haven't run it against the test suite. I've also added a parallel line for STDERR, replacing
      xclose $dad_err;
      xopen \*STDERR, ">&=" . fileno $kid_err;
      with
      xclose $dad_err;
      open STDERR, ">&=2";
      xopen \*STDERR, ">&" . fileno $kid_err;
      which seems to behave for STDERR redirects. Now I have the moral quandary of using local modules, or using the distributed version with a workaround.
      open STDOUT, ">&=1";
      xopen \*STDOUT, ">&" . fileno $kid_wtr;

      That is also quite buggy:

      File descriptor 1 may be closed, which would cause open STDOUT, ">&=1" to fail (not uncommon; mod_perl2, for instance, does that).

      If file descriptor 1 is not closed and it is not STDOUT, then it is probably attached to some other unrelated file handle, say FOO. The xclose call will affect both STDOUT and FOO, as they share the same file descriptor, breaking any code using FOO in the parent process.

      IMO, the right solution would be to change the system 1, $cmd hack to attach to the child process whatever file handles are at STDIN, STDOUT and STDERR (or NUL: when closed), irrespective of their file descriptor numbers. I think that can be done on Windows using the CreateProcess function, passing the handles inside the STARTUPINFO structure argument.

        File descriptor 1 may be closed, which would cause open STDOUT, ">&=1" to fail

        I'm aware of that — which is why I mentioned that you shouldn't check for errors here, but let the call just fail silently. I also mentioned that if file descriptor 1 is closed, the dup behind the subsequent open will pick the then free file descriptor 1 anyway, because it's the lowest available (this is the way dup works — and this is also why you need "&" and not "&=" in that open statement).

        The idea behind the open STDOUT, ">&=1" statement is simply to make sure STDOUT is associated with file descriptor 1 (to trigger the "special" behavior of open I mentioned, which results in dup'ing the descriptor of the child's side of the pipe to descriptor 1).  This happens either way, whether the call succeeds or fails.

        If file descriptor 1 is not closed and it is not STDOUT, then it is probably attached to some other unrelated file handle, say FOO. The xclose call will affect both STDOUT and FOO, as they share the same file descriptor, breaking any code using FOO in the parent process.

        Not sure what xclose you're referring to, and why you're worried about breaking a file descriptor in the parent.  Closing a file descriptor in the child does not render the parent's descriptor dysfunctional (actually, it's a pretty common and healthy practice to close unneeded dups of file descriptors after a fork).

        Try this and you'll see what I mean:

        #!/usr/bin/perl -w
        use strict;

        close STDOUT;
        open FOO, ">", "/dev/tty" or die $!;
        printf STDERR "fileno(FOO): %d\n", fileno(FOO);

        open STDOUT, ">", "dummyfile" or die $!;

        pipe my $rdr, my $wtr;
        printf STDERR "fileno(pipe-r): %d\n", fileno($rdr);
        printf STDERR "fileno(pipe-w): %d\n", fileno($wtr);

        if (fork) {
            close $wtr;
            my $r = <$rdr>;
            chomp($r);
            print STDERR "r = <<$r>>\n";
            print FOO "FOO still working\n";
        }
        else {  # child
            close $rdr;
            printf STDERR "[child] fileno(STDOUT) initially: %d\n", fileno(STDOUT);

            # comment this line out (and edit "&=" below), and you'll see echo
            # will no longer write to the pipe
            open STDOUT, ">&=1";
            printf STDERR "[child] fileno(STDOUT) after &=1: %d\n", fileno(STDOUT);

            open STDOUT, ">&" . fileno($wtr) or die $!;
            printf STDERR "[child] fileno(STDOUT) finally: %d\n", fileno(STDOUT);

            exec "/bin/echo", "foobar";
        }

        __END__
        fileno(FOO): 1
        fileno(pipe-r): 4
        fileno(pipe-w): 6
        [child] fileno(STDOUT) initially: 3
        [child] fileno(STDOUT) after &=1: 1
        [child] fileno(STDOUT) finally: 1
        r = <<foobar>>
        FOO still working

        The general issue is that the child's side of the pipe must be accessible via file descriptor 1 before the exec, otherwise no normal exec'ed program (echo here) will send its standard output to it.
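
        Written out directly with dup2(2), the same principle looks like this (a standalone sketch, not IPC::Open3's actual code):

        use POSIX ();
        pipe my $rdr, my $wtr or die "pipe: $!";
        defined(my $pid = fork) or die "fork: $!";
        if ($pid) {                       # parent: read what the child wrote
            close $wtr;
            print STDERR "got: ", scalar <$rdr>;
            waitpid $pid, 0;
        }
        else {                            # child: force the pipe onto descriptor 1
            close $rdr;
            defined POSIX::dup2(fileno($wtr), 1) or die "dup2: $!";
            exec "/bin/echo", "foobar" or die "exec: $!";
        }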

        I've strace'd the system calls Perl issues under the hood in the various cases, and I can't see any problem with what's happening due to the extra open STDOUT, ">&=1" statement.

        (Note that I'm addressing the Unix side of the issue only.)

      This fails under ActiveState perl 5.10.1, MSWin32-x86-multi-thread. The if (!DO_SPAWN) conditional in _open3 takes the alternate branch.

        Yes, that's what I would've expected.  Windows doesn't use fork/exec to run subprocesses, so the module takes a different branch here, presumably better suited than relying on Perl's fork emulation.

        Alas, pipes and processes on Windows (and dup'ing file descriptors in particular) are not my field of expertise, so I hope someone else will debug the Windows side of the issue...

Re: STDOUT redirects and IPC::Open3
by ikegami (Patriarch) on Oct 18, 2011 at 06:31 UTC
    The child cannot possibly write to the variable in the parent process. The child has no access to the parent's memory space, even if it had any understanding of Perl variables. That's why open3 creates pipes for you.
      I don't follow how this is relevant to the question at hand. open3 sets up pipes to the child process, and those pipes are read in the test sub. test in turn prints to the parent STDOUT, which has been redirected to a scalar. Why would a redirect in the parent have any impact on the child's I/O?

      Update: Mucking about, I discovered this seems to be associated with localizing the file handle.

      #!/usr/bin/perl -w
      use strict;
      use IPC::Open3;
      use Symbol 'gensym';

      my @queue = ("Line 1\nThis is the first line\n",
                   "Line 2\nThis is the second line\n",
                   "Line 3\nThis is the third line\n",
                  );

      for my $line (@queue) {
          local *STDOUT;
          open LOG, '>', \my $stdout or die "Can't catch LOG: $!";
          my $pid = open3(my $wtr, my $rdr, undef, 'cat', '-v');
          print $wtr $line;
          close $wtr;
          my $content = do { local $/; <$rdr> };
          waitpid($pid, 0);
          close LOG;
      }
      leaks the output, but commenting out the local *STDOUT; line makes the capture operate properly.
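
      A quick standalone check of what local *STDOUT does to the underlying descriptor (and hence to what the child inherits):

      printf STDERR "before: fileno(STDOUT) = %s\n", fileno(STDOUT);  # 1
      {
          local *STDOUT;   # empties the glob: STDOUT no longer refers to any handle
          printf STDERR "inside: fileno(STDOUT) = %s\n",
              defined fileno(STDOUT) ? fileno(STDOUT) : "undef";      # undef
      }
      printf STDERR "after: fileno(STDOUT) = %s\n", fileno(STDOUT);   # 1 again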
