http://www.perlmonks.org?node_id=574085

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I've a process (Test::Builder, if you must know), which sends data to both STDERR and STDOUT. However, I need to read both and here are the requirements:

  1. This must be a non-blocking read.
  2. The streams must be read in synch.
  3. It must be cross-platform.
  4. It must run on the oldest version of Perl 5 possible.
  5. It must be pure Perl.

Number 2 seems to require that I use a construct like `$someprocess 2>&1`, but that blocks and I don't know if it's portable. I do know that open FH, "$someprocess 2>&1 |" is blowing up on Windows with a 'bad file descriptor' error.

As far as I can tell, the only way to reliably solve this problem is some way of telling the source process to send everything to the same filehandle. Is there some other way of handling this without changing the source process?

Update: And I cannot use any non-core modules. There's a hope, however faint, that this work might eventually make its way into the core, hence this and some of the above requirements.

Cheers,
Ovid

New address of my CGI Course.

Re: Synchronizing STDERR and STDOUT
by nothingmuch (Priest) on Sep 21, 2006 at 11:27 UTC
    FWIW, why is TAPx::Parser wrapping around the stream? Why isn't a plumbing loop pulling from the stream and pushing to the parser?

    Then you don't necessarily need nonblocking reads, etc at the IO level - you could push that down to POE or whatever.

    XML::Parser has a non-blocking interface which I've always liked due to its simplicity - you just push strings when you have them, and it generates events. If you put in a partial string then the parser's state machine will simply stay in that state, waiting for more input.

    This way you can have e.g. TAPx::Parser::Harness::Win32, *Socket, *POE, *Whatever, all reusing the parser without needing to model an iterator API around the various platform specific quirks.

    Update: to clarify that last part - you only truly need non-blocking IO if you need to parse multiple streams simultaneously, and as long as the parser has a push API flexible enough to be reentrant (multiple parsers instantiated simultaneously, each with its own state) then there's no reason why it can't deliver callbacks only when it's ready.

    Update 2: POE::Filter::XML is written over XML::SAX::Expat::Incremental which is basically a SAX wrapper for the ExpatNB interface. That might be a nice example.
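A minimal sketch of the push-parsing idea described above, for illustration only. It is not TAPx::Parser's or XML::Parser's actual API; the package name LinePushParser and the methods push_chunk and finish are hypothetical. The caller owns the I/O loop and pushes whatever text it has, and the parser keeps partial input in its own state until a full line arrives.

```perl
package LinePushParser;    # hypothetical name, for illustration only
use strict;
use warnings;

sub new {
    my ( $class, %args ) = @_;
    # Each instance carries its own buffer, so several parsers can
    # be fed from different streams at the same time.
    return bless { buffer => '', on_line => $args{on_line} }, $class;
}

# Accept any amount of text; fire the callback once per complete line.
sub push_chunk {
    my ( $self, $chunk ) = @_;
    $self->{buffer} .= $chunk;
    while ( $self->{buffer} =~ s/\A([^\n]*)\n// ) {
        my $line = $1;
        $self->{on_line}->($line);
    }
    return;
}

# At end of stream, deliver a trailing line that had no newline.
sub finish {
    my ($self) = @_;
    if ( length $self->{buffer} ) {
        my $line = $self->{buffer};
        $self->{buffer} = '';
        $self->{on_line}->($line);
    }
    return;
}

package main;

my $parser = LinePushParser->new( on_line => sub { print "line: $_[0]\n" } );
$parser->push_chunk("ok 1\nnot o");    # second line is still incomplete
$parser->push_chunk("k 2\n");          # now it completes
$parser->finish;
```

Because the parser never reads from a handle itself, the same object could be driven by a blocking loop, a select loop, or an event framework without changes.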

    -nuffin
    zz zZ Z Z #!perl

      A couple of comments: this is for TAPx::Parser, and one of the requirements is to have nothing which prevents it from installing in a fresh Perl install (i.e., no external dependencies). That means POE and friends are out of the question. Even if I make them optional, that doesn't solve my root problem :)

      As for the plumbing loop, while I do like that idea, this thing is rapidly becoming far more complex than desired. Too many layers of indirection/abstraction are going to make this unmanageable. Already I have one major design flaw because of this problem and it's slowing down development. I might consider this option if I have no choice, but for now, I just want to pull from a stream. However, comments that others are making are rapidly convincing me that the synchronization issue can't be solved downstream. I do, however, have ideas on how to solve that little nit.

      Cheers,
      Ovid


        I was suggesting you remove a layer of abstraction - instead of keeping the stream loop inside of TAPx::Parser, let it remain outside so that it doesn't have to worry at all about blocking and synchronization and what not.

        The POE part is just to demonstrate that this approach (push parsing) works well in more situations than stream based iterators (that is, streams are usable with push parsers, but not vice versa).

        -nuffin
        zz zZ Z Z #!perl
      You get ten points for pimpin' my code ;)
Re: Synchronizing STDERR and STDOUT
by shmem (Chancellor) on Sep 21, 2006 at 10:33 UTC

    What about doing that before running Test::Builder -

    open(STDERR,'>&', STDOUT);

    Is that portable?

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

      That's been suggested, but it doesn't work :(

      From perlfaq8:

      Note that you cannot simply open STDERR to be a dup of STDOUT in your Perl program and avoid calling the shell to do the redirection. This doesn't work:

      open(STDERR, ">&STDOUT"); $alloutput = `cmd args`; # stderr still escapes


      This fails because the open() makes STDERR go to where STDOUT was going at the time of the open(). The backticks then make STDOUT go to a string, but don't change STDERR (which still goes to the old STDOUT).

      Cheers,
      Ovid


        That's one specialty for backticks only. With backticks, a new filehandle is allocated into which the STDOUT of the subprocess is diverted. But the STDERR of the subshell goes to your STDOUT.

        Yes, the redirect has to be done in the source process, unless you patch your kernel with a MacFilehandle patch (three button -> one button :-) which lumps STDOUT and STDERR together at will.

        Within the same perl process, it's all fine:

        #!/usr/bin/perl -w
        use strict;
        # $Id: blorfl.pl,v 0.0 2006/09/21 11:11:11 shmem Exp $
        print "foo";
        warn "warn";
        print "\n";
        __END__

        qwurx [shmem] ~> perl -e 'open(STDERR,">&", STDOUT); do "blorfl.pl"' 1>/dev/null
        qwurx [shmem] ~> perl -e 'open(STDERR,">&", STDOUT); do "blorfl.pl"' 2>/dev/null
        foo
        warn at blorfl.pl line 5.

        But a subprocess has two brand new filehandles for STDOUT and STDERR, which happen to be connected to the same filehandle in the parent (which the subprocess doesn't know), and the process is free to buffer ad lib. You have to do something with the source process, at least have it make STDOUT unbuffered, if you want the two streams in sync.

        qwurx [shmem] ~> perl -le 'open(STDERR,">&", STDOUT); system "perl blorfl.pl"' 1>/dev/null
        qwurx [shmem] ~> perl -le 'open(STDERR,">&", STDOUT); system "perl blorfl.pl"' 2>/dev/null
        warn at blorfl.pl line 5.
        foo

        While redirection works as expected, note the reverse order of 'warn' and 'foo' due to buffered STDOUT.
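A small sketch of the buffering effect described above, assuming a Unix-like shell (the single-quote quoting and `2>&1` are not portable to Windows). The helper name merged_output is hypothetical. It runs a child perl that writes to STDOUT, then STDERR, then STDOUT, once with the child's STDOUT buffered and once with `$| = 1` set in the child.

```perl
use strict;
use warnings;

# Run a child perl that prints to STDOUT, STDERR, STDOUT (in that order)
# and return the merged output as the parent receives it.
# Assumption: a Unix-like shell handles the quoting and the 2>&1.
sub merged_output {
    my ($autoflush) = @_;
    my $perl    = $^X;    # the perl binary running this script
    my $program = ( $autoflush ? '$| = 1; ' : '' )
        . 'print "out1\n"; print STDERR "err\n"; print "out2\n";';
    return scalar `${perl} -e '${program}' 2>&1`;
}

print "child STDOUT buffered:\n",   merged_output(0);
print "child STDOUT autoflushed:\n", merged_output(1);
```

When the child's STDOUT is a pipe and buffered, the STDERR line typically arrives first even though it was written second. With autoflush enabled in the child, the lines arrive in the order they were written. The change has to happen inside the child, which is the point made above.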

        <update>
        BTW, the FAQ entry you quoted should read like this for clarity

        This fails because the open() makes STDERR go to where STDOUT was going at the time of the open(). The backticks then make the subshell's STDOUT go to a string, but don't change the subshell's STDERR (which still goes to the old STDOUT).
        </update>

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
        You can split the fork and the exec up if open mashes them together too much.
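        pipe CHILDREAD, CHILDWRITE;
        defined( my $pid = fork ) or die "fork: $!";
        if ( $pid ) {
            close CHILDWRITE;    # parent keeps only the read end, so it can see EOF
            # read on CHILDREAD
        } else {
            close CHILDREAD;     # child keeps only the write end
            open STDERR, ">&CHILDWRITE";
            open STDOUT, ">&CHILDWRITE";
            exec( $somecmd );
        }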
        This is precisely the type of plumbing that a shell will do when you say 2>&1, except without the unportable syntax ;-)

        That said, IPC::Run and friends already abstract all of this out, so there's no need to reinvent the wheel.
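For completeness, a runnable sketch of the same fork/pipe plumbing with the parent's read loop filled in. The function name run_merged is hypothetical, and the sketch assumes a platform with a real fork and a Perl recent enough for lexical filehandles and three-argument open (5.8 or later), so it does not meet the "oldest Perl possible" requirement as written.

```perl
use strict;
use warnings;

# Run @cmd with its STDOUT and STDERR both dup'ed onto one pipe,
# and return everything the child wrote.
sub run_merged {
    my @cmd = @_;
    pipe( my $read, my $write ) or die "pipe: $!";
    defined( my $pid = fork ) or die "fork: $!";

    if ( !$pid ) {
        # Child: point both standard handles at the pipe, then exec.
        close $read;
        open STDOUT, '>&', $write or die "dup STDOUT: $!";
        open STDERR, '>&', $write or die "dup STDERR: $!";
        exec { $cmd[0] } @cmd or die "exec: $!";
    }

    # Parent: close the write end so the read below sees EOF.
    close $write;
    my $output = do { local $/; <$read> };
    close $read;
    waitpid( $pid, 0 );
    return $output;
}

# The child enables autoflush itself; without that, ordering is not kept.
print run_merged( $^X, '-e',
    '$| = 1; print "a\n"; print STDERR "b\n"; print "c\n"' );
```

The pipe only merges the two streams into one destination. Ordering is still decided by the child's own buffering, so the child in this example sets `$|` itself.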

        -nuffin
        zz zZ Z Z #!perl
        This fails because the open() makes STDERR go to where STDOUT was going at the time of the open(). The backticks then make STDOUT go to a string, but don't change STDERR (which still goes to the old STDOUT).

        So don't use backticks. Redirect STDOUT to a file, redirect STDERR to STDOUT and use system() instead.

        use strict;
        use warnings;
        use File::Temp;

        my $temp_stdout = File::Temp->new;

        local *OLDOUT;
        local *OLDERR;
        open( OLDOUT, ">&STDOUT" );
        open( OLDERR, ">&STDERR" );
        open( STDOUT, ">$temp_stdout" );
        open( STDERR, ">&STDOUT" );

        # Funky quoting for Windows. Sigh.
        system('perl -e "print q{to stdout}; warn q{to stderr}; print q{more to stdout}"');

        close(STDOUT);
        open(STDOUT, ">&OLDOUT");
        open(STDERR, ">&OLDERR");

        open CAPTURED, "<$temp_stdout";
        my $capture = do { local $/; <CAPTURED> };
        close CAPTURED;

        print "Got this:\n$capture";

        That still doesn't solve the problem of keeping them in sync because the subprocess still has two buffered handles. The fact that they go to the same place doesn't matter. You need to get the child process to turn off buffering.

        -xdg

        Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Synchronizing STDERR and STDOUT
by coreolyn (Parson) on Sep 21, 2006 at 13:56 UTC

    While there's much that could be cleaned up, and in spite of my initial misgivings (see Open3 and bad gut feeling), this Open3 module has given me what you seek in several large enterprise applications without any problems.

    Here's the code as it's been running in production without change for the last 4 years


      coreolyn, sweet stuff. I don't fully understand it yet, but the "Suffering from Buffering" article is proving to be a nice read. Does your program keep the stdout and stderr output in such a way that you can reassemble it in the order it actually was generated?

      Terrence

      _________________________________________________________________________________

      I like computer programming because it's like Legos for the mind.

        I use two separate logs for stderr and stdout. To be completely honest, I haven't had a situation where I've needed to make sure they are completely in sync, so maybe I was overly smug in my assumption. You should read the comments in the full node of Open3 and bad gut feeling; apparently there is a lot of extraneous (read: useless) code. I was never given the opportunity to refactor it, nor did the need arise.
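A minimal sketch of the general Open3 approach with two separate captures. This is not coreolyn's production code, which is not reproduced in this thread. It uses only core modules (IPC::Open3, IO::Select, Symbol), and the function name capture_separately is hypothetical. It assumes a Unix-like system, because select on pipes does not work on Windows, where select is limited to sockets.

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use IO::Select;
use Symbol qw(gensym);

# Run @cmd and return ( stdout_text, stderr_text ), collected separately.
sub capture_separately {
    my @cmd = @_;
    my $err = gensym;    # open3 needs a real glob for the stderr handle
    my $pid = open3( my $in, my $out, $err, @cmd );
    close $in;           # the child gets no input

    my ( $out_fd, $err_fd ) = ( fileno $out, fileno $err );
    my %buf = ( $out_fd => '', $err_fd => '' );

    my $sel = IO::Select->new( $out, $err );
    while ( $sel->count ) {
        for my $fh ( $sel->can_read ) {
            my $fd = fileno $fh;
            my $n  = sysread( $fh, my $chunk, 4096 );
            if ($n) {
                $buf{$fd} .= $chunk;
            }
            else {
                # EOF or error: stop watching this handle
                $sel->remove($fh);
                close $fh;
            }
        }
    }
    waitpid( $pid, 0 );
    return ( $buf{$out_fd}, $buf{$err_fd} );
}

my ( $stdout, $stderr ) =
    capture_separately( $^X, '-e', 'print "to stdout"; warn "to stderr\n"' );
print "STDOUT: $stdout\nSTDERR: $stderr";
```

Reading both handles through select avoids the deadlock that can occur when one pipe fills while the parent is blocked on the other. It does not record the relative order in which the child wrote to the two streams, which matches the limitation acknowledged above.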

Re: Synchronizing STDERR and STDOUT
by monarch (Priest) on Sep 21, 2006 at 11:10 UTC
    What version of Windows are you running on? On Windows XP Professional I can do the following: which outputs the following:

      There's no particular version of Windows. This is for TAPx::Parser which, hopefully, will run anywhere that Perl can run. Thus, I can't rely on any particular version of Windows. Hence my need to keep this as portable as possible.

      Cheers,
      Ovid


Re: Synchronizing STDERR and STDOUT
by OfficeLinebacker (Chaplain) on Sep 21, 2006 at 12:30 UTC
    Greetings, esteemed monks!

    Isn't this one of those currently unsolvable problems? I've been trying to separately but synchronously handle STDOUT and STDERR for months now. The closest I've gotten is (big program follows)

Re: Synchronizing STDERR and STDOUT
by ikegami (Patriarch) on Sep 21, 2006 at 15:39 UTC
    some way of telling the source process to send everything to the same filehandle

    open(STDERR,'>&', STDOUT); creates a new filehandle (so it doesn't help), but *STDERR = *STDOUT; makes both STDOUT and STDERR refer to the same filehandle. You don't even need to turn off buffering.

    use IO::Handle ();

    open(STDERR,'>&', STDOUT);
    print("STDOUT = ", fileno(STDOUT), "\n");  # 1
    print("STDERR = ", fileno(STDERR), "\n");  # 2
    print STDOUT 'a';
    print STDERR 'b';
    print STDOUT 'c';
    STDOUT->flush();  # ac
    STDERR->flush();  # b
    print("\n");

    *STDERR = *STDOUT;
    print("STDOUT = ", fileno(STDOUT), "\n");  # 1
    print("STDERR = ", fileno(STDERR), "\n");  # 1
    print STDOUT 'a';
    print STDERR 'b';
    print STDOUT 'c';
    STDOUT->flush();  # abc
    STDERR->flush();

    Of course, this doesn't work if you fork.

    If you can't modify the program, you can replace
    perl script.pl
    with
    perl -e "*STDERR = *STDOUT; do 'script.pl'"