http://www.perlmonks.org?node_id=222250

BazB has asked for the wisdom of the Perl Monks concerning the following question:

Salutations Fellow Monks,

I'm reasonably familiar with IPC::Open2, IPC::Open3, piped open()s and the like; however, I've not yet figured out a nice way to pipe several commands together, in a similar manner to the following in ksh/bash:

    cmd1 arg1 arg2 2>cmd1.log | cmd2 arg1 2>cmd2.log | cmd3 2>cmd3.log

($? checking excluded for clarity.)

Currently, I use an IPC::Open3 based wrapper for each command, and just do the following:

    execute_cmd('cmd1', @args, $in_fh1, $out_fh1, $err_fh1);
    execute_cmd('cmd2', @args, $in_fh2, $out_fh2, $err_fh2);
    while (<$out_fh1>) {
        print $in_fh2 $_;
    }
IMHO, that's a horrid way of getting the output from one command into another, and I've not felt the urge to use it with more than two commands at once.

So, can anyone suggest a nice, robust and preferably quick way of piping several commands together?

Cheers. BazB

Replies are listed 'Best First'.
Re: Robustly piping several processes.
by IlyaM (Parson) on Dec 26, 2002 at 11:48 UTC
Re: Robustly piping several processes.
by pg (Canon) on Dec 25, 2002 at 23:26 UTC
    Hope I understand what you said. I did this on Windows, as I don't have a Unix box around today (home, sweet home), but you can easily move it to Unix. I tested the following code with:
    type a.pl|perl -w a.pl
    
    and it demonstrates how you can make your scripts follow the behavior of a traditional pipe, and take output from another process as input.
    a.pl:

        @in = <STDIN>;
        foreach (@in) {
            print "from perl script: ", $_;
        }
    Update 2: I don't know whether or not you have control over cmd1, cmd2, etc. If you do, I would strongly suggest you use pipe() to connect those handles into pairs.

    Update: Hm... now I know what you mean: you want your script to act as the "middle man". Anyway, I would still like to leave my original reply here, as it is still a good answer to "the question I thought it was".

    Your approach ALONE sounds good to me, but that while loop can be simplified; you may borrow the concept from the following code:
    open(DATA, "<", "ex902.pl"); @in = <DATA>;#read in as array close(DATA); open(DATA1, ">", "data.txt"); print DATA1 @in;#flush the whole array out close(DATA1);
    However, I am wondering why you are trying to do that. It sounds to me like you are reinventing something the OS already does for you. For the project you are working on, is there an alternative solution that lets you use the OS as much as possible? Well, anyway, I believe you must have a good reason to do this.

      I think you've misunderstood what I'm after.

      I want to execute multiple commands, piped together, but with error checking, STDERR capture and the like, from within my scripts; hence the comments about the IPC:: modules and piped open()s. I'm not trying to make single scripts behave nicely as part of a pipeline.

      Cheers.

      BazB.

      Update to pg's update: the suggestion to read the output from the first command into an array will not work.
      That data could be up to ~30 million lines, or ~50 GB.
      That's the whole point of pipes - you don't have to be able to store the whole dataset in memory, nor waste time writing intermediate stages to disk.

      The potential size of the input is also why I use while loops in my current code, although read() would probably be more efficient, since the data consists of fixed-length records.
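
      As a rough illustration, the read()-based version of that copy loop might look something like this, reusing the handle names from the wrapper example above (the record length is a made-up value):

          my $reclen = 4096;                     # made-up fixed record length
          my $record;
          while (read($out_fh1, $record, $reclen)) {
              print $in_fh2 $record
                  or die "write to the next command failed: $!";
          }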

      Doing this directly in the shell might be easier, but the benefits of using Perl to build the application matter more.

      Update 2: Ah ha! Me thinks pg(++!) has cracked it.
      pipe() seems to be the way to go. I'm rather surprised that I'd not come across it before.

        BazB and I had some very interesting discussions on and off, in the chat room and through MessageBox, and we have now come to an agreement that the pipe() function introduced in perlipc would be a good solution for connecting I/O handles between processes.

        He suggested that I post a reply to complete this thread, and I am doing so now. A sample is attached:
        use IO::Handle;
        use strict;

        $|++;

        # Two pipes: one for each direction between parent and child.
        pipe(PARENT_READER, CHILD_WRITER);
        pipe(CHILD_READER, PARENT_WRITER);
        PARENT_WRITER->autoflush(1);
        CHILD_WRITER->autoflush(1);

        my $pid;
        if ($pid = fork()) {
            # Parent: keep only its own ends of the pipes.
            close(CHILD_READER);
            close(CHILD_WRITER);
            my $buffer;
            print PARENT_WRITER 1;
            while (1) {
                sysread(PARENT_READER, $buffer, 100);
                print "parent recvd: $buffer, and reply with ", $buffer + 1, "\n";
                sleep(1);
                print PARENT_WRITER $buffer + 1;
            }
        }
        else {
            # Child: keep only its own ends of the pipes.
            close(PARENT_READER);
            close(PARENT_WRITER);
            my $buffer;
            while (1) {
                sysread(CHILD_READER, $buffer, 1000);
                print "child recvd: $buffer, and reply with ", $buffer + 1, "\n";
                sleep(1);
                print CHILD_WRITER $buffer + 1;
            }
        }
        But if you want your script to look through the output while it's between the two processes, to check for errors and such, you do need the loop with the read/write. If you point each process at one end of a pipe you created, it will go on without your involvement. Maybe you can use the efficient pipe for stdout/stdin but still monitor stderr within the script.
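
        Here is a rough, untested sketch of that idea for two commands: the script creates the stdout/stdin pipe itself, so the data never passes through Perl, but it keeps a pipe to each child's stderr so it can watch for errors. The command names are placeholders, and a robust version would multiplex the stderr handles with select()/IO::Select rather than reading them one after the other.

            use strict;
            use warnings;

            pipe(my $data_r, my $data_w) or die "pipe: $!";   # cmd1 stdout -> cmd2 stdin
            pipe(my $err1_r, my $err1_w) or die "pipe: $!";   # cmd1 stderr -> parent
            pipe(my $err2_r, my $err2_w) or die "pipe: $!";   # cmd2 stderr -> parent

            my $pid1 = fork();
            die "fork: $!" unless defined $pid1;
            if ($pid1 == 0) {                                 # first command
                open(STDOUT, '>&', $data_w) or die "dup: $!";
                open(STDERR, '>&', $err1_w) or die "dup: $!";
                close($_) for $data_r, $data_w, $err1_r, $err1_w, $err2_r, $err2_w;
                exec('cmd1', 'arg1', 'arg2') or die "exec cmd1: $!";
            }

            my $pid2 = fork();
            die "fork: $!" unless defined $pid2;
            if ($pid2 == 0) {                                 # second command
                open(STDIN,  '<&', $data_r) or die "dup: $!";
                open(STDERR, '>&', $err2_w) or die "dup: $!";
                close($_) for $data_r, $data_w, $err1_r, $err1_w, $err2_r, $err2_w;
                exec('cmd2', 'arg1') or die "exec cmd2: $!";
            }

            # Parent: the data pipe belongs entirely to the children.
            close($_) for $data_r, $data_w, $err1_w, $err2_w;

            while (<$err1_r>) { warn "cmd1 stderr: $_" }
            while (<$err2_r>) { warn "cmd2 stderr: $_" }
            waitpid($_, 0) for $pid1, $pid2;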

        I wrote a filter to act as a sniffer for a specific network protocol. I wrote code in Win32 to use IO completion ports to efficiently move the buffer from one to the other without copying it, and also passed it to an embedded Perl interpreter as a "tee" in the middle. The Perl could do its fine job of parsing the stuff and presenting it to me, but the processes piped efficiently as long as I didn't select the option for modification of the data stream by the Perl code (that is, report only). I used the modification ability to introduce errors or otherwise test things that my well-behaved program never did.

        —John

Re: Robustly piping several processes.
by MarkM (Curate) on Dec 26, 2002 at 06:54 UTC

    IPC::Open2 and IPC::Open3 are based on pipe(). Under UNIX, the shell command "a | b | c" follows the same principle. (Significance: pipe() is no 'cleaner' than IPC::Open3.)
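
    To make the parallel concrete, here is a rough, untested sketch of roughly what the shell does for "a | b | c", using nothing but pipe(), fork() and exec(); the command names and arguments are placeholders:

        use strict;
        use warnings;

        my @cmds = ( ['cmd1', 'arg1', 'arg2'], ['cmd2', 'arg1'], ['cmd3'] );

        my $prev_read;                          # read end of the previous pipe
        for my $i (0 .. $#cmds) {
            my ($read, $write);
            if ($i < $#cmds) {
                pipe($read, $write) or die "pipe: $!";
            }

            my $pid = fork();
            die "fork: $!" unless defined $pid;

            if ($pid == 0) {                    # child: wire up stdin/stdout, then exec
                open(STDIN,  '<&', $prev_read) or die "dup stdin: $!"  if defined $prev_read;
                open(STDOUT, '>&', $write)     or die "dup stdout: $!" if $i < $#cmds;
                exec(@{ $cmds[$i] }) or die "exec @{ $cmds[$i] }: $!";
            }

            # Parent: drop its copies so each pipe has exactly one reader and one writer.
            close($prev_read) if defined $prev_read;
            close($write)     if defined $write;
            $prev_read = $read;
        }

        wait() for @cmds;                       # reap the children

    The last command's STDOUT is left untouched, so its output goes wherever the script's own STDOUT points, just as in the shell.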

Re: Robustly piping several processes.
by Anonymous Monk on Dec 26, 2002 at 00:42 UTC
    Use "<&".fileno($out_fh1) as $in_fh2. Also open cmd3.log on a filehandle, and pass that in as the err_fh to each command in sequence.

    Untested, but the documentation indicates that that should work.
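
    For what it's worth, an equally untested sketch of that suggestion: chain two IPC::Open3 calls by handing the first command's output handle straight to the second command's stdin with the "<&" form, and send both commands' stderr to one shared log filehandle. The command names and the log filename are placeholders.

        use strict;
        use warnings;
        use IPC::Open3;

        open(my $log, '>', 'pipeline.log') or die "pipeline.log: $!";

        my ($out1, $out2);
        my $pid1 = open3(my $in1, $out1, '>&' . fileno($log),
                         'cmd1', 'arg1', 'arg2');
        close($in1);                    # cmd1 gets no input from us

        # cmd2 reads directly from cmd1's output pipe; no copy loop in Perl.
        my $pid2 = open3('<&' . fileno($out1), $out2, '>&' . fileno($log),
                         'cmd2', 'arg1');

        while (<$out2>) {
            print;                      # the last command's output reaches our stdout
        }
        waitpid($_, 0) for $pid1, $pid2;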