http://www.perlmonks.org?node_id=471400

mda2 has asked for the wisdom of the Perl Monks concerning the following question:

Update
  • Reducing disk I/O is a requirement for content filters on SMTP servers.
  • Postfix does not use shell redirection when it calls an after-queue filter.
  • It works with redirection, but that does not cover my needs :(
    I'm working on a content filter for Postfix. It needs to run antivirus and anti-spam checks and usually writes the content to disk for debugging, but the host is heavily loaded and I need to reduce disk I/O.

    In my tests, seek and tell work on STDIN when the script is run with redirection, but my question is: is this correct, or can it fail in the future?

    My relevant code:

    filter.pl < filter.pl

    print "." while ( <STDIN> );
    print "\n";
    seek(STDIN,0,0);
    print "," while ( <STDIN> );
    print "\n";

    More complex code:

    My code needs this to reduce I/O...

    use strict;
    my ( $dfrom, $dsubj, $buf );
    my $from = shift;
    while ( defined( $_ = <STDIN> ) ) {
        if ( substr($_,0,5) eq 'From:' ) {
            chomp( $dfrom = substr($_,6) );
            last;
        }
    }
    if ( $dfrom && $from ne $dfrom ) {
        seek(STDIN,0,0);
        ... save copy and log ...
    }
    seek(STDIN,0,0);
    ... Clamav call by stream ...
    if ( ... virus found ... ) {
        ... warn mail ...
        exit;
    }
    seek(STDIN,0,0);
    open(AS, "| ... anti spam binary") or die " ... error string ";
    syswrite(AS, $buf) while ( read(STDIN, $buf, 32768) );
    close(AS);

    It doesn't work if called with cat filter.pl | ./filter.pl or with input from the keyboard.

    --
    Marco Antonio
    Rio-PM

    Re: Recirect data and Filehandle manipulation (STDIN x Disk IO)
    by Tanktalus (Canon) on Jun 30, 2005 at 21:18 UTC

      You can't seek on a pipe. You may be able to seek (though I've not tried) on a redirection, e.g., "./filter.pl < filter.pl". But that's not what you want.
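
The difference between a pipe and a redirection can be tested at run time. The following is a minimal sketch, assuming a hypothetical helper named can_rewind (not part of the original post): it uses the -f file test to check for a regular file and a zero-offset relative seek that does not move the read position.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the handle is a regular file that accepts seek.
# Pipes, sockets and terminals fail the -f test.
sub can_rewind {
    my ($fh) = @_;
    return 0 unless -f $fh;
    # Offset 0 relative to the current position: tests seek without moving.
    return seek($fh, 0, 1) ? 1 : 0;
}

if (can_rewind(\*STDIN)) {
    print "STDIN is a regular file; seek(STDIN,0,0) should work\n";
} else {
    print "STDIN is not seekable; buffer it or spool it somewhere first\n";
}
```

Run as ./filter.pl < filter.pl this should take the first branch, and as cat filter.pl | ./filter.pl the second, so a filter could fall back to another strategy instead of failing silently.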

      What you want to do is read everything into memory, and spit it out to the multiple destinations. Well, not quite - some emails can get quite large.

      Closer is to buffer the header. Once you've determined whether you're going to do something with it based on $dfrom, you can loop through the buffered header first and then continue reading the rest of the email from STDIN.
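
The header-buffering idea might look roughly like the sketch below. The helper names read_header, header_from and forward are hypothetical, and the sample message is made up for illustration. Note that this only lets the header be inspected before the message is passed on; the body is still consumed once, so it does not help when several consumers each need the full message.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines up to and including the blank line
# that separates the header from the body.
sub read_header {
    my ($in) = @_;
    my @header;
    while (defined(my $line = <$in>)) {
        push @header, $line;
        last if $line =~ /^\r?\n\z/;
    }
    return @header;
}

# Hypothetical helper: value of the first From: line, or undef if none.
sub header_from {
    my (@header) = @_;
    for my $line (@header) {
        return $1 if $line =~ /^From:\s*(.*?)\r?\n?\z/i;
    }
    return;
}

# Hypothetical helper: replay the buffered header, then stream the rest.
sub forward {
    my ($in, $header, $out) = @_;
    print {$out} @$header;
    my $buf;
    print {$out} $buf while read($in, $buf, 32768);
    return;
}

# Demo with an in-memory handle standing in for STDIN.
my $mail = <<'EOM';
From: alice@example.org
Subject: test

body line 1
body line 2
EOM

open(my $in, '<', \$mail) or die "open: $!";
my @header = read_header($in);
my $dfrom  = header_from(@header);

my $copy = '';
open(my $out, '>', \$copy) or die "open: $!";
forward($in, \@header, $out);
close($out);

print "From header: $dfrom\n";
print "copy matches input\n" if $copy eq $mail;
```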

      More realistically, copy the whole thing to a temp file, and use that. You can seek all you want on a temp file.
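
A minimal sketch of the temp-file approach, assuming the core File::Temp module and a hypothetical helper named spool_to_temp. Called in scalar context with no arguments, tempfile() returns only a filehandle, and the file is removed automatically.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy any readable handle into an anonymous temp
# file and return that file's handle, positioned at the start.
sub spool_to_temp {
    my ($in) = @_;
    my $tmp = tempfile();    # scalar context: handle only
    binmode $tmp;
    my $buf;
    while (read($in, $buf, 32768)) {
        print {$tmp} $buf or die "write: $!";
    }
    seek($tmp, 0, 0) or die "seek: $!";
    return $tmp;
}

# Demo: an in-memory handle stands in for a piped STDIN.
my $data = "line 1\nline 2\n";
open(my $in, '<', \$data) or die "open: $!";

my $tmp   = spool_to_temp($in);
my $first = do { local $/; <$tmp> };    # first pass over the message

seek($tmp, 0, 0) or die "seek: $!";
my $second = do { local $/; <$tmp> };   # second pass after rewinding

print $first eq $second ? "both passes saw the same data\n" : "mismatch\n";
```

In the filter itself, the call would be spool_to_temp(\*STDIN), after which every later stage can seek on the returned handle regardless of how Postfix started the script.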

        I really need to work without writing a temp file, to improve performance on Postfix SMTP servers.

        After your response I found references for "< redirection" ("opens the file for input, default unit 0"), and confirmed it with the lsof command:

        script < input &
        # lsof -p PID
        COMMAND   PID USER   FD   TYPE DEVICE  SIZE    NODE NAME
        ... cut ...
        perl    20316 root    0r   REG    8,2 43151 1564123 /root/av/mail.exploits
        perl    20316 root    1u   CHR  136,2             4 /dev/pts/2
        perl    20316 root    2u   CHR  136,2             4 /dev/pts/2

        Thanks for your response!

        --
        Marco Antonio
        Rio-PM

          To be honest, I would still use a temp file. And then, because this is perl, you could change it from an IO::File to an IO::Scalar later and compare if you're really saving any clock time. I'm betting it's not going to be significant on most modern hardware and OS. HD speeds continue to improve, as do the caching algorithms of the OS.
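
One way to make that comparison is to write the processing code against a generic seekable handle and swap what sits behind it. The sketch below is an illustration only: it uses Perl's built-in in-memory filehandles (open on a scalar reference) in place of IO::Scalar, a hypothetical helper named multi_pass, and an arbitrary 1 MB stand-in message. The timings it prints depend entirely on the machine.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Hypothetical helper: write $data to a fresh read/write handle, then
# rewind and read it back $passes times. Returns the total bytes read.
sub multi_pass {
    my ($fh, $data, $passes) = @_;
    print {$fh} $data or die "write: $!";
    my $total = 0;
    for (1 .. $passes) {
        seek($fh, 0, 0) or die "seek: $!";
        my $buf;
        while (my $n = read($fh, $buf, 32768)) {
            $total += $n;
        }
    }
    return $total;
}

my $data = "x" x 1_000_000;    # stand-in message; the size is an assumption

my $disk    = tempfile();      # anonymous temp file on disk
my $mem_buf = '';
open(my $mem, '+<', \$mem_buf) or die "open: $!";    # in-memory handle

for my $case ([disk => $disk], [memory => $mem]) {
    my ($name, $fh) = @$case;
    my $t0    = time;
    my $bytes = multi_pass($fh, $data, 3);
    printf "%-6s %d bytes in %.4f s\n", $name, $bytes, time - $t0;
}
```

Because multi_pass only sees a handle, the rest of the filter would not need to change if the measurement favours one backing store over the other.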

          This smells of premature optimisation to me.