dynamic zcat and grep

by clmcshque (Initiate)
on Mar 21, 2006 at 21:40 UTC

clmcshque has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to parse some compressed logs, but am having no luck. I have a list of things to search for, and a changing list of files (based on dates). I can get the list of files with no problem, and can get the info on the command line with the following command:
zcat file.gz | grep "Joe Sinclair" > output.txt
Inside my script I am doing what I believe to be the same thing, but I keep getting errors about opening the file. Here is a sample of the code that I am using:
my $filename = "file.gz";
my $reg1 = "Joe Sinclair";
my $reg2 = "Bill Halburg";
open INFILE, 'zcat $filename |' or print "ERROR - Could not open $filename";
while($line = <INFILE>) {
    print $line if($line =~ /$reg1/);
    print $line if($line =~ /$reg2/);
}
close(INFILE);
What am I doing wrong? Thanks

Replies are listed 'Best First'.
Re: dynamic zcat and grep
by swampyankee (Parson) on Mar 21, 2006 at 21:50 UTC

    I believe this line:

    open INFILE, 'zcat $filename |' or print "ERROR - Could not open $filename";
    is running afoul of Perl's quoting convention; single quotes don't interpolate variable values. Try replacing the single quotes with double quotes.
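The quoting difference can be seen without running zcat at all. The following is a minimal sketch, not the original poster's script: it prints what each quoting style hands to the shell. With single quotes the shell receives a literal `$filename`, expands it as an unset shell variable, and most likely runs zcat with no file argument, which would fit the "compressed data not read from a terminal" message reported in this thread. The helper `open_gz` is a hypothetical name, and it assumes a `zcat` that accepts .gz files is on the PATH.

```perl
use strict;
use warnings;

my $filename = "file.gz";

# Single quotes: Perl does not interpolate, so the shell sees "$filename".
my $single = 'zcat $filename |';

# Double quotes: Perl substitutes the value before the shell sees the command.
my $double = "zcat $filename |";

print "single-quoted: $single\n";
print "double-quoted: $double\n";

# Hypothetical helper: the list form of a piped open (Perl 5.8 and later on
# Unix) bypasses the shell, so shell quoting never comes into play.
sub open_gz {
    my ($path) = @_;
    open(my $fh, '-|', 'zcat', $path)
        or die "Cannot start zcat for $path: $!\n";
    return $fh;
}
```

The list form also copes with file names that contain spaces or shell metacharacters, which matters when the file list is built at run time.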


    " The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents."
    —Nathaniel S. Borenstein
      Thanks for the reply. I've tried many different incarnations of that line:
      open(INFILE, "zcat $filename |")
      open(INFILE, 'zcat $filename |')
      open INFILE, "zcat $filename |"
      all with similar results:
      zcat: compressed data not read from a terminal. Use -f to force decompression.
      For help, type: zcat -h

        Sorry my help was not helpful.

        I've looked at the docs for zcat (well, OpenBSD's docs for zcat) and it seems that your first and third choices should work: zcat unzips the input file to STDOUT. When I get a chance (probably about 24 hours from now), I'll muck about on my OpenBSD box to see if I can replicate your results.


        " The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents."
        —Nathaniel S. Borenstein
        Can you use your zcat in a pipe on the command line? Like

        % cat file.gz | zcat | less

        This works on Solaris but if it doesn't work for you it could be that your zcat demands the presence of a terminal as implied by your results.



      This is an old thread, but this answer may prove useful to others. I had a similar problem with a command similar to

      zcat -c test.gz | > test.out

      being called within a makefile; it had the same error as mentioned in this thread. I ended up changing it to

      cat test.gz | zcat -c | > test.out

      Regards Brad

Re: dynamic zcat and grep
by johngg (Canon) on Mar 21, 2006 at 23:32 UTC
    You could try installing the Compress::Zlib module and use that to read the log file directly.

    use strict;
    use warnings;
    use Compress::Zlib;

    # Set up what we want to match.
    #
    our $reg1 = "Joe Sinclair";
    our $reg2 = "Bill Halburg";
    our $rxNames = qr{(?:$reg1|$reg2)};

    # Open compressed log file.
    #
    our $logFile = "file.gz";
    our $gzInput = gzopen($logFile, "rb")
        or die "gzopen: $gzerrno\n";

    # Read line by line into $_ counting bytes read.
    #
    our $bytesRead;
    while($bytesRead = $gzInput->gzreadline($_))
    {
        # Print if it matches.
        #
        print if /$rxNames/;
    }

    # Check that we have read to the end. Close
    # file.
    #
    die "Incomplete read: $gzerrno\n"
        unless $gzerrno == Z_STREAM_END;
    $gzInput->close();

    I have not tested this but I have adapted it from a script doing something similar.



      Ah yes, without the ability to install the module, I cannot try this, I will however test it on another machine. Thank you for the input.
      Thanks, I got the module installed, and with a bit of tweaking it all works fine now.
Re: dynamic zcat and grep
by graff (Chancellor) on Mar 22, 2006 at 06:10 UTC
    As long as you're willing to try out non-core modules (even though you might have limited ability to install them), you should actually try PerlIO::gzip -- it implements gzip compression and uncompression as a PerlIO layer, so you can do things like:
    use PerlIO::gzip;
    open( ICMP, "<:gzip", "sometext.gz" );
    open( OCMP, ">:gzip", "chosenlines.gz" );
    while (<ICMP>) {
        print OCMP if /something matches/;
    }
    close OCMP;
    In other words, this creates an i/o layer that handles the compression for you, on input, output or both, and you just handle the data as if compression were not a factor.

    I can't wait for this to be part of the core distro.

    (update: corrected the spelling on the cpan link)

    UPDATE: (2010-10-18) It seems that PerlIO::gzip should be viewed as superseded by PerlIO::via::gzip. (see PerlIO::gzip or PerlIO::via::gzip).

      Thanks graff, I used your suggestion, and this is what I came up with. Does it look right, or are there some improvements I could make? Since this is only my second week with Perl, I am open to suggestions!
      #!/usr/local/bin/perl
      use Time::Local 'timelocal';
      use PerlIO::gzip;
      use IO::Tee;
      use IO::File;

      $err = 0;
      $help = 1 if($ARGV[0] eq '-h');
      $help = 1 if($ARGV[0] eq '--help');
      $help = 1 if($ARGV[0] eq '-help');
      $help = 1 if($ARGV[0] eq '');
      $debug = 1 if($ARGV[0] eq '-d');
      $msgHelp = "FORMAT - command [-d][-h][--help] Month StartDate EndDate\n\tStart & End Date = mm/dd/yyyy";
      $msgGreps = "\n----------------------The following greps will be used for searching:\n";
      $msgFiles = "\n----------------------The following files will be searched based on the dates given:\n";
      $msgStarting = "\n----------------------Now Starting\n";

      if($help == 1){
          print $msgHelp;
      }
      elsif($debug == 1){
          $month = $ARGV[1];
          @start = split /\//, $ARGV[2];
          @end = split /\//, $ARGV[3];
      }else{
          $month = $ARGV[0];
          @start = split /\//, $ARGV[1];
          @end = split /\//, $ARGV[2];
      }

      $inputpath = "/logs/";
      $startdate = timelocal(0,0,0, $start[1], $start[0]-1, $start[2]-1900);
      $enddate = timelocal(0,0,0, $end[1]+1, $end[0]-1, $end[2]-1900);
      $currenttime = localtime time;
      $fcount = 1;
      $gcount = 0;
      if($debug !=1){$logfile = "win_greplog.txt" }else{$logfile = "testlogfile.txt"};
      $msgstarting = "\n----------------------$currenttime-----------------------\nParse will start with logs dated: startdate = $startdate\nEnding with logs dated: enddate = $enddate\nIn the following directory: $inputpath\n";

      $tee = new IO::Tee(\*STDOUT, new IO::File(">>$logfile"));
      print $tee "\nDEBUG MODE ON" if($debug == 1);
      print $tee $msgstarting;

      opendir INPUTDIR, $inputpath;
      @inputfiles = grep { (stat "$inputpath/$_")[9] >= $startdate and (stat "$inputpath/$_")[9] < $enddate } readdir INPUTDIR;
      closedir INPUTDIR;
      $numfiles = @inputfiles;

      $greps[0] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\Run';
      $greps[1] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce';
      $greps[2] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnceEx';
      $greps[3] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\AeDebug';
      $greps[4] = '\SYSTEM\CurrentControlSet\Control\SessionManager\KnownDLLs';
      $greps[5] = '\SYSTEM\CurrentControlSet\Control\SecurePipeServers\winreg';
      $greps[6] = '\SOFTWARE\inAgents\EventLog2Syslog';
      $greps[7] = '%systemdrive%';
      $greps[8] = 'C:\';
      $greps[9] = '\system32';
      $greps[10] = '\system32\drivers';
      $greps[11] = '\system32\config';
      $greps[12] = '\system32\spool';
      $greps[13] = '\repair';

      print $tee $msgGreps;
      foreach $gname (@greps) {
          print $tee "\n - greps[$gcount]\t $gname";
          $gcount++;
      }

      print $tee $msgFiles;
      foreach $filelist (@inputfiles) {
          $filelist = $inputpath.$filelist;
          print $tee "\n - $filelist";
      }

      print $tee $msgStarting;
      # step into each input file
      foreach $inputfile (@inputfiles) {
          # step into each grep
          $gcount = 0;
          foreach $grep (@greps) {
              # build the outputfile
              $outputfile = $month."_".$gcount."_".$inputfile."_results.txt";
              @results = `zgrep $grep > $outputfile`;
              $gcount++;
          }
      }
      print $tee "\n\n----------------------Normal Completion\n" if ($err==0);
      close(LOGFILE);
        Well, I'm not in a position to test it myself, so I have to ask you: Have you tried running it, and does it do what you want?

        As for improvements, I can think of several, but if the script works, these are less than crucial -- well, except for the fact that you really should include "use strict", and learn about scoping variables.

        Apart from that, in no particular order:

        • You have "use PerlIO::gzip" at the top, but you never actually use the ":gzip" IO layer. You're just running "zgrep" in backticks.

        • Actually, looking at the zgrep command line in the backticks, I don't see you providing an input file name there -- just a pattern to search for. I would expect the resulting output files to be empty every time.

        • You appear to be generating 14 output files for every input file. Is that really what you want? You never actually say what the goal is here, but fourteen separate output files for each input file seems like a lot.

        • You can simplify and improve your handling of command line options and args. Take a look at Getopt::Std and Getopt::Long -- these are part of the core distribution; also, the following is another alternative (though it doesn't use modules):
          my $debug = 0;
          my $usage = "Usage: $0 [-d|-h] month start end\n blah blah";
          if ( @ARGV and $ARGV[0] =~ /^-+([dh])/ ) {
              shift;
              die $usage if ( $1 eq 'h' );
              $debug++;
          }
          die $usage unless ( @ARGV == 3 ); # could add more conditions...

        • Aside from using $month when naming all those output files, it's not clear what this value is important for. If it's supposed to be different from the start and/or end dates, how should it be different?

        • Initializing the @greps array can be a lot simpler (and if flexibility would be useful for you, consider loading the list from a data file, which can be named on the command line):
          my @greps = qw(\string\1 \string\1\extra \string\2 );
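For the Getopt::Long route mentioned in the list above, a minimal sketch might look like the following. The option names (-d, -h) and the three positional arguments follow the script in this thread; the sub name `parse_args` and the sample dates are hypothetical. It uses `GetOptionsFromArray`, which needs a reasonably recent Getopt::Long (2.36 or later), so that the parser can be exercised without touching @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: returns a hash ref of settings, or nothing if the
# arguments are unusable (bad option, help requested, wrong argument count).
sub parse_args {
    my @args = @_;
    my %opt = ( debug => 0, help => 0 );

    GetOptionsFromArray(
        \@args,
        'd|debug' => \$opt{debug},
        'h|help'  => \$opt{help},
    ) or return;

    # After option parsing, exactly month, start date and end date must remain.
    return if $opt{help} or @args != 3;

    @opt{qw(month start end)} = @args;
    return \%opt;
}

# Sample invocation with made-up values.
my $opt = parse_args(qw(-d March 03/01/2006 03/21/2006));
print "debug=$opt->{debug} month=$opt->{month}\n";
```

In a real script the call would be `parse_args(@ARGV) or die $usage;`, which keeps the usage message in one place.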

        Well, enough for now. Good luck with the rest.
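Pulling several of the points above together, one possible shape for the search loop is sketched below. It is only an illustration under stated assumptions: it reads each compressed file once through `gzip -dc` (assumed to be on the PATH) instead of running zgrep per pattern, and it passes the search strings through `quotemeta`, because the registry paths contain backslashes that a regex would otherwise treat as escapes. The names `line_matches` and `scan_gz` are hypothetical, and only two of the fourteen strings are shown.

```perl
use strict;
use warnings;

# Two of the literal search strings from the script in this thread.
my @greps = (
    '\SOFTWARE\Microsoft\Windows\CurrentVersion\Run',
    '\system32\drivers',
);

# quotemeta escapes the backslashes so each string is matched literally.
my $alternatives = join '|', map { quotemeta } @greps;
my $rx = qr/$alternatives/;

# Hypothetical helper: true if the line contains any of the strings.
sub line_matches {
    my ($line) = @_;
    return $line =~ $rx ? 1 : 0;
}

# Hypothetical helper: copy matching lines from one .gz file to a handle.
sub scan_gz {
    my ($path, $out) = @_;
    open(my $in, '-|', 'gzip', '-dc', $path)
        or die "Cannot start gzip for $path: $!\n";
    while (my $line = <$in>) {
        print {$out} $line if line_matches($line);
    }
    close($in) or warn "gzip reported a problem for $path (status $?)\n";
}

print line_matches('key HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run set'), "\n";
```

A call such as `scan_gz($file, \*STDOUT)` for each entry in the file list would then produce one stream of matches per input file rather than fourteen separate outputs.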
Re: dynamic zcat and grep
by eff_i_g (Curate) on Mar 21, 2006 at 23:10 UTC
    What Unix and Perl are you running? Just curious.

    I was able to get this to work in two ways:
    1. I can only use zcat on .Z files, therefore, I used gunzip -c.
    2. I created a .Z file via compress and used zcat as you have.

    Does zcat -h give you any further information?
      On some systems, gunzip -c and zcat are identical.
      thulben@alpha:~ 17 $ md5sum /bin/gzip /bin/zcat /bin/gunzip
      57cd8cdf42fbda6e0a1f5e17ac986b4f  /bin/gzip
      57cd8cdf42fbda6e0a1f5e17ac986b4f  /bin/zcat
      57cd8cdf42fbda6e0a1f5e17ac986b4f  /bin/gunzip
      The executable is one of those "magic" ones that recognizes how it was called and alters its behavior appropriately.
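The idea of a program changing behavior based on the name it was invoked under can be illustrated in Perl with `$0`. This is only a sketch of the general technique, not how gzip itself is implemented; the sub name `mode_for` and the mode strings are made up for the example.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: pick a mode from the name the program was called by.
sub mode_for {
    my ($invoked_as) = @_;
    my $name = basename($invoked_as);
    return 'decompress to stdout' if $name eq 'zcat';
    return 'decompress'           if $name eq 'gunzip';
    return 'compress';
}

# $0 holds the name this script was started with.
print mode_for($0), "\n";
```

If one file is installed under several names (as hard links or symlinks), each name selects a different branch, which would be consistent with the identical checksums shown above.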


      The only easy day was yesterday

      Thanks for the reply. I'm using:
      perl, v5.8.7 built for i686-linux
      Red Hat Enterprise
      zcat works for me when I enter it via the command line, but not when it is in a Perl script. gunzip -c gives the same errors. When I use single quotes:
      gunzip: compressed data not read from a terminal. Use -f to force decompression.
      For help, type: gunzip -h
      When I use double quotes:
      ERROR - Could not open log.gz

        Since you are on a Red Hat machine anyway, you can save yourself a lot of time just by using 'zgrep'.

        We're not surrounded, we're in a target-rich environment!
        You may be able to get a better idea of what is going wrong if you include $! in your error message. This is the variable in which Perl stores the O/S error message when things like open fail, e.g.

        $fn = "non_existant_file";
        open IN, "<$fn" or die "open: $fn: $!\n";

        would error with

        open: non_existant_file: No such file or directory
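The same advice extends to piped opens, with one caveat worth showing: for a pipe, a failed `open` generally only means the command could not be started, while problems reported by the command itself show up in `$?` after `close`. The sketch below assumes nothing beyond core Perl; `read_command` is a hypothetical helper, and the child process is the running Perl interpreter (`$^X`) so the example does not depend on zcat being installed.

```perl
use strict;
use warnings;

# Plain file: $! carries the operating system's reason for the failure.
my $fn  = "non_existent_file";   # assumed not to exist
my $err = '';
open(my $in, '<', $fn) or $err = "open: $fn: $!";
print "$err\n";

# Hypothetical helper: run a command, return its exit status and output lines.
sub read_command {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd)
        or die "Cannot start $cmd[0]: $!\n";
    my @lines = <$pipe>;
    close($pipe);            # sets $? to the child's wait status
    return ($? >> 8, @lines);
}

my ($status, @out) = read_command($^X, '-e', 'print "hi\n"; exit 3');
print "exit status: $status, lines read: ", scalar(@out), "\n";
```

Checking `$? >> 8` after closing a `zcat` or `gunzip -c` pipe is how a script would notice that the decompressor itself reported a problem.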


