PerlMonks  

Re: dynamic zcat and grep

by graff (Chancellor)
on Mar 22, 2006 at 06:10 UTC (#538419)


in reply to dynamic zcat and grep

As long as you're willing to try out non-core modules (even though you might have limited ability to install them), you should try PerlIO::gzip -- it implements gzip compression and decompression as a PerlIO layer, so you can do things like:

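    use PerlIO::gzip;

    # the :gzip layer decompresses on read and compresses on write
    open( ICMP, "<:gzip", "sometext.gz" )    or die "sometext.gz: $!";
    open( OCMP, ">:gzip", "chosenlines.gz" ) or die "chosenlines.gz: $!";
    while (<ICMP>) {
        print OCMP if /something matches/;
    }
    close ICMP;
    close OCMP;
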
In other words, this creates an I/O layer that handles the compression for you, on input, output, or both, and you just handle the data as if compression were not a factor.

I can't wait for this to be part of the core distro.

(update: corrected the spelling on the cpan link)

UPDATE: (2010-10-18) It seems that PerlIO::gzip should be viewed as superseded by PerlIO::via::gzip. (see PerlIO::gzip or PerlIO::via::gzip).
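
For anyone who cannot install non-core modules at all, here is a rough sketch of the same filter using IO::Uncompress::Gunzip and IO::Compress::Gzip, which ship with Perl 5.10 and later. Note that this swaps in a different technique: these are object-style handles, not a PerlIO layer. The file names are the ones from the example above; the temporary directory and the two seed lines are assumptions added only so the sketch runs on its own.

```perl
use strict;
use warnings;
use File::Temp             qw(tempdir);
use IO::Compress::Gzip     qw($GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Work in a throwaway directory (an assumption, just to keep the sketch tidy).
my $dir      = tempdir( CLEANUP => 1 );
my $in_name  = "$dir/sometext.gz";
my $out_name = "$dir/chosenlines.gz";

# Create a small compressed input file with made-up sample lines.
my $seed = IO::Compress::Gzip->new($in_name)
    or die "cannot write $in_name: $GzipError";
$seed->print( "keep: something matches here\n", "drop: nothing to see\n" );
$seed->close;

# Read compressed input and write compressed output, line by line.
my $in = IO::Uncompress::Gunzip->new($in_name)
    or die "cannot read $in_name: $GunzipError";
my $out = IO::Compress::Gzip->new($out_name)
    or die "cannot write $out_name: $GzipError";

my $kept = 0;
while ( defined( my $line = $in->getline ) ) {
    next unless $line =~ /something matches/;
    $out->print($line);
    $kept++;
}
$in->close;
$out->close;

print "kept $kept line(s)\n";
```

The loop body is the same as in the PerlIO::gzip version; only the way the handles are opened differs.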


Re^2: dynamic zcat and grep
by clmcshque (Initiate) on Mar 28, 2006 at 18:36 UTC
    Thanks graff, I used your suggestion, and this is what I came up with. Does it look right, or are there some improvements I could make? Since this is only my second week with Perl, I am open to suggestions!
    #!/usr/local/bin/perl
    use Time::Local 'timelocal';
    use PerlIO::gzip;
    use IO::Tee;
    use IO::File;

    $err = 0;
    $help = 1 if($ARGV[0] eq '-h');
    $help = 1 if($ARGV[0] eq '--help');
    $help = 1 if($ARGV[0] eq '-help');
    $help = 1 if($ARGV[0] eq '');
    $debug = 1 if($ARGV[0] eq '-d');

    $msgHelp = "FORMAT - command [-d][-h][--help] Month StartDate EndDate\n\tStart & End Date = mm/dd/yyyy";
    $msgGreps = "\n----------------------The following greps will be used for searching:\n";
    $msgFiles = "\n----------------------The following files will be searched based on the dates given:\n";
    $msgStarting = "\n----------------------Now Starting\n";

    if($help == 1){
        print $msgHelp;
    } elsif($debug == 1){
        $month = $ARGV[1];
        @start = split /\//, $ARGV[2];
        @end = split /\//, $ARGV[3];
    }else{
        $month = $ARGV[0];
        @start = split /\//, $ARGV[1];
        @end = split /\//, $ARGV[2];
    }

    $inputpath = "/logs/";
    $startdate = timelocal(0,0,0, $start[1], $start[0]-1, $start[2]-1900);
    $enddate = timelocal(0,0,0, $end[1]+1, $end[0]-1, $end[2]-1900);
    $currenttime = localtime time;
    $fcount = 1;
    $gcount = 0;
    if($debug !=1){$logfile = "win_greplog.txt" }else{$logfile = "testlogfile.txt"};
    $msgstarting = "\n----------------------$currenttime-----------------------\nParse will start with logs dated: startdate = $startdate\nEnding with logs dated: enddate = $enddate\nIn the following directory: $inputpath\n";

    $tee = new IO::Tee(\*STDOUT, new IO::File(">>$logfile"));
    print $tee "\nDEBUG MODE ON" if($debug == 1);
    print $tee $msgstarting;

    opendir INPUTDIR, $inputpath;
    @inputfiles = grep {
        (stat "$inputpath/$_")[9] >= $startdate and
        (stat "$inputpath/$_")[9] < $enddate
    } readdir INPUTDIR;
    closedir INPUTDIR;
    $numfiles = @inputfiles;

    $greps[0] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\Run';
    $greps[1] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce';
    $greps[2] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnceEx';
    $greps[3] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\AeDebug';
    $greps[4] = '\SYSTEM\CurrentControlSet\Control\SessionManager\KnownDLLs';
    $greps[5] = '\SYSTEM\CurrentControlSet\Control\SecurePipeServers\winreg';
    $greps[6] = '\SOFTWARE\inAgents\EventLog2Syslog';
    $greps[7] = '%systemdrive%';
    $greps[8] = 'C:\';
    $greps[9] = '\system32';
    $greps[10] = '\system32\drivers';
    $greps[11] = '\system32\config';
    $greps[12] = '\system32\spool';
    $greps[13] = '\repair';

    print $tee $msgGreps;
    foreach $gname (@greps) {
        print $tee "\n - greps[$gcount]\t $gname";
        $gcount++;
    }
    print $tee $msgFiles;
    foreach $filelist (@inputfiles) {
        $filelist = $inputpath.$filelist;
        print $tee "\n - $filelist";
    }
    print $tee $msgStarting;

    # step into each input file
    foreach $inputfile (@inputfiles) {
        # step into each grep
        $gcount = 0;
        foreach $grep (@greps) {
            # build the outputfile
            $outputfile = $month."_".$gcount."_".$inputfile."_results.txt";
            @results = `zgrep $grep > $outputfile`;
            $gcount++;
        }
    }
    print $tee "\n\n----------------------Normal Completion\n" if ($err==0);
    close(LOGFILE);

      Well, I'm not in a position to test it myself, so I have to ask you: Have you tried running it, and does it do what you want?

      As for improvements, I can think of several, but if the script works, these are less than crucial -- well, except for the fact that you really should include "use strict", and learn about scoping variables.
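
      As a minimal sketch of what "use strict" and scoped variables buy you (the array values are placeholders, and $gcuont is a deliberately misspelled, made-up name):

```perl
use strict;
use warnings;

# Declare each variable with "my" in the smallest scope that needs it.
my @greps  = ( 'first', 'second' );    # placeholder values
my $gcount = 0;

foreach my $gname (@greps) {           # $gname exists only inside this loop
    print " - greps[$gcount]\t$gname\n";
    $gcount++;
}

# Under strict, a misspelled variable name is a compile-time error instead of
# a silently created new global. The string eval is used here only so that
# the sketch can report the error and keep running.
my $compiled = eval q{ $gcuont = 0; 1 };
print "strict caught: $@" unless $compiled;
```

      Without "use strict", the misspelled assignment would quietly create a second variable, which is the kind of bug that is hard to spot in a script with a few dozen globals.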

      Apart from that, in no particular order:

      • You have "use PerlIO::gzip" at the top, but you never actually use the ":gzip" IO layer. You're just running "zgrep" in backticks.

      • Actually, looking at the zgrep command line in the backticks, I don't see you providing an input file name there -- just a pattern to search for. I would expect the resulting output files to be empty every time.

      • You appear to be generating 14 output files for every input file. Is that really what you want? You never actually say what the goal is here, but fourteen separate output files for each input file seems like a lot.

      • You can simplify and improve your handling of command line options and args. Take a look at Getopt::Std and Getopt::Long -- these are part of the core distribution; also, the following is another alternative (though it doesn't use modules):
        my $debug = 0;
        my $usage = "Usage: $0 [-d|-h] month start end\n blah blah";
        if ( @ARGV and $ARGV[0] =~ /^-+([dh])/ ) {
            shift;
            die $usage if ( $1 eq 'h' );
            $debug++;
        }
        die $usage unless ( @ARGV == 3 );  # could add more conditions...

      • Aside from using $month when naming all those output files, it's not clear what this value is for. If it's supposed to differ from the start and/or end dates, how should it differ?

      • Initializing the @greps array can be a lot simpler (and if flexibility would be useful for you, consider loading the list from a data file, which can be named on the command line):
        my @greps = qw(\string\1 \string\1\extra \string\2 %and.so.on% );
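
      To tie a few of these points together, here is a rough sketch, not a drop-in replacement: it searches an already-opened handle for the literal strings and collects all hits in one list instead of fourteen files. The function name matching_lines and the sample log lines are made up for illustration, and using index() rather than a regex is my own choice so that the backslashes in the search strings need no escaping.

```perl
use strict;
use warnings;

# Return one tagged copy of a line for each literal string it contains.
# index() does a plain substring search, so backslashes are not special.
sub matching_lines {
    my ( $fh, @literals ) = @_;
    my @hits;
    while ( defined( my $line = <$fh> ) ) {
        for my $literal (@literals) {
            next if index( $line, $literal ) < 0;
            push @hits, "$literal\t$line";    # tag the line with what matched
        }
    }
    return @hits;
}

# Two of the strings from the original script; qw keeps the backslashes as-is.
my @greps = qw(
    \SOFTWARE\Microsoft\Windows\CurrentVersion\Run
    \system32\config
);

# An in-memory handle stands in for a log file here (made-up sample data).
my $sample = join '',
    "set HKLM\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run\\foo\n",
    "unrelated line\n",
    "read C:\\WINNT\\system32\\config\\sam\n";

open( my $in, '<', \$sample ) or die "cannot open sample: $!";
my @hits = matching_lines( $in, @greps );
close $in;

print @hits;
```

      With PerlIO::gzip loaded, the same function should work on a handle opened with open( my $in, "<:gzip", $file ), so neither zgrep nor backticks would be needed.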

      Well, enough for now. Good luck with the rest.
