http://www.perlmonks.org?node_id=539775


in reply to Re: dynamic zcat and grep
in thread dynamic zcat and grep

Thanks graff, I used your suggestion, and this is what I came up with. Does it look right, or are there some improvements I could make? Since this is only my second week with Perl, I am open to suggestions!

#!/usr/local/bin/perl
use Time::Local 'timelocal';
use PerlIO::gzip;
use IO::Tee;
use IO::File;

$err = 0;
$help = 1 if($ARGV[0] eq '-h');
$help = 1 if($ARGV[0] eq '--help');
$help = 1 if($ARGV[0] eq '-help');
$help = 1 if($ARGV[0] eq '');
$debug = 1 if($ARGV[0] eq '-d');

$msgHelp = "FORMAT - command [-d][-h][--help] Month StartDate EndDate\n\tStart & End Date = mm/dd/yyyy";
$msgGreps = "\n----------------------The following greps will be used for searching:\n";
$msgFiles = "\n----------------------The following files will be searched based on the dates given:\n";
$msgStarting = "\n----------------------Now Starting\n";

if($help == 1){
    print $msgHelp;
} elsif($debug == 1){
    $month = $ARGV[1];
    @start = split /\//, $ARGV[2];
    @end = split /\//, $ARGV[3];
}else{
    $month = $ARGV[0];
    @start = split /\//, $ARGV[1];
    @end = split /\//, $ARGV[2];
}

$inputpath = "/logs/";
$startdate = timelocal(0,0,0, $start[1], $start[0]-1, $start[2]-1900);
$enddate = timelocal(0,0,0, $end[1]+1, $end[0]-1, $end[2]-1900);
$currenttime = localtime time;
$fcount = 1;
$gcount = 0;
if($debug !=1){$logfile = "win_greplog.txt" }else{$logfile = "testlogfile.txt"};

$msgstarting = "\n----------------------$currenttime-----------------------\nParse will start with logs dated: startdate = $startdate\nEnding with logs dated: enddate = $enddate\nIn the following directory: $inputpath\n";

$tee = new IO::Tee(\*STDOUT, new IO::File(">>$logfile"));
print $tee "\nDEBUG MODE ON" if($debug == 1);
print $tee $msgstarting;

opendir INPUTDIR, $inputpath;
@inputfiles = grep { (stat "$inputpath/$_")[9] >= $startdate and (stat "$inputpath/$_")[9] < $enddate } readdir INPUTDIR;
closedir INPUTDIR;
$numfiles = @inputfiles;

$greps[0] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\Run';
$greps[1] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce';
$greps[2] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnceEx';
$greps[3] = '\SOFTWARE\Microsoft\Windows\CurrentVersion\AeDebug';
$greps[4] = '\SYSTEM\CurrentControlSet\Control\SessionManager\KnownDLLs';
$greps[5] = '\SYSTEM\CurrentControlSet\Control\SecurePipeServers\winreg';
$greps[6] = '\SOFTWARE\inAgents\EventLog2Syslog';
$greps[7] = '%systemdrive%';
$greps[8] = 'C:\';
$greps[9] = '\system32';
$greps[10] = '\system32\drivers';
$greps[11] = '\system32\config';
$greps[12] = '\system32\spool';
$greps[13] = '\repair';

print $tee $msgGreps;
foreach $gname (@greps) {
    print $tee "\n - greps[$gcount]\t $gname";
    $gcount++;
}

print $tee $msgFiles;
foreach $filelist (@inputfiles) {
    $filelist = $inputpath.$filelist;
    print $tee "\n - $filelist";
}

print $tee $msgStarting;

# step into each input file
foreach $inputfile (@inputfiles) {
    # step into each grep
    $gcount = 0;
    foreach $grep (@greps) {
        # build the outputfile
        $outputfile = $month."_".$gcount."_".$inputfile."_results.txt";
        @results = `zgrep $grep > $outputfile`;
        $gcount++;
    }
}

print $tee "\n\n----------------------Normal Completion\n" if ($err==0);
close(LOGFILE);

Re^3: dynamic zcat and grep
by graff (Chancellor) on Mar 29, 2006 at 03:43 UTC
    Well, I'm not in a position to test it myself, so I have to ask you: Have you tried running it, and does it do what you want?

    As for improvements, I can think of several, but if the script works, these are less than crucial -- well, except for the fact that you really should include "use strict", and learn about scoping variables.
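
    To illustrate what "use strict" and lexical scoping buy you, here is a minimal sketch. The sample date string and the variable names ($mon, $day, $year, $count, $field, $len) are made up for the example and are not taken from your script:

```perl
use strict;
use warnings;

# Hypothetical sample value standing in for one of the date arguments.
my @start = split m{/}, '03/01/2006';
my ($mon, $day, $year) = @start;

my $count = 0;                 # declared at file scope, visible below
for my $field (@start) {       # $field exists only inside this loop
    my $len = length $field;   # $len exists only inside this block
    $count += $len;
}

# Uncommenting the next line makes compilation fail under strict,
# because $len was scoped to the loop body above:
# print "$len\n";

print "month=$mon day=$day year=$year chars=$count\n";
```

    With strict enabled, a misspelled or undeclared variable becomes a compile-time error instead of a silently empty value.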

    Apart from that, in no particular order:

    • You have "use PerlIO::gzip" at the top, but you never actually use the ":gzip" IO layer. You're just running "zgrep" in backticks.

    • Actually, looking at the zgrep command line in the backticks, I don't see you providing an input file name there -- just a pattern to search for. I would expect the resulting output files to be empty every time.

    • You appear to be generating 14 output files for every input file. Is that really what you want? You never actually say what the goal is here, but fourteen separate output files for each input file seems like a lot.

    • You can simplify and improve your handling of command line options and args. Take a look at Getopt::Std and Getopt::Long -- these are part of the core distribution; also, the following is another alternative (though it doesn't use modules):
      my $debug = 0;
      my $usage = "Usage: $0 [-d|-h] month start end\n blah blah";
      if ( @ARGV and $ARGV[0] =~ /^-+([dh])/ ) {
          shift;
          die $usage if ( $1 eq 'h' );
          $debug++;
      }
      die $usage unless ( @ARGV == 3 );  # could add more conditions...

    • Aside from using $month when naming all those output files, it's not clear what this value is needed for. If it's supposed to differ from the start and/or end dates, how should it differ?

    • Initializing the @greps array can be a lot simpler (and if flexibility would be useful for you, consider loading the list from a data file, which can be named on the command line):
      my @greps = qw(\string\1 \string\1\extra \string\2 %and.so.on% );
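
    Several of the points above (reading the compressed file from Perl instead of shelling out to zgrep, actually supplying the input file, and not producing one output file per pattern) could be combined roughly as follows. This is only a sketch under assumptions: it swaps in the core module IO::Uncompress::Gunzip rather than the ":gzip" layer (with PerlIO::gzip the open would instead look like open my $fh, '<:gzip', $path), the sub name scan_gz_file is made up, the pattern list is shortened, and it assumes you want literal substring matches printed to STDOUT.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Shortened, illustrative pattern list; qw() keeps the backslashes literal.
my @greps = qw(
    \SOFTWARE\Microsoft\Windows\CurrentVersion\Run
    \SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce
    \system32
    \system32\drivers
);

# Hypothetical helper: return each line of a gzipped file that contains
# one of the literal strings in @$patterns, prefixed with that string.
sub scan_gz_file {
    my ($path, $patterns) = @_;

    # Try longer strings first so the most specific one labels the line.
    my @ordered = sort { length($b) <=> length($a) } @$patterns;

    my $gz = IO::Uncompress::Gunzip->new($path)
        or die "Cannot read $path: $GunzipError\n";

    my @hits;
    while (defined(my $line = $gz->getline)) {
        for my $pat (@ordered) {
            # index() is a plain substring test, so the backslashes and
            # percent signs need no regex escaping.
            if (index($line, $pat) >= 0) {
                push @hits, "$pat\t$line";
                last;    # report each line once
            }
        }
    }
    $gz->close;
    return @hits;
}

# Each file named on the command line is read once for all patterns.
for my $file (@ARGV) {
    print scan_gz_file($file, \@greps);
}
```

    Run it as, say, "perl scan.pl /logs/somefile.gz > results.txt" (the script and file names are placeholders). Each input file is decompressed only once, and the leading tab-separated field tells you which string matched, so you can still split the results by pattern later if you really need to.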

    Well, enough for now. Good luck with the rest.