PerlMonks  

question about regex and using a scalar variable to store/call it

by Anonymous Monk
on Sep 20, 2001 at 17:57 UTC ( #113589=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

!novice alert! Pardon me, I realize this is more than likely a very simple question, but I cannot (for the life of me) figure out why it doesn't work. I want to store the patterns for the Apache and IIS log formats as regexes in scalar variables ($apache_log and $iis_log). I would also like to store strings ($document and $document2) that I am looking for in the log (e.g. index.html or a specific query). But my silly little script doesn't appear to work very well. ANY help would be greatly appreciated. Here is some code to do roughly what I am thinking:
#!/usr/bin/perl -w

$apache_log="/(.*)\s+-\s+-\s+\[(\d+)\/(\w+)\/(\d+):(\d+):(\d+)/"; # Hmm I know there is a better way to parse this
$iis_log="/(\d+):(\d+):(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s/";
$document="/report.html\?/";
$document2="/secret.html\?/";

# open the log file - the log file we want to check out
open(LOG, "<$ARGV[0]") || die("Could not open $ARGV[0] : $!");
# open our Report file - the file we will write out report to
open(REPORT, ">$ARGV[1]") || die("Could not open $ARGV[1] : $!");

while(<LOG>){
  if($document){
    ($ip, $day, $month, $year, $hr, $min) = $_ = $apache_log; # if I was using IIS I would set this to $iis_log, is there a better way?
    $totaldoc++;
    print REPORT "Access on doc1 from: $ip on $month $day at $hr:$min\n";
  } elsif($document2){
    $totaldoc2++;
    print REPORT "Access on doc2 from: $ip on $month $day at $hr:$min\n";
  }
}

print REPORT "Total doc1: $totaldoc\n";
print REPORT "Total doc2: $totaldoc2\n";
close(LOG);
close(REPORT);

Replies are listed 'Best First'.
Re: question about regex and using a scalar variable to store/call it
by suaveant (Parson) on Sep 20, 2001 at 18:10 UTC
    Well... probably the best way to store a regex is with qr//:
    $regex = qr/(test)foo/i;
    # then to use it
    $var =~ $regex;
    You can also store the regex as a plain string in a variable, like so:
    $regex = '(test)foo';
    # but then to use it you must do
    $var =~ /$regex/;
    The first way is usually much better, especially if you use the pattern more than once, since it only has to be compiled once.
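
    A minimal sketch of the qr// approach applied to an Apache-style line. The sample line is made up for illustration, and the pattern is a slightly tightened variant of the one in the question (^(\S+) instead of (.*) for the address), so treat it as an assumption rather than a full log parser:

```perl
use strict;
use warnings;

# Compile the pattern once and keep the compiled regex in a scalar.
my $log_re = qr/^(\S+)\s+-\s+-\s+\[(\d+)\/(\w+)\/(\d+):(\d+):(\d+)/;

# Made-up sample line, for illustration only.
my $line = '10.0.0.1 - - [20/Sep/2001:17:57:00 +0000] "GET /report.html?x=1 HTTP/1.0" 200 512';

# In list context a successful match returns the captured groups.
my ($ip, $day, $month, $year, $hr, $min) = $line =~ $log_re;

print "Access from $ip on $month $day, $year at $hr:$min\n" if defined $ip;

# A compiled regex can also be interpolated into a larger pattern.
my $doc_re   = qr/report\.html\?/;
my $doc1_hit = $line =~ /GET\s+\/$doc_re/ ? 1 : 0;
print "doc1 hit\n" if $doc1_hit;
```

    Note that =~ binds more tightly than =, so the list assignment receives the result of the match, not the pattern.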

    I would do...

    #!/usr/bin/perl -w

    $log=qr/(.*)\s+-\s+-\s+\[(\d+)\/(\w+)\/(\d+):(\d+):(\d+)/;
    # NOTE: the IIS pattern captures four fields with the IP last, so the
    # list assignments below would need reordering for IIS logs
    $log=qr/(\d+):(\d+):(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s/ if defined $ARGV[2] && $ARGV[2] eq 'iis';
    $document=qr/report.html\?/;
    $document2=qr/secret.html\?/;

    # open the log file - the log file we want to check out
    open(LOG, "<$ARGV[0]") || die("Could not open $ARGV[0] : $!");
    # open our Report file - the file we will write out report to
    open(REPORT, ">$ARGV[1]") || die("Could not open $ARGV[1] : $!");

    while(<LOG>){
      if($_ =~ $document){
        ($ip, $day, $month, $year, $hr, $min) = ($_ =~ $log);
        $totaldoc++;
        print REPORT "Access on doc1 from: $ip on $month $day at $hr:$min\n";
      } elsif($_ =~ $document2){
        # parse this line as well so the report does not print stale values
        ($ip, $day, $month, $year, $hr, $min) = ($_ =~ $log);
        $totaldoc2++;
        print REPORT "Access on doc2 from: $ip on $month $day at $hr:$min\n";
      }
    }

    print REPORT "Total doc1: $totaldoc\n";
    print REPORT "Total doc2: $totaldoc2\n";
    close(LOG);
    close(REPORT);
    Now if your third arg is iis, it will parse IIS logs; otherwise it will parse Apache logs... does that help? Of course, it would not be a bad idea to use strict, either...
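
    One way to sketch the same format switch under use strict is a lookup table of compiled patterns. The names %log_re and pick_log_re are hypothetical, the patterns are the two from the question, and the sample line is made up:

```perl
use strict;
use warnings;

# Hypothetical lookup table: one compiled pattern per log format.
my %log_re = (
    apache => qr/(.*)\s+-\s+-\s+\[(\d+)\/(\w+)\/(\d+):(\d+):(\d+)/,
    iis    => qr/(\d+):(\d+):(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s/,
);

# Return the pattern for the requested format, defaulting to apache
# when the argument is missing or unknown.
sub pick_log_re {
    my ($format) = @_;
    $format = 'apache' unless defined $format && exists $log_re{$format};
    return $log_re{$format};
}

# The third command-line argument selects the format, as in the script above.
my $re = pick_log_re($ARGV[2]);

# Made-up IIS-style line, for illustration only.
my $sample = '17:57:00 10.0.0.1 GET /report.html';
my ($hr, $min, $sec, $ip) = $sample =~ pick_log_re('iis');
print "IIS hit from $ip at $hr:$min\n" if defined $ip;
```

    Checking defined before comparing avoids an "uninitialized value" warning when no third argument is given.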

                    - Ant
                    - Some of my best work - Fish Dinner

Re: question about regex and using a scalar variable to store/call it
by tommyw (Hermit) on Sep 20, 2001 at 18:07 UTC

    You're mixing up the regexp matching operator (//) with the regular expression itself.

    So your first test needs to become
    $document='report.html\?';
    ...
    if(/$document/){
    and similarly for the others.

    Oops. The match against $apache_log on the next line is wrong too: you've left out a ~, so you're doing a straight assignment (=) instead of a match (=~).
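
    Both mistakes can be seen in a small sketch. The request string is made up, and the variable names are only for illustration:

```perl
use strict;
use warnings;

my $line = 'GET /report.html?id=7';   # made-up request string

# Mistake 1: the slashes delimit the match operator; they are not part of
# the pattern. Stored inside the string they become literal characters.
my $with_slashes = '/report\.html\?/';
my $bare         = 'report\.html\?';

my $hit_with_slashes = $line =~ /$with_slashes/ ? 1 : 0;   # needs a literal "/" after "?"
my $hit_bare         = $line =~ /$bare/         ? 1 : 0;

# Mistake 2: "=" assigns, "=~" matches.
my $copy = $line;
my ($assigned) = $copy = $bare;        # overwrites $copy with the pattern string
my ($captured) = $line =~ /($bare)/;   # runs the match and returns the capture

print "with slashes: $hit_with_slashes, bare: $hit_bare\n";
print "assigned: $assigned\n";
print "captured: $captured\n";
```

    In the original script the assignment form also clobbers $_, so the log line itself is lost before anything is matched against it.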

Re: question about regex and using a scalar variable to store/call it
by Rhose (Priest) on Sep 20, 2001 at 18:27 UTC
    As I understand it, the question is how to store regular expressions in variables (and use them later).

    Here is a quick example which uses a "junk" Apache log and IIS log (they each contain but three fields -- a number, a file, and a date.) You can work out the expressions for the real logs. *Smiles*

    use strict;

    #-- Define variables
    my $mDate;
    my $mDoc;
    my $mFile;
    my $mNumber;
    my $mSystem = 'apache';

    my @mDocuments = (
      'report.html',
      'secret.html'
    );

    my %mLogExpr = (
      apache => '^(\d+)\s*(\S+)\s*(\S+)$',
      iis    => '^(\S+)\s*(\S+)\s*(\d+)$'
    );

    my %mTotalDocs;

    #-- Initialize accumulators
    foreach (@mDocuments) {
      $mTotalDocs{$_} = 0;
    }

    #-- Process the information
    while (<DATA>) {
      #-- Parse line and skip non-matching ones
      next if ! /$mLogExpr{$mSystem}/;

      #-- Store the parsed values
      if ($mSystem eq 'apache') {
        ($mNumber, $mFile, $mDate)=($1,$2,$3);
      } else {
        ($mFile, $mDate, $mNumber)=($1,$2,$3);
      }

      #-- Since I am not doing anything with the information,
      #-- I am going to print it. *Smiles*
      print "Number:\t", $mNumber, "\n";
      print "File:\t", $mFile, "\n";
      print "Date:\t", $mDate, "\n";
      print "\n";

      #-- Check for documents
      foreach $mDoc (@mDocuments) {
        $mTotalDocs{$mDoc}++ if /$mDoc/;
      }
    }

    #-- Print the results
    foreach (@mDocuments) {
      print "Total for $_: $mTotalDocs{$_}\n";
    }

    __DATA__
    #Apache Log
    123 report.html 2001-09-16
    123 report.html 2001-09-17
    234 nothing.html 2001-09-18
    123 report.html 2001-09-18
    345 secret.html 2001-09-19
    567 stuff.html 2001-09-20
    # IIS log
    report.html 2001-09-16 123
    report.html 2001-09-17 123
    nothing.html 2001-09-18 234
    report.html 2001-09-18 123
    secret.html 2001-09-19 345
    stuff.html 2001-09-20 567

    You can switch the log back and forth (between apache and iis) by changing the one $mSystem variable.

    Does this help?

    Update

    I forgot to mention this the first time, but I also noticed a couple of other things in your example:

    • The '/' delimiters are included within the expression string
    • When building an expression string, ' is better than " so that backslashes and variables are not interpolated before the regex engine sees them
    • The information for a line is only being saved if the line matched /$document/... when /$document2/ matches, the OLD information will be printed to the report
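
    The last two points can be sketched as follows. The two-field log format (address, then path), the sample lines, and the variable names are all made up for illustration:

```perl
use strict;
use warnings;

# Double quotes process backslash escapes before the regex engine sees them.
my $dq = "report.html\?";   # stored as report.html?  (the "l" becomes optional)
my $sq = 'report.html\?';   # stored as report.html\? (a literal "?")

my $dq_hits_htm = 'report.htm' =~ /$dq/ ? 1 : 0;   # matches by accident
my $sq_hits_htm = 'report.htm' =~ /$sq/ ? 1 : 0;   # correctly rejected

# Parse every line first, then test it against each document pattern,
# so the report never reuses fields from an earlier line.
my $log_re = qr/^(\S+)\s+(\S+)/;   # hypothetical format: address, path
my @docs   = (qr/report\.html\?/, qr/secret\.html\?/);
my @lines  = (
    '10.0.0.1 /report.html?a=1',
    '10.0.0.2 /secret.html?b=2',
    '10.0.0.3 /other.html',
);

my @report;
for my $line (@lines) {
    next unless $line =~ $log_re;
    my ($ip, $path) = ($1, $2);
    for my $i (0 .. $#docs) {
        push @report, 'doc' . ($i + 1) . " from $ip" if $path =~ $docs[$i];
    }
}

print "$_\n" for @report;
```

    Because $ip and $path are declared inside the loop, each report entry can only use values parsed from the current line.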

    Update #2

    I need to start using next unless instead of next if ! *Grins*
