PerlMonks  

counting yesterdays hits in a logfile

by jrp370 (Initiate)
on Nov 13, 2012 at 02:02 UTC ( #1003538=perlquestion )
jrp370 has asked for the wisdom of the Perl Monks concerning the following question:

Using muba's advice, I changed how I was getting the value and format of $yesterday, which made pattern matching much simpler.

I am working on an assignment where I need to parse a log file and create a website based on said log file. One of the requirements is that I count the number of hits that happened yesterday. I'm lost when it comes to this; I feel like I am missing something really simple, but I can't see it. I've attached my code and the log file I'm working with, hoping that someone can offer some advice. Thanks.

#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);    # provides strftime(), used below
use Time::Piece;
use Time::Seconds qw(ONE_DAY);

# Yesterday's date in the same format as the log, e.g. 11/Nov/2012
my $yesterday = strftime("%d/%b/%Y", localtime(time() - 86400));

open(LOGFILE, "<", "access.log") or die "Could not open log file.";
my $yesterdayHits = 0;
my $totalhits     = 0;
my $webPage       = 'log.html';
open(WEBPAGE, ">", $webPage);
print WEBPAGE ("<HEAD><TITLE>Access Counts</TITLE></HEAD>");
print WEBPAGE ("<BODY>");
print WEBPAGE ("<H1> today is: ", scalar(localtime), "</H1>");
print WEBPAGE ("<h3>Yesterday was $yesterday</h3>");
print WEBPAGE ("<TABLE BORDER=1 CELLPADDING=10 width='500px'>");

foreach my $line (<LOGFILE>) {
    $totalhits++;

    # Split the record into its fields
    my $w = "(.+?)";
    $line =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w/;
    my $site     = $1;
    my $logName  = $2;
    my $fullName = $3;
    my $date     = $4;
    my $time     = $5;
    my $gmt      = $6;
    my $req      = $7;
    my $file     = $8;
    my $proto    = $9;
    my $status   = $10;
    my $length   = $11;

    # Count the record if it carries yesterday's date
    if ($line =~ m/$yesterday/) { $yesterdayHits++ }
    print WEBPAGE ("<Tr><TD>$site</TD><TD>$line</TD></Tr>\n\n");
}
close(LOGFILE);

print WEBPAGE ("<h2>Total hits: $totalhits</h2>");
print WEBPAGE ("<h3>Hits Yesterday: $yesterdayHits</h3>");
print WEBPAGE ("</TABLE></P>");
print WEBPAGE ("</BODY></HTML>");
close(WEBPAGE);
Access log

66.249.65.107 - - [11/Nov/2012:19:33:01 -0400] "GET /support.html HTTP/1.1" 200 11179
111.111.111.111 - - [11/Nov/2012:19:33:01 -0400] "GET / HTTP/1.1" 200 10801
111.111.111.111 - - [08/Oct/2007:11:17:55 -0400] "GET /style.css HTTP/1.1" 200 3225
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248
123.123.123.123 - - [26/Apr/2000:00:23:40 -0400] "GET /asctortf/ HTTP/1.0" 200 8130
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005
123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /pics/a2hlogo.jpg HTTP/1.0" 200 4282
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" 200 36
172.16.130.42 - - [26/Apr/2000:00:00:12 -0400] "GET /contacts.html HTTP/1.0" 200 4595
10.0.1.3 - - [26/Apr/2000:00:17:19 -0400] "GET /news/news.html HTTP/1.0" 200 16716
129.21.109.81 - - [26/Apr/2000:00:16:12 -0400] "GET /download/windows/asctab31.zip HTTP/1.0" 200 1540096
192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET / HTTP/1.1" 200 6394
192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /news/sports.html HTTP/1.1" 200 3500
192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997
192.168.72.177 - - [04/Nov/2012:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138
192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229
192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] "GET / HTTP/1.1" 500 606
127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] "GET /favicon.ico HTTP/1.1" 200 766
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] "GET / HTTP/1.1" 500 612
139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] "GET /favicon.ico HTTP/1.1" 200 766
127.0.0.1 - - [10/Apr/2007:10:53:10 +0300] "GET / HTTP/1.1" 500 612
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET / HTTP/1.0" 200 3700
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET /style.css HTTP/1.1" 200 614
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET /img/pti-round.jpg HTTP/1.1" 200 17524
127.0.0.1 - - [10/Apr/2007:10:54:21 +0300] "GET /unix_sysadmin.html HTTP/1.1" 200 3880
217.0.22.3 - - [04/Nov/2012:10:54:51 +0300] "GET / HTTP/1.1" 200 34
217.0.22.3 - - [10/Apr/2007:10:54:51 +0300] "GET /favicon.ico HTTP/1.1" 200 11514
217.0.22.3 - - [10/Apr/2007:10:54:53 +0300] "GET /cgi/pti.pl HTTP/1.1" 500 617
127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET / HTTP/0.9" 200 3700
217.0.22.3 - - [10/Apr/2007:10:58:27 +0300] "GET / HTTP/1.1" 200 3700
217.0.22.3 - - [10/Apr/2007:10:58:34 +0300] "GET /unix_sysadmin.html HTTP/1.1" 200 3880
217.0.22.3 - - [10/Apr/2007:10:58:45 +0300] "GET /talks/Fundamentals/read-excel-file.html HTTP/1.1" 404 311

Re: counting yesterdays hits in a logfile
by muba (Priest) on Nov 13, 2012 at 03:11 UTC

    First of all, have you seen how PerlMonks tries to format your log file example? Maybe you should put your log file inside code tags as well.

    Secondly, and more relevant to your question, your variable $yesterday contains the time it was exactly 24 hours ago. Yesterday contained 24 * 60 * 60 = 86400 seconds, and $yesterday covers exactly one of those - therein lies the basis of your problem.

    One way (of many) to solve this is to calculate two values, $yesterday_start and $yesterday_end, and then check whether the timestamp of each line in the log file lies between them. DateTime might prove useful. But beware of edge cases such as days that switch to or from DST, and other tricky business like that.
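
    As a rough illustration of the range approach, here is a minimal sketch that uses only the core Time::Local module rather than DateTime. The helper names yesterday_bounds, log_epoch and count_between are made up for this example, and it assumes the log timestamps always look like [11/Nov/2012:19:33:01 -0400]. It also assumes local midnight exists on the days involved, which is not true in every time zone.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal timegm);

my %MONTH = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Hypothetical helper: epoch range [start, end) covering yesterday in local time.
sub yesterday_bounds {
    my ($now) = @_;
    my @today = localtime($now);
    my $today_start = timelocal(0, 0, 0, $today[3], $today[4], $today[5] + 1900);

    # Step back 12 hours from midnight: that lands inside yesterday even when
    # a DST change makes the day 23 or 25 hours long.
    my @prev = localtime($today_start - 12 * 60 * 60);
    my $yesterday_start = timelocal(0, 0, 0, $prev[3], $prev[4], $prev[5] + 1900);

    return ($yesterday_start, $today_start);
}

# Hypothetical helper: epoch of a log line's timestamp, or undef if it does not parse.
sub log_epoch {
    my ($line) = @_;
    my ($day, $mon, $year, $h, $m, $s, $sign, $oh, $om) = $line =~
        m{\[(\d{2})/([A-Za-z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\]}
        or return undef;
    return undef unless exists $MONTH{$mon};

    # The logged time is local to the server; subtract its UTC offset.
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    return timegm($s, $m, $h, $day, $MONTH{$mon}, $year) - $offset;
}

# Hypothetical helper: number of lines whose timestamp falls in [start, end).
sub count_between {
    my ($start, $end, @lines) = @_;
    return scalar grep {
        my $t = log_epoch($_);
        defined $t && $t >= $start && $t < $end;
    } @lines;
}

my ($start, $end) = yesterday_bounds(time);
my @sample = (
    '66.249.65.107 - - [11/Nov/2012:19:33:01 -0400] "GET /support.html HTTP/1.1" 200 11179',
);
print "Hits yesterday: ", count_between($start, $end, @sample), "\n";
```

    Comparing epochs means the server's UTC offset in each record is taken into account, which plain string matching on the date cannot do.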

    Another way could be to take $yesterday the way you do it now, and strip off the time information so that you only hold the date information. That's probably the easiest, because except for the way you obtain $yesterday your code can run unaltered. But again, be careful that you don't run into problems on days when DST ends.

    And finally, these days the three-argument open (which you already use) with lexical filehandles is recommended. See perldoc -f open.
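
    For what it's worth, a small sketch of what that looks like. The helper name count_hits and the made-up log lines are only for illustration; the demo writes them to a temporary file so it can run on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: count the lines of a file that contain a date string.
sub count_hits {
    my ($path, $date) = @_;

    # Three-argument open with a lexical filehandle ($log) instead of a bareword.
    open(my $log, '<', $path) or die "Could not open $path: $!";
    my $hits = 0;
    while (my $line = <$log>) {
        $hits++ if index($line, $date) >= 0;
    }
    close($log);
    return $hits;
}

# Demo on a throwaway file holding three made-up log lines.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} <<'LOG';
10.0.0.1 - - [11/Nov/2012:19:33:01 -0400] "GET / HTTP/1.1" 200 100
10.0.0.2 - - [11/Nov/2012:20:00:00 -0400] "GET /a HTTP/1.1" 200 200
10.0.0.3 - - [08/Oct/2007:11:17:55 -0400] "GET /b HTTP/1.1" 200 300
LOG
close($tmp) or die "Could not close $path: $!";

print count_hits($path, '11/Nov/2012'), "\n";    # prints 2
```

    The lexical handle is closed automatically when it goes out of scope, and it cannot clash with a bareword handle of the same name elsewhere in the program.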

      Thank you for your time. I used your advice and changed how I was getting the value of $yesterday, and formatted it to match my log file, which made counting the hits a lot simpler.

      my $yesterday = strftime("%d/%b/%Y",localtime(time()-86400));
Re: counting yesterdays hits in a logfile
by Kenosis (Priest) on Nov 13, 2012 at 03:48 UTC

    Perhaps the following will help:

    use strict;
    use warnings;
    use Time::Piece;
    use Time::Seconds qw(ONE_DAY);

    my $yesterday = localtime() - ONE_DAY();
    my $mergedYesterday = join '/', ( split ' ', $yesterday )[ 2, 1, 4 ];

    while (<DATA>) {
        my ($date) = /\[([^:]+)/;    # Capture date, e.g., 11/Nov/2012
        $date =~ s/^0//;             # Remove leading zero: 08/Oct/2007 -> 8/Oct/2007
        print if $date eq $mergedYesterday;
    }

    __DATA__
    66.249.65.107 - - [11/Nov/2012:19:33:01 -0400] "GET /support.html HTTP/1.1" 200 11179
    111.111.111.111 - - [11/Nov/2012:19:33:01 -0400] "GET / HTTP/1.1" 200 10801
    111.111.111.111 - - [08/Oct/2007:11:17:55 -0400] "GET /style.css HTTP/1.1" 200 3225
    123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248
    123.123.123.123 - - [26/Apr/2000:00:23:40 -0400] "GET /asctortf/ HTTP/1.0" 200 8130
    123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005
    123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031
    123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /pics/a2hlogo.jpg HTTP/1.0" 200 4282
    123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" 200 36
    172.16.130.42 - - [26/Apr/2000:00:00:12 -0400] "GET /contacts.html HTTP/1.0" 200 4595
    10.0.1.3 - - [26/Apr/2000:00:17:19 -0400] "GET /news/news.html HTTP/1.0" 200 16716
    129.21.109.81 - - [26/Apr/2000:00:16:12 -0400] "GET /download/windows/asctab31.zip HTTP/1.0" 200 1540096
    192.168.198.92 - - [22/Dec/2002:23:08:37 -0400] "GET / HTTP/1.1" 200 6394
    192.168.198.92 - - [22/Dec/2002:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807
    192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /news/sports.html HTTP/1.1" 200 3500
    192.168.72.177 - - [22/Dec/2002:23:32:14 -0400] "GET /favicon.ico HTTP/1.1" 404 1997
    192.168.72.177 - - [04/Nov/2012:23:32:15 -0400] "GET /style.css HTTP/1.1" 200 4138
    192.168.72.177 - - [22/Dec/2002:23:32:16 -0400] "GET /js/ads.js HTTP/1.1" 200 10229
    192.168.72.177 - - [22/Dec/2002:23:32:19 -0400] "GET /search.php HTTP/1.1" 400 1997
    127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] "GET / HTTP/1.1" 500 606
    127.0.0.1 - - [10/Apr/2007:10:39:11 +0300] "GET /favicon.ico HTTP/1.1" 200 766
    139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] "GET / HTTP/1.1" 500 612
    139.12.0.2 - - [10/Apr/2007:10:40:54 +0300] "GET /favicon.ico HTTP/1.1" 200 766
    127.0.0.1 - - [10/Apr/2007:10:53:10 +0300] "GET / HTTP/1.1" 500 612
    127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET / HTTP/1.0" 200 3700
    127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET /style.css HTTP/1.1" 200 614
    127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET /img/pti-round.jpg HTTP/1.1" 200 17524
    127.0.0.1 - - [10/Apr/2007:10:54:21 +0300] "GET /unix_sysadmin.html HTTP/1.1" 200 3880
    217.0.22.3 - - [04/Nov/2012:10:54:51 +0300] "GET / HTTP/1.1" 200 34
    217.0.22.3 - - [10/Apr/2007:10:54:51 +0300] "GET /favicon.ico HTTP/1.1" 200 11514
    217.0.22.3 - - [10/Apr/2007:10:54:53 +0300] "GET /cgi/pti.pl HTTP/1.1" 500 617
    127.0.0.1 - - [10/Apr/2007:10:54:08 +0300] "GET / HTTP/0.9" 200 3700
    217.0.22.3 - - [10/Apr/2007:10:58:27 +0300] "GET / HTTP/1.1" 200 3700
    217.0.22.3 - - [10/Apr/2007:10:58:34 +0300] "GET /unix_sysadmin.html HTTP/1.1" 200 3880
    217.0.22.3 - - [10/Apr/2007:10:58:45 +0300] "GET /talks/Fundamentals/read-excel-file.html HTTP/1.1" 404 31

    Output:

    66.249.65.107 - - [11/Nov/2012:19:33:01 -0400] "GET /support.html HTTP/1.1" 200 11179
    111.111.111.111 - - [11/Nov/2012:19:33:01 -0400] "GET / HTTP/1.1" 200 10801

    The join:

    my $mergedYesterday = join '/', ( split ' ', $yesterday )[ 2, 1, 4 ];
       ^                        ^                ^             ^  ^  ^
       |                        |                |             |  |  |
       |                        |                |             |  |  + - 2012
       |                        |                |             |  + - Nov
       |                        |                |             + - 11
       |                        |                + - Sun Nov 11 19:31:46 2012
       |                        + - Use same delimiter as in log records
       + - 11/Nov/2012

    The capturing regex

    /\[([^:]+)/;
     ^ ^
     | |
     | + - Capture everything up to the first colon, e.g., 11/Nov/2012
     + - Anchor at left bracket
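
    A possible variation on the above, not part of the original answer: Time::Piece objects have a strftime method, so the date can be produced zero-padded and compared to the captured log date directly, without the split/join or the leading-zero removal. The helper names log_date and count_date are invented for this sketch, and %b is assumed to give English month abbreviations, which depends on the locale.

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds qw(ONE_DAY);

# Hypothetical helper: format a Time::Piece object the way the log writes dates.
sub log_date {
    my ($t) = @_;
    return $t->strftime('%d/%b/%Y');    # zero-padded day, e.g. 04/Nov/2012
}

# Hypothetical helper: count lines whose bracketed date equals $date.
sub count_date {
    my ($date, @lines) = @_;
    my $hits = 0;
    for my $line (@lines) {
        my ($seen) = $line =~ /\[([^:]+)/ or next;    # same capture as above
        $hits++ if $seen eq $date;
    }
    return $hits;
}

# The subtraction puts localtime() in scalar context, so it yields an object.
my $yesterday = log_date(localtime() - ONE_DAY);
print "Yesterday was $yesterday\n";
```

    Because both sides keep the leading zero, the log date needs no editing before the comparison.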
Re: counting yesterdays hits in a logfile
by space_monk (Chaplain) on Nov 13, 2012 at 05:10 UTC
    I'm assuming you only need to count the number of hits, as you say, not list them as you have done in your example. First you need the string for yesterday's date. This is one way, but there are others:
    use POSIX qw(strftime);
    my @time = localtime(time - (24*60*60));    # list context: strftime needs the broken-down time
    # %d/%b/%Y matches the date format in the log, e.g. 11/Nov/2012
    my $yesterday = strftime('%d/%b/%Y', @time);
    There are other and better ways using libraries such as Date::Calc to do the same thing.
    `grep "$yesterday" logfile.log | wc -l`

    Number of lines returned is the number of hits on your site. This command can be run inside a Perl script inside backticks.

    However, the grep command exists in Perl too, so just read the file into an array and use the Perl grep command, as in scalar context it returns the number of matches.

    open(LOGFILE, "<", "access.log") or die "Could not open log file.";
    my @logfile = <LOGFILE>;
    my $hits = grep /$yesterday/, @logfile;    # scalar context: number of matching lines
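
    A self-contained sketch of the same idea, with made-up sample lines standing in for the file contents. The helper name count_matching is invented here, and \Q...\E is added so that the date string is matched literally rather than as a pattern.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: grep in scalar context returns the number of matches.
sub count_matching {
    my ($date, @lines) = @_;
    my $hits = grep { /\Q$date\E/ } @lines;
    return $hits;
}

my $yesterday = strftime('%d/%b/%Y', localtime(time - 24 * 60 * 60));

my @sample = (
    '10.0.0.1 - - [11/Nov/2012:19:33:01 -0400] "GET / HTTP/1.1" 200 100',
    '10.0.0.2 - - [11/Nov/2012:20:00:00 -0400] "GET /a HTTP/1.1" 200 200',
    '10.0.0.3 - - [08/Oct/2007:11:17:55 -0400] "GET /b HTTP/1.1" 200 300',
);

print "Hits on 11/Nov/2012: ", count_matching('11/Nov/2012', @sample), "\n";    # prints 2
print "Hits yesterday ($yesterday): ", count_matching($yesterday, @sample), "\n";
```

    Reading the whole file into an array is fine for a small log; for a large one, counting line by line in a while loop avoids holding everything in memory.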
    A Monk aims to give answers to those who have none, and to learn from those who know more.
Re: counting yesterdays hits in a logfile
by space_monk (Chaplain) on Nov 13, 2012 at 05:29 UTC
    Printing your web page is much cleaner with a heredoc, e.g.:
    close(LOGFILE);
    open(WEBPAGE,">",$webPage);
    $tm = strftime( "%d/%m/%Y", localtime());
    print WEBPAGE <<EOF;
    <head><title>Access Counts</title> </head>
    <body>
    <h1> today is: $tm</h1>
    <h3>Yesterday was $yesterday</h3>
    <table border="1" cellpadding="10" width='500px'>
    <h2>Total hits: $totalhits</h2>
    <h3>Hits Yesterday: $yesterdayHits</h3>
    $rows
    </table></p>
    </body>
    </html>
    EOF
    close( WEBPAGE);
    Also, please use CSS to format your table layout and keep HTML tags lowercase; uppercase is *so* 1995 ;-). I could go on, but one of your future tasks is obviously to learn about good HTML presentation and layout.
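
    The snippet above interpolates a $rows variable that is not built anywhere in this thread. As a sketch of one way it could be assembled, the following collects one table row per site and returns the whole page from a heredoc. The helper names escape_html and render_page, the sample values, and the exact page layout are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: build the page as one string using a heredoc.
sub render_page {
    my ($today, $yesterday, $total_hits, $yesterday_hits, @sites) = @_;

    # One table row per site, accumulated into $rows.
    my $rows = '';
    $rows .= '<tr><td>' . escape_html($_) . "</td></tr>\n" for @sites;

    return <<"EOF";
<html>
<head><title>Access Counts</title></head>
<body>
<h1>Today is: $today</h1>
<h3>Yesterday was $yesterday</h3>
<h2>Total hits: $total_hits</h2>
<h3>Hits yesterday: $yesterday_hits</h3>
<table border="1" cellpadding="10">
$rows</table>
</body>
</html>
EOF
}

print render_page('13/Nov/2012', '12/Nov/2012', 2, 1, '10.0.0.1', '<script>');
```

    Returning the page as a string keeps the HTML in one place and leaves it to the caller to decide which filehandle to print it to.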

      Basic CGI can even make the html a lot better, like so:

      ...
      use CGI qw(:standard);
      my $q = CGI->new;
      print $q->start_html( -title => 'Yesterday Hits' ),
          $q->h1("today is: $tm"),
          $q->h3("Yesterday was $yesterday"),
          $q->table(
              { border => '1', cellpadding => '10', width => '500' },
              Tr(
                  td("Total Hits: $totalhits"),
                  td("Hits Yesterday: $yesterdayHits"),
              )
          ),
          $q->end_html;

Node Type: perlquestion [id://1003538]
Approved by muba