
Re: unique visitors from html logfile

by space_monk (Chaplain)
on Nov 17, 2012 at 09:16 UTC (#1004293)

in reply to unique visitors from html logfile

It's not a good idea to use lots of print statements to output your web page. Either use the CGI module or use heredocs to output your page in one or two statements. I prefer the latter, but I can understand that using the former may allow your pages to automatically keep up with changing standards.
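As a minimal sketch of the heredoc approach, the whole page can be built as one string and printed once. The render_page name and its two arguments are made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the whole page in one heredoc
# instead of a long run of print statements.
sub render_page {
    my ($title, $body) = @_;
    return <<"EOF";
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$body</p>
</body>
</html>
EOF
}

print render_page('Access Counts', 'Nothing to report yet.');
```

Because the markup sits in one place, it is easier to read and to run through a validator than the same tags scattered across many print calls.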

Also, try to follow these guidelines when creating HTML:

  • Try to consistently use lowercase elements. Uppercase is so AOL/1995 coding style.
  • Use th for table heading cells, not td.
  • Use CSS for layout where possible; a CSS rule saying your table is 500px wide is better than setting the width with the old width attribute (obsolete in HTML5).
  • Run your page through the W3C Validator or a local validator program to see how standards-compliant your page is.
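The points above could look roughly like this when emitted from Perl. This is only a sketch: the render_table helper, the "hits" class name and the sample addresses are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: renders a two-column table using lowercase
# elements, th for the heading cells, and CSS for the width.
sub render_table {
    my (@rows) = @_;    # each row is an array ref: [ip, hits]
    my $body = join '',
        map { "<tr><td>$_->[0]</td><td>$_->[1]</td></tr>\n" } @rows;
    return <<"EOF";
<style>
table.hits { width: 500px; }
</style>
<table class="hits">
<tr><th>IP</th><th>Hits</th></tr>
$body</table>
EOF
}

print render_table(['192.0.2.1', 3], ['192.0.2.7', 1]);
```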

Also, it is probably a good idea not to hard-code the names of the logfile and the web output page into the script; these should really be supplied on the command line, so you can read any log file and write your web page to any file name you like.

perl your_program <access.log >log.html
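What makes that invocation work is Perl's <> operator, which reads from the files named in @ARGV, or from standard input when @ARGV is empty. A small sketch follows; the temporary file and its two sample lines are made up only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up sample data so the sketch is self-contained.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "first line\nsecond line\n";
close $fh;

# Counts the lines seen through <>, which reads the files named in
# @ARGV (or STDIN when @ARGV is empty).
sub count_lines {
    my $count = 0;
    while (my $line = <>) {
        $count++;
    }
    return $count;
}

# Stand in for running: perl your_program sample.log
@ARGV = ($filename);
print "lines read: ", count_lines(), "\n";
```

The same loop works unchanged with the shell redirection shown above, since @ARGV is then empty and <> falls back to standard input.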

Your code was resetting the list of IPs on every line, whereas you want a count across the whole logfile.
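In isolation, the fix amounts to declaring the hash once, before the loop, so that it accumulates over every line. The count_visitors name and the sample log lines below are invented for this sketch, and the IP is simply taken as the first whitespace-separated field.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of hits per IP.
sub count_visitors {
    my (@lines) = @_;
    my %ips;                           # declared once, outside the loop
    for my $line (@lines) {
        my ($ip) = split ' ', $line;   # first field of the log line
        next unless defined $ip;       # skip blank lines
        $ips{$ip}++;
    }
    return \%ips;
}

my $ips = count_visitors(
    '192.0.2.1 - - [17/Nov/2012:09:16:00 +0000] "GET / HTTP/1.1" 200 512',
    '192.0.2.7 - - [17/Nov/2012:09:17:10 +0000] "GET /a HTTP/1.1" 200 128',
    '192.0.2.1 - - [17/Nov/2012:09:18:42 +0000] "GET /b HTTP/1.1" 404 64',
);
printf "%d unique visitors\n", scalar keys %$ips;
```

Had %ips been declared inside the for loop, it would hold at most one key at any time, which is the symptom described above.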

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use POSIX qw(strftime);

# create this outside the loop - it doesn't change
# in fact it isn't used so why is it here at all? left in just in case
my %dates = (
    'Jan' => '01', 'Feb' => '02', 'Mar' => '03', 'Apr' => '04',
    'May' => '05', 'Jun' => '06', 'Jul' => '07', 'Aug' => '08',
    'Sep' => '09', 'Oct' => '10', 'Nov' => '11', 'Dec' => '12',
);

my $yesterday     = strftime("%d/%b/%Y", localtime(time() - 86400));
my $yesterdayHits = 0;
my $totalhits     = 0;
my $startDate;
my $tm  = scalar(localtime);
my %ips = ();    # declared outside the loop so counts cover the whole log
my @rows;

# (.+) is horrible as '.' includes spaces, this is better ....
# qr// is needed here: in a double-quoted string "\S" loses its backslash
my $w = qr/(\S+?)/;

# read from logfile(s) supplied on command line, instead of fixed file....
while (my $line = <>) {
    chomp $line;
    $totalhits++;

    # skip lines that don't match, otherwise $1..$11 hold stale values
    next unless $line =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/;

    # could do all these as one statement, but split for readability...
    my ($site, $logName, $fullName) = ($1, $2, $3);
    my ($date, $time, $gmt)         = ($4, $5, $6);
    my ($req, $file, $proto)        = ($7, $8, $9);
    my ($status, $length)           = ($10, $11);

    $ips{$site}++;
    $startDate //= $date;                       # first date seen in the log
    $yesterdayHits++ if $date eq $yesterday;

    my ($day, $month, $year) = split m{/}, $date;

    # escape the raw log line so it cannot break the markup
    (my $safe = $line) =~ s/([&<>])/'&#' . ord($1) . ';'/ge;
    push @rows, "<tr><td>$site</td><td>$safe</td></tr>\n";
}

my $IPcount = keys %ips;      # number of distinct IPs
$startDate //= 'unknown';     # empty log, or no line matched
my $rows = join '', @rows;

# Real Men use Data::Dumper :-)
foreach my $key (sort keys %ips) {
    print STDERR $key, " => ", $ips{$key}, "\n";
}

# write to output file specified on command line instead...
print <<EOF;
<html>
<head>
<title>Access Counts</title>
<style>
table  { width: 500px; border-collapse: collapse; }
th, td { border: 1px solid black; padding: 10px; }
</style>
</head>
<body>
<h1>Today is: $tm</h1>
<h3>Yesterday was $yesterday</h3>
<h3>There are $IPcount unique visitors in the log</h3>
<table>
<tr><th>IP</th><th>LOGFILE</th></tr>
$rows</table>
<h2>Start Date is $startDate</h2>
<h2>Total hits: $totalhits</h2>
<h3>Hits Yesterday: $yesterdayHits</h3>
</body>
</html>
EOF
A Monk aims to give answers to those who have none, and to learn from those who know more.

Replies are listed 'Best First'.
Re^2: unique visitors from html logfile
by Anonymous Monk on Nov 19, 2012 at 23:56 UTC

    The dates hash came about when I was trying to find the start date of the log. That idea fell apart, and I thought I had removed it from the code before posting.
