Re: unique visitors from html logfile

by space_monk (Chaplain)
on Nov 17, 2012 at 09:16 UTC (#1004293)


in reply to unique visitors from html logfile

It's not a good idea to use lots of print statements to output your web page. Either use the CGI module or use heredocs to output your page in one or two statements. I prefer the latter, but I can understand that using the former may let your pages automatically keep up with changing standards.
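
As a rough sketch of the heredoc approach, the whole page can be built as one string and printed once. The sub name render_page and its two arguments are made up for illustration; they are not taken from the original script.

```perl
use strict;
use warnings;

# render_page() is a hypothetical helper, not part of the original script.
# The whole page lives in one heredoc, so a single print is enough.
sub render_page {
    my ($title, $count) = @_;
    return <<"EOF";
<html>
<head><title>$title</title></head>
<body>
<h3>There are $count unique visitors in the log</h3>
</body>
</html>
EOF
}

print render_page('Access Counts', 3);
```

Because the heredoc interpolates variables, values can be dropped straight into the markup without breaking the page into many small print calls.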

Also, try to follow these guidelines when creating HTML:

  • Consistently use lowercase element names. Uppercase is so AOL/1995 coding style.
  • Use th for table heading cells, not td.
  • Use CSS for layout where possible; a CSS rule saying your table is 500px wide is better than setting the width with the old width attribute (obsolete in HTML5).
  • Run your page through the W3C Validator or a local validator program to see how standards-compliant your page is.
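
To make the points above concrete, here is a minimal sketch of a table written with lowercase tags, th heading cells and a CSS rule for the width. The names table_css, render_table and the "hits" class are assumptions for the example, and the IP addresses are invented.

```perl
use strict;
use warnings;

# table_css() and render_table() are hypothetical helpers; the "hits"
# class name is also just an example.

# CSS rule that replaces the width/cellpadding/border attributes.
# It would go inside a <style> element in the page's <head>.
sub table_css {
    return <<'EOF';
table.hits { width: 500px; }
table.hits th, table.hits td { padding: 10px; border: 1px solid black; }
EOF
}

# Lowercase tags throughout, <th> for the heading row, <td> for data.
sub render_table {
    my (@rows) = @_;    # each row is an array ref: [ ip, hit count ]
    my $body = join '', map {
        my ($ip, $hits) = @$_;
        "<tr><td>$ip</td><td>$hits</td></tr>\n";
    } @rows;
    return <<"EOF";
<table class="hits">
<tr><th>IP</th><th>Hits</th></tr>
$body</table>
EOF
}

print render_table([ '192.0.2.1', 2 ], [ '192.0.2.7', 1 ]);
```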

Also, it is probably a good idea not to hard-code the names of the logfile and the web output page into the program; these should really be supplied on the command line, so you can read any log file and write your web page to any file name you like.

perl your_program <access.log >log.html
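
With that invocation the shell does the redirection, so the program never needs to know a file name. One way to keep the code independent of where its data comes from is to pass the handles in, sketched below. report_hits is an invented name, and the in-memory handles are only there so the example runs on its own; in the real program the call would be report_hits(\*STDIN, \*STDOUT).

```perl
use strict;
use warnings;

# report_hits() is a hypothetical stand-in for the real report code.
# It reads from and writes to whatever handles the caller supplies.
sub report_hits {
    my ($in, $out) = @_;
    my $hits = 0;
    while (my $line = <$in>) {
        $hits++;
    }
    print {$out} "<p>Total hits: $hits</p>\n";
    return $hits;
}

# Demonstration with in-memory handles so the sketch is self-contained.
open my $in,  '<', \"first line\nsecond line\n" or die "open: $!";
open my $out, '>', \my $page                     or die "open: $!";
report_hits($in, $out);
close $out;
print $page;
```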

Your code was resetting the list of IPs on every line, whereas you want a count through the whole logfile.
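
The fix comes down to where the hash is declared. A minimal sketch, assuming the IP is the first whitespace-separated field of each line (count_visitors and the sample lines are invented for the example):

```perl
use strict;
use warnings;

# count_visitors() is an illustrative name; the sample lines are made up.
sub count_visitors {
    my (@lines) = @_;
    my %ips;                            # declared once, before the loop
    for my $line (@lines) {
        my ($ip) = split ' ', $line;    # first field of the log line
        next unless defined $ip;        # skip blank lines
        $ips{$ip}++;                    # accumulates across every line
    }
    return \%ips;
}

my $ips = count_visitors(
    '192.0.2.1 - - [16/Nov/2012:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '192.0.2.7 - - [16/Nov/2012:10:00:05 +0000] "GET /a HTTP/1.1" 200 99',
    '192.0.2.1 - - [16/Nov/2012:10:01:00 +0000] "GET /b HTTP/1.1" 404 0',
);
printf "%d unique visitors\n", scalar keys %$ips;    # prints "2 unique visitors"
```

If the "my %ips" line were moved inside the for loop, the hash would start empty on every line and the final count would be meaningless.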

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use POSIX qw(strftime);

# create this outside the loop - it doesn't change
# in fact it isn't used so why is it here at all? left in just in case
my %dates = (
    'Jan' => '01', 'Feb' => '02', 'Mar' => '03', 'Apr' => '04',
    'May' => '05', 'Jun' => '06', 'Jul' => '07', 'Aug' => '08',
    'Sep' => '09', 'Oct' => '10', 'Nov' => '11', 'Dec' => '12',
);

my $yesterday     = strftime("%d/%b/%Y", localtime(time() - 86400));
my $yesterdayHits = 0;
my $totalhits     = 0;
my $startDate;
my $tm            = scalar(localtime);
my %ips           = ();
my @rows;

# (.+) is horrible as '.' includes spaces, this is better ....
# built with qr// because "\S" inside double quotes loses its backslash
my $w = qr/(\S+?)/;

# read from logfile(s) supplied on command line, instead of fixed file....
while (my $line = <>) {
    chomp $line;
    $totalhits++;

    # skip anything that is not a common log format line
    next unless $line =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/;

    # could do all these as one statement, but split for readability...
    my ($site, $logName, $fullName) = ($1, $2, $3);
    my ($date, $time, $gmt)         = ($4, $5, $6);
    my ($req, $file, $proto)        = ($7, $8, $9);
    my ($status, $length)           = ($10, $11);

    $ips{$site}++;
    $startDate //= $date;                    # date of the first parsed entry
    $yesterdayHits++ if $date eq $yesterday;

    my $row = <<EOF;
<tr><td>$site</td><td>$line</td></tr>
EOF
    push @rows, $row;
}

$startDate //= 'unknown';
my $IPcount = keys %ips;    # number of distinct IPs seen in the whole log

# Real Men use Data::Dumper :-)
foreach my $key ( sort keys %ips ) {
    print STDERR $key, " => ", $ips{$key}, "\n";
}

# write to output file specified on command line instead...
print <<EOF;
<html>
<head> <title>Access Counts</title></head>
<body>
<h1>Today is: $tm</h1>
<h3>Yesterday was $yesterday</h3>
<h3>There are $IPcount unique visitors in the log</h3>
<h2>Start Date is $startDate</h2>
<h2>Total hits: $totalhits</h2>
<h3>Hits Yesterday: $yesterdayHits</h3>
<table border="1" cellpadding="10" style="width: 500px">
<tr><th>IP</th> <th>LOGFILE</th> </tr>
@rows
</table>
</body>
</html>
EOF
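
If you want to see what the pattern captures, it can be tried on a single line. This is only a sketch: parse_line and the field names in the hash are my own labels, the sample line is invented, and the token is built with qr// so the \S escape survives. It also shows the Data::Dumper alternative to the hand-written dump loop.

```perl
use strict;
use warnings;
use Data::Dumper;

# One non-space token; non-greedy so the date stops at the first colon.
my $w = qr/(\S+?)/;

# parse_line() is a hypothetical wrapper; the hash keys are invented labels.
# Returns a hash ref of the eleven fields, or nothing if the line is not
# in common log format.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my @f = $line =~ m/^$w $w $w \[$w:$w $w\] "$w $w $w" $w $w$/
        or return;
    my %rec;
    @rec{qw(site logname fullname date time gmt req file proto status length)} = @f;
    return \%rec;
}

my $rec = parse_line(
    '192.0.2.1 - - [16/Nov/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512'
);

# Sortkeys gives a stable, readable dump on STDERR.
local $Data::Dumper::Sortkeys = 1;
print STDERR Dumper($rec);
```

Note that a combined-format log (with referrer and user agent appended) would not match this pattern, since it is anchored at the end of the line.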
A Monk aims to give answers to those who have none, and to learn from those who know more.


Re^2: unique visitors from html logfile
by Anonymous Monk on Nov 19, 2012 at 23:56 UTC

    The dates hash came about when I was trying to find the start date of the log. That idea fell apart, and I thought I had removed it from the code before posting.
