Category: Web Logging
Author/Contact Info: Mike Bohlmann
Description:
Since I don't have access to my ISP's server logs but
still want to have some idea of who's visiting my website,
I developed this little script that can integrate easily
with any existing site. All you need is CGI access of
some type.
To install, save the script as showpic in your cgi-bin directory and make it
executable. You'll also need to set the $home, $logfile, $ips (a pattern
matching the IP addresses you want to ignore), and %entries (labels and the
referer patterns to match) variables. Be sure to "touch" the logfile and
make it writable by the web server's user.
Pick an image on your page, preferably a small one, and point its src at the script:
<img src="/cgi-bin/showpic/path_to_pic_in_document_root/pic.jpg">
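Each time someone loads a page containing that image, the script sends the
image and, for each %entries pattern that matches the referring page, writes
a log line with the date, time, either hostname or IP address, and the label
of the matching entry. Here's an example of the output.
Enjoy.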
Wed Apr 05 13:08:26 2000 d208.focal6.interaccess.com resume
Wed Apr 05 13:29:29 2000 audialup197.phnx.uswest.net
Thu Apr 06 01:31:47 2000 adsl-63-194-241-209.dsl.lsan03.pacbell.net
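If you want a quick summary from that log, a minimal sketch like the following could tally hits per label. It assumes the line layout shown above (a 24-character date, then the hostname or IP, then an optional label); `parse_log_line` and `count_labels` are hypothetical helper names, not part of showpic.

```perl
use strict;
use warnings;

# Split one showpic log line into its parts. Assumes the layout shown
# above: "Wed Apr 05 13:08:26 2000", then host or IP, then an optional label.
sub parse_log_line {
    my ($line) = @_;
    chomp $line;
    my ($date, $host, $label) =
        $line =~ /^(\w{3} \w{3} \d\d \d\d:\d\d:\d\d \d{4}) (\S+)(?: (.+))?$/
        or return;    # not a showpic log line
    return { date => $date, host => $host, label => $label };
}

# Tally hits per label; lines without a label count under '(none)'
# and unparseable lines are skipped.
sub count_labels {
    my %count;
    for my $line (@_) {
        my $entry = parse_log_line($line) or next;
        my $label = defined $entry->{label} ? $entry->{label} : '(none)';
        $count{$label}++;
    }
    return %count;
}
```

For example, after opening the logfile as LOG, `my %hits = count_labels(<LOG>);` gives a hash of label => hit count.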
#!/usr/bin/perl -w
# showpic logger
use strict;
use Socket;
use Fcntl qw(:flock);
use POSIX qw(strftime);

# Setup configuration
# $home    - filesystem path to prepend to image paths
# $logfile - full path to the logfile you want to use
# $ips     - regex of IP addresses you want to ignore in the log
# %entries - hash of labels => regexes matched against the referring page
my $home    = '/my_path_to/public_html';
my $logfile = '/my_path_to/access_log';
my $ips     = '^127\.0\.0\.1$';
my %entries = (
    'home'   => 'comatose/index.html$|comatose/$',
    'resume' => 'resume',
);

# Open the pic or DIE! Refuse paths that try to climb out of $home.
my $path = $ENV{PATH_INFO} || '';
die "Bad image path" if $path eq '' or $path =~ m{(?:^|/)\.\.(?:/|$)};
my $pic = $home . $path;
open(PIC, '<', $pic) or die "Could not open $pic: $!";
binmode PIC;

# Get the size and MIME type
my $type;
my $size = -s $pic;
if ($pic =~ /\.gif$/i) {
    $type = 'gif';
} elsif ($pic =~ /\.jpe?g$/i) {
    $type = 'jpeg';
} else {
    die "Unknown image type";
}

# Send the image before logging so the browser isn't kept waiting
binmode STDOUT;
print "Content-type: image/$type\n";
print "Content-length: $size\n\n";
print while (<PIC>);
close PIC;

# Now make the log entry, e.g. "Wed Apr 05 13:08:26 2000"
# (every field comes from the script's start time, $^T)
my $date = strftime('%a %b %d %H:%M:%S %Y', localtime($^T));

unless ($ENV{REMOTE_ADDR} =~ /$ips/) {
    # Look up the hostname before locking the log; fall back to the IP
    my $hostname = gethostbyaddr(inet_aton($ENV{REMOTE_ADDR}), AF_INET)
        || $ENV{REMOTE_ADDR};
    my $data    = "$date $hostname";
    my $referer = $ENV{HTTP_REFERER} || '';

    # Build every matching line first, then write them in one go
    my $lines = '';
    foreach (keys %entries) {
        $lines .= "$data $_\n" if $referer =~ /$entries{$_}/;
    }

    # Skip quietly if nothing matched or the log can't be opened
    if ($lines ne '' and open(LOG, '>>', $logfile)) {
        flock LOG, LOCK_EX;
        print LOG $lines;
        close LOG;
    }
}
RE: Poor Man's Web Logger
by turnstep (Parson) on Apr 06, 2000 at 00:14 UTC
Nice script! A few notes:
- For better portability, consider making the "my ISP" IP numbers into
variables that can be easily changed at the top, like $home and $logfile
are.
- Use the $^T variable instead of calling 'time'
- You could also just say $date=scalar localtime; :)
- Put the gethostbyaddr line before the open and flock. That way,
the data can be dumped in very quickly and the file closed again. Even better,
write all the data into one variable and have a single print LOG "$data\n";
line between the lock and the close.
- If you can't get to your error logs, the die
statement at the end of 'open PIC' does not really help much. On a similar note, you
should check the return value of open in 'open LOG' and bypass the rest
of the logging code if it fails.
- Some servers will report REMOTE_HOST as well as REMOTE_ADDR. If your server is
lucky enough to do that, you can remove all the socket stuff!
- Put the "Content-type" line at the start of the script,
so the browser knows as
early as possible that it is receiving an image.
- You should also tell the browser the size, with a Content-Length header. This info is easily grabbed with
-s $pic.
- For even better speed, consider hard-coding the image into the script itself. A small gif
can be squeezed as small as 35 bytes!
- Output the gif, then do your log file. Not only does the
client not have to wait for the log file writing, but you
can simply die if 'open LOG' or 'flock LOG' does not work.
- Using all of the above, I can squeeze it down to a 6 line script!
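Putting several of these suggestions together, here is one possible sketch of a compact version (not turnstep's actual six-line script, which wasn't posted). It assumes a hard-coded 1x1 transparent GIF (42 bytes, decoded with the core MIME::Base64 module), prefers REMOTE_HOST over REMOTE_ADDR, sends the image before touching the log, and builds the log line before locking. The `$logfile` path and the `tiny_gif`/`serve_and_log` names are placeholders, and the %entries labels are left out for brevity.

```perl
#!/usr/bin/perl -w
use strict;
use Fcntl qw(:flock);
use MIME::Base64 qw(decode_base64);

my $logfile = '/my_path_to/access_log';    # placeholder path

# A 1x1 transparent GIF, hard-coded so no file needs to be opened
sub tiny_gif {
    return decode_base64('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

sub serve_and_log {
    # Send the image first so the browser isn't kept waiting
    my $gif = tiny_gif();
    binmode STDOUT;
    print "Content-type: image/gif\n",
          "Content-length: ", length($gif), "\n\n",
          $gif;

    # Prefer REMOTE_HOST when the server provides it, else the raw IP
    my $host = $ENV{REMOTE_HOST} || $ENV{REMOTE_ADDR} || '-';
    my $data = scalar(localtime($^T)) . " $host\n";

    # The image is already out, so dying here costs the visitor nothing
    open(LOG, '>>', $logfile) or die "Could not open $logfile: $!";
    flock(LOG, LOCK_EX)       or die "Could not lock $logfile: $!";
    print LOG $data;
    close LOG;
}

# Only run when invoked as a CGI, so the subs can be loaded elsewhere
serve_and_log() if $ENV{GATEWAY_INTERFACE};
```

Because the lock is only taken after the line is fully built, the time the log stays locked is a single print.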
Thanks for the good feedback. I took some of your comments
and incorporated them into a new version and expanded on
some ideas as well.
For example, the reason I constructed the $date and $hostname
into a single variable was so that I could put a different
label on each entry. With the %entries hash, you can now
set up different pattern matches and the corresponding log
entry.
I left it doing the hostname lookup every time simply because
every sane web server administrator has hostname lookups for
access turned off, leaving REMOTE_HOST empty.
If I were cruel, I'd have it send SERVER_ADMIN an email
every time it got a REMOTE_HOST variable. :)
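The label matching can be pulled out into a small function, which makes it easy to check patterns before deploying them. This is a sketch assuming the same %entries layout as the script; `labels_for` is a hypothetical name, the example.com referers are made up, and sorting the keys is an added choice so the output order is predictable.

```perl
use strict;
use warnings;

# Return the %entries labels whose pattern matches the referring URL,
# mirroring the loop in showpic (keys sorted for a stable order).
sub labels_for {
    my ($referer, $entries) = @_;
    return () unless defined $referer;
    return grep { $referer =~ /$entries->{$_}/ } sort keys %$entries;
}

my %entries = (
    'home'   => 'comatose/index.html$|comatose/$',
    'resume' => 'resume',
);

# labels_for('http://example.com/comatose/', \%entries) yields ('home')
```

Because the values are used as regexes, a pattern such as 'resume' matches anywhere in the referer, so anchor it (e.g. with `$`) if you mean one specific page.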
A question about this REMOTE_HOST stuff--generally, server
admins disable the hostname translation because it slows
down the server, correct? I agree that it's the "sane"
thing to do--but in this case, aren't you sort of defeating
the purpose by doing the hostname translation yourself?
The whole point, I thought, was to disable runtime hostname
translation, because it slows things down; you seem to
agree, so why re-introduce this step? Why not just
log IP addresses, then run a cron job later as part of your
stat analysis to do the hostname translation?
Just something to think about. If I'm missing the main
issues, let me know.
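A sketch of that deferred approach, assuming the CGI logs the raw REMOTE_ADDR and the log keeps the 24-character date format shown earlier: a separate script, run from cron, replaces each IP with a hostname and caches lookups so repeat visitors cost one DNS query per run. `resolve_ip`, `resolve_line`, and the script name resolve_log.pl are hypothetical.

```perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);

# Resolve an IP to a hostname, caching results so each address is
# looked up only once per run. Falls back to the IP itself.
my %cache;
sub resolve_ip {
    my ($ip) = @_;
    return $cache{$ip} if exists $cache{$ip};
    my $packed = inet_aton($ip);
    my $name   = $packed ? gethostbyaddr($packed, AF_INET) : undef;
    return $cache{$ip} = defined $name ? $name : $ip;
}

# Rewrite one "<24-char date> <ip> [label]" line, swapping the IP for a
# hostname. Lines that already hold a hostname are left alone. Passing
# another code ref as $resolver lets you test without DNS.
sub resolve_line {
    my ($line, $resolver) = @_;
    $resolver ||= \&resolve_ip;
    $line =~ s/^(.{24}) (\d{1,3}(?:\.\d{1,3}){3})\b/"$1 " . $resolver->($2)/e;
    return $line;
}

# e.g. from cron: perl resolve_log.pl access_log > access_log.resolved
if (@ARGV) {
    print(resolve_line($_)) while <>;
}
```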
Okay. Problem solved.
by providencia (Pilgrim) on Apr 12, 2000 at 06:40 UTC
After talking to someone else, I found out that our system (I work for a non-profit ISP*) isn't normal.
It was intentionally set up this way so users wouldn't crowd our lowly disk space with logs. It also keeps people from wanting to run mom-and-pop web stores. It took some tweaking,
but it works now. The data is also formatted to look like an Apache logfile, which makes the logfile much easier to parse. I'm very appreciative of the program. Notice that
it plays nicely with Daylight Saving Time (I'm in the Central time zone).
I didn't personally make the changes. A friend did.
At the time it was quicker for him to make the changes
and for me to figure it out later than for me to break
his train of thought as he taught me. :)
* Not just a non-profit-like ISP, but a real Free-Net.
Changes were made here:
my %entries = ('home' => '/index.html$|/$');
And here:
my $zulu_offset = $isdst ? "-0500" : "-0600";
my $date = "[$mday/$month/$year:$hour:$min:$sec $zulu_offset]";
unless ($ENV{REMOTE_ADDR} =~ /^$ips/) {
    my $hostname;
    $hostname = gethostbyaddr(inet_aton($ENV{REMOTE_ADDR}), AF_INET) or
        $hostname = $ENV{REMOTE_ADDR};
    my $data = "$hostname - - $date";
    # Skip logging if the log can't be opened
    if (open LOG, ">>$logfile") {
        flock LOG, 2;    # 2 == LOCK_EX
        foreach (keys %entries) {
            print LOG "$data $_\n" if ($ENV{HTTP_REFERER} =~ /$entries{$_}/);
        }
        close LOG;
    }
}
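Hard-coding -0500/-0600 ties the log to Central time. A sketch of an alternative, assuming the server's local time zone is the one you want in the log: derive the offset by comparing `localtime` with the core Time::Local `timegm`. `apache_date` is a hypothetical name, and it only builds the bracketed date field, not a full Apache log line.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timegm);

# Format a Unix time like Apache's common log format date field,
# e.g. [05/Apr/2000:13:08:26 -0500], using the system's zone rules.
sub apache_date {
    my ($t) = @_;
    my @lt = localtime($t);
    # Local wall-clock time read back as if it were UTC, minus the real
    # epoch time, is the zone offset in seconds (DST included).
    # The year is passed as a full four-digit year to avoid
    # Time::Local's two-digit-year guessing.
    my $offset = timegm(@lt[0 .. 4], $lt[5] + 1900) - $t;
    my $sign   = $offset < 0 ? '-' : '+';
    $offset    = abs $offset;
    my $zone   = sprintf('%s%02d%02d', $sign,
                         int($offset / 3600), int(($offset % 3600) / 60));
    return strftime('[%d/%b/%Y:%H:%M:%S ', @lt) . $zone . ']';
}
```

Calling `apache_date($^T)` in place of the hand-built `$date` would let the same script run unchanged in any time zone.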
RE: Poor Man's Web Logger
by providencia (Pilgrim) on Apr 11, 2000 at 19:42 UTC
I can't seem to get this script to work.
I'm a newbie to Perl and programming, so this could explain a lot.
What tag do I put in the HTML?
The tag below doesn't make sense to me.
<img src="/cgi-bin/showpic/path_to_pic_in_document_root/pic.jpg">
Let's say my image is /home/username/www/cgi-bin/showpic/spacer.gif.
I'm confused.
I know I'm tripping myself up somewhere.
At what point am I making this harder than it needs to be?
The idea behind this is that you typically already have
images on your web page; so you replace the SRC of one
of the images with a URL that routes through this logger,
but still returns the image. So the browser doesn't know
the difference, but the server does.
So what you should actually do is just use one of the images
you already have on your site. You say that the image you
want to use is at /home/username/www/cgi-bin/showpic/spacer.gif? That's not
right. "showpic" is the name of the CGI script, so it
can't also be a directory on the filesystem.
Let's say that you currently have an image like this on
your web page:
<img src="/images/foo.gif">
And the actual filesystem location of this image is:
/home/bar/www/images/foo.gif
You would replace the IMG tag with this:
<img src="/cgi-bin/showpic/images/foo.gif">
And, in showpic (the CGI script), you'd set the $home
variable to:
my $home = "/home/bar/www";
Make sense?
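To make the mapping concrete, here is a sketch of the path translation showpic performs (`$home . $ENV{PATH_INFO}`), with an added check that rejects `..` segments so a request can't reach files outside `$home`. The `pic_path` name and the rejection rule are assumptions for illustration, using the example values above.

```perl
use strict;
use warnings;

# Map the extra path after the script name (PATH_INFO) onto the
# filesystem. Returns undef for missing or relative paths, or ones with
# a ".." segment, so /cgi-bin/showpic/../../etc/passwd can't escape $home.
sub pic_path {
    my ($home, $path_info) = @_;
    return unless defined $path_info && $path_info =~ m{^/};
    return if $path_info =~ m{(?:^|/)\.\.(?:/|$)};
    return $home . $path_info;
}

# <img src="/cgi-bin/showpic/images/foo.gif"> sets PATH_INFO to
# "/images/foo.gif", so pic_path("/home/bar/www", "/images/foo.gif")
# yields "/home/bar/www/images/foo.gif".
```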
Having read your question: the $home variable really just
saves you from typing out the full filesystem path (and giving
yourself carpal tunnel syndrome in the process).
For a moment, let's say you keep all the images for your
site in the filesystem path /home/username/www/images/.
Your web-based path to the script might be /cgi-bin/showpic.
Now you want to log who is visiting each of your pages using
this script without putting the same image on every page.
You could then set $home equal to
'/home/username/www/images'. Then for each image that you
want to have generate an entry in the log, you would set
the src to '/cgi-bin/showpic/myHead.jpg' or
'/cgi-bin/showpic/myHouse.jpg' or whatever.
However, if you have images all over your filesystem,
you might set $home to '/home/username/www'. Then your
image src could be '/cgi-bin/showpic/images/myHead.jpg' or
'/cgi-bin/showpic/britain/bigben.jpg' or any other image
you have on a page you want to log.