PerlMonks  

Poor Man's Web Logger

by comatose (Monk)
on Apr 05, 2000 at 21:56 UTC

Category: Web Logging
Author/Contact Info: Mike Bohlmann
Description:

Since I don't have access to my ISP's server logs but still want to have some idea of who's visiting my website, I developed this little script that can integrate easily with any existing site. All you need is CGI access of some type.

To install, save the script in your CGI directory (the examples here call it showpic) and make it executable. You'll also need to set the $home, $logfile, $ips (IP addresses you want to ignore), and %entries (labels and expressions to match) variables. Be sure to "touch" the logfile and make it writable by the web server's user.

Pick an image on your page, preferably a small one, and point its src at the script:

<img src="/cgi-bin/showpic/path_to_pic_in_document_root/pic.jpg">

Each time someone accesses the page with that image, an entry is made in the log with the date, time, and either hostname or IP address. Here's an example of the output. Enjoy.

Wed Apr 05 13:08:26 2000 d208.focal6.interaccess.com resume
Wed Apr 05 13:29:29 2000 audialup197.phnx.uswest.net
Thu Apr 06 01:31:47 2000 adsl-63-194-241-209.dsl.lsan03.pacbell.net

#!/usr/bin/perl -w
# showpic logger

use strict;
use Socket;

# Setup configuration
# $home - filesystem path prepended to the image path in PATH_INFO
# $logfile - full path to the logfile you want to use
# $ips - regex of IP addresses you want to ignore in the log
# %entries - hash of labels => regexes matched against the referer

my $home = '/my_path_to/public_html';
my $logfile = '/my_path_to/access_log';
my $ips = '^127\.0\.0\.1$';
my %entries = (
        'home' => 'comatose/index\.html$|comatose/$',
        'resume' => 'resume'
        );

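# Open the pic or DIE!
# Refuse ".." so PATH_INFO cannot climb out of $home
my $path_info = defined $ENV{PATH_INFO} ? $ENV{PATH_INFO} : '';
die "Bad image path" if $path_info =~ /\.\./;
my $pic = $home . $path_info;
# Three-argument open: characters such as | or > in the path
# are treated as part of the filename, not as an open mode
open PIC, '<', $pic or die "Could not open $pic: $!";
binmode PIC;
binmode STDOUT;

# Get the size and MIME type
my $type = '';
my $size = -s $pic;

if ($pic =~ /\.gif$/i) {
        $type = 'gif';
} elsif ($pic =~ /\.jpe?g$/i) {
        $type = 'jpeg';
} else {
        die "Unknown image type";
}

print "Content-type: image/$type\n";
print "Content-length: $size\n\n";

print while (<PIC>);
close PIC;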

# Now make the log entry
# Take every field from the same timestamp ($^T, the script start time)
my ($sec, $min, $hour, $mday, $mon, $year, $wday) = localtime($^T);
my $date = ('Sun','Mon','Tue','Wed','Thu','Fri',
    'Sat')[$wday] . ' ';
$date .= ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug',
    'Sep','Oct','Nov','Dec')[$mon];

# Adjust timestamp to suit taste
$year += 1900;
foreach ($mday, $sec, $min, $hour) {
        $_ = '0' . $_ if ($_ < 10);
}

$date .= " $mday $hour:$min:$sec " . $year;

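# Default the CGI variables so -w stays quiet when one is missing
my $addr = defined $ENV{REMOTE_ADDR} ? $ENV{REMOTE_ADDR} : '';
my $referer = defined $ENV{HTTP_REFERER} ? $ENV{HTTP_REFERER} : '';

unless ($addr =~ /$ips/) {
        # Fall back to the bare IP address if the reverse lookup fails
        my $packed = inet_aton($addr);
        my $hostname = ($packed && gethostbyaddr($packed, AF_INET)) || $addr;
        my $data = "$date $hostname";
        # "last" is only valid inside a loop, so exit quietly instead;
        # the image has already been sent at this point
        open LOG, '>>', $logfile or exit;
        flock LOG, 2 or exit;   # 2 = LOCK_EX
        foreach (keys %entries) {
          print LOG "$data $_\n" if ($referer =~ /$entries{$_}/);
        }
        close LOG;
}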

RE: Poor Man's Web Logger
by turnstep (Parson) on Apr 06, 2000 at 00:14 UTC
    Nice script! A few notes:
    • For better portability, consider making the "my ISP" IP numbers into variables that can be easily changed at the top, like $home and $logfile are.
    • Use the $^T variable instead of calling 'time'
    • You could also just say $date=scalar localtime; :)
    • Put the gethostbyaddr line before the open and flock. That way, the data can be dumped in very quickly and the file closed again. Even better, write all the data into one variable and have a single print LOG "$data\n"; line between the lock and the close.
    • If you can't get to your server logs, the die statement at the end of 'open PIC' does not really help much, since you will never see its message. On a similar note, you should check the return value of open in 'open LOG' and bypass the rest of the logging if it fails.
    • Some servers will report REMOTE_HOST as well as REMOTE_ADDR. If your server is lucky enough to do that, you can remove all the socket stuff!
    • Put the "Content-type" line at the start of the script, so the browser knows as early as possible that it is receiving an image.
    • You should also tell the browser the size, with a Content-Length header. This info is easily grabbed with -s $pic
    • For even better speed, consider hard-coding the image into the script itself. A small gif can be squeezed as small as 35 bytes!
    • Output the gif, then do your log file. Not only does the client not have to wait for the log file writing, but you can simply die if 'open LOG' or 'flock LOG' does not work.
    • Using all of the above, I can squeeze it down to a 6 line script!
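
    As a rough sketch of how several of the suggestions above fit together (not a reconstruction of the six-line script): the image is hard-coded, the headers and image go out first, and the log entry is written afterwards with a single print between the lock and the close. The 42-byte transparent GIF, the sub names image_response and log_line, and the $logfile path are illustrative assumptions, not part of the original script.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical log path; adjust for your own account.
my $logfile = '/my_path_to/access_log';

# A 1x1 transparent GIF (42 bytes), hex-encoded so it can live in the script.
my $gif = pack 'H*', join '', qw(
    474946383961 01000100800000 000000ffffff
    21f9040100000000 2c000000000100010000 02014400 3b
);

# Build the complete CGI response: headers, blank line, image bytes.
sub image_response {
    return "Content-type: image/gif\n"
         . "Content-length: " . length($gif) . "\n\n"
         . $gif;
}

# One log line: the given timestamp plus the client address.
sub log_line {
    my ($time, $addr) = @_;
    return scalar(localtime $time) . " $addr\n";
}

# Only act as a CGI when the web server has set GATEWAY_INTERFACE.
if ($ENV{GATEWAY_INTERFACE}) {
    binmode STDOUT;
    print image_response();   # image first, so the browser is not kept waiting

    # Build the whole entry before touching the file, then lock, print, close.
    my $line = log_line($^T, $ENV{REMOTE_ADDR} || '-');
    open my $log, '>>', $logfile or die "Could not open $logfile: $!";
    flock $log, 2 or die "Could not lock $logfile: $!";   # 2 = LOCK_EX
    print $log $line;
    close $log;
}
```

    Because the image has already been sent by the time the log is opened, a die at that stage costs the visitor nothing. Note that a hard-coded image only suits the case where one invisible tracking pixel is enough; it gives up the ability to serve any existing picture through PATH_INFO.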

      Thanks for the good feedback. I took some of your comments and incorporated them into a new version and expanded on some ideas as well.

      For example, the reason I constructed the $date and $hostname into a single variable was so that I could put a different label on each entry. With the %entries hash, you can now set up different pattern matches and the corresponding log entry.

      I left it doing the hostname lookup every time simply because every sane web server administrator has hostname lookups for access turned off, leaving that variable empty.

      If I were cruel, I'd have it send SERVER_ADMIN an email every time it got a REMOTE_HOST variable. :)

        >I left it doing the hostname lookup every time
        >simply because every sane web server administrator
        >has hostname lookups for access turned off,
        >leaving that variable empty.

        Well, I guess sanity is in the eye of the beholder! :)

        A question about this REMOTE_HOST stuff--generally, server admins disable the hostname translation because it slows down the server, correct? I agree that it's the "sane" thing to do--but in this case, aren't you sort of defeating the purpose by doing the hostname translation yourself?

        The whole point, I thought, was to disable runtime hostname translation, because it slows things down; you seem to agree, so why re-introduce this step? Why not just log IP addresses, then run a cron job later as part of your stat analysis to do the hostname translation?

        Just something to think about. If I'm missing the main issues, let me know.
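
        The deferred approach could be sketched roughly as follows, assuming the logger has been changed to write raw IPv4 addresses and that this filter is run later (from cron, say) over the finished log. The script name resolve_log.pl and the sub names lookup_host and resolve_line are made up for illustration; only inet_aton and gethostbyaddr come from the original script.

```perl
#!/usr/bin/perl -w
use strict;
use Socket;

# Reverse-resolve one IPv4 address; fall back to the address itself.
sub lookup_host {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return $ip;
    return gethostbyaddr($packed, AF_INET) || $ip;
}

# Replace each dotted-quad in a log line with its hostname.
# %$cache remembers earlier answers so each address is looked up once;
# $lookup is the resolver to call (passed in so it can be swapped out).
sub resolve_line {
    my ($line, $cache, $lookup) = @_;
    $line =~ s{\b(\d{1,3}(?:\.\d{1,3}){3})\b}{
        my $ip = $1;
        $cache->{$ip} = $lookup->($ip) unless exists $cache->{$ip};
        $cache->{$ip};
    }ge;
    return $line;
}

# Filter mode: perl resolve_log.pl access_log > resolved_log
if (@ARGV) {
    my %cache;
    while (my $line = <>) {
        print resolve_line($line, \%cache, \&lookup_host);
    }
}
```

        The cache matters because repeat visitors dominate most small-site logs, so each slow DNS lookup is paid once per address rather than once per hit, and never while a visitor is waiting for the image.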

RE: Poor Man's Web Logger
by providencia (Pilgrim) on Apr 11, 2000 at 19:42 UTC
    I can't seem to get this script to work.
    I'm a newbie to perl and programming so this could explain a lot.
    What tag do I put in the html?
    The tag below doesn't make sense to me.
    <img src="/cgi-bin/showpic/path_to_pic_in_document_root/pic.jpg">
    Let's say my image is /home/username/www/cgi-bin/showpic/spacer.gif
    I'm confused.
    I know I'm beating myself up somewhere.
    At what point am I making this harder than it needs to be?
      The idea behind this is that you typically already have images on your web page; so you replace the SRC of one of the images with a URL that routes through this logger, but still returns the image. So the browser doesn't know the difference, but the server does.

      So what you should actually do is just use one of the images you already have on your site. You say that the image you want to use is at /home/username/www/cgi-bin/showpic/spacer.gif? That's not right. "showpic" is the name of the CGI script, so it can't also be a directory on the filesystem.

      Let's say that you currently have an image like this on your web page:

      <img src="/images/foo.gif">
      And the actual filesystem location of this image is:
      /home/bar/www/images/foo.gif
      You would replace the IMG tag with this:
      <img src="/cgi-bin/showpic/images/foo.gif">
      And, in showpic (the CGI script), you'd set the $home variable to:
      my $home = "/home/bar/www";
      Make sense?

      Having read your problem: the $home variable is really just there to save you from giving yourself carpal tunnel syndrome by typing out the full path every time.

      For a moment, let's say you keep all your images to your site in the filesystem path /home/username/www/images/. Your web-based path to the script might be /cgi-bin/showpic. Now you want to log who is visiting each of your pages using this script without putting the same image on every page.

      You could then set $home equal to '/home/username/www/images'. Then for each image that you want to have generate an entry in the log, you would set the src to '/cgi-bin/showpic/myHead.jpg' or '/cgi-bin/showpic/myHouse.jpg' or whatever.

      However, if you have images all over your filesystem, you might set $home to '/home/username/www'. Then your image src could be '/cgi-bin/showpic/images/myHead.jpg' or '/cgi-bin/showpic/britain/bigben.jpg' or any other image you have on a page you want to log.
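
      To make the mapping concrete, here is a small sketch of how the script turns a URL into a file: the web server hands everything after /cgi-bin/showpic to the script as PATH_INFO, and the script glues $home onto the front. The sub name pic_path and the ".." check are illustrative additions, not part of the original script.

```perl
use strict;
use warnings;

# Join the configured $home with the extra path after /cgi-bin/showpic.
# Returns undef for paths that try to climb out of $home with "..".
sub pic_path {
    my ($home, $path_info) = @_;
    return undef unless defined $path_info && $path_info =~ m{^/};
    return undef if $path_info =~ m{(?:^|/)\.\.(?:/|$)};
    return $home . $path_info;
}

# src="/cgi-bin/showpic/myHead.jpg" with $home pointing at the images directory
print pic_path('/home/username/www/images', '/myHead.jpg'), "\n";

# src="/cgi-bin/showpic/images/myHead.jpg" with $home pointing at the site root
print pic_path('/home/username/www', '/images/myHead.jpg'), "\n";
```

      Both calls name the same file, /home/username/www/images/myHead.jpg; the only difference is how much of the path lives in $home and how much in the img tag.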

Okay. Problem solved.
by providencia (Pilgrim) on Apr 12, 2000 at 06:40 UTC

    After talking to someone else, I found out that our system (I work for a non-profit ISP*) isn't normal. It was intentionally set up this way so users wouldn't crowd our lowly disk space with logs. It also keeps people from wanting to run Mom and Pop webstores. It took some tweaking, but it works now. The data is also formatted to look like an Apache logfile, which makes it much easier to parse. I'm very appreciative of the program. Notice that it plays nice with Daylight Saving Time (I'm in the Central time zone).

    I didn't personally make the changes; a friend did. At the time it was quicker for him to make the changes and for me to figure them out later than for me to break his train of thought as he taught me. :)

    * Not a non-profit "like" ISP, but a real Free-Net.

    Changes were made here: my %entries = ('home' => '/index.html$|/$');

    And here:

    my $zulu_offset = $isdst ? "-0500" : "-0600";
    my $date = "[$mday/$month/$year:$hour:$min:$sec $zulu_offset]";

    unless ($ENV{REMOTE_ADDR} =~ /^$ips/) {
            my $hostname;
            $hostname = gethostbyaddr(inet_aton($ENV{REMOTE_ADDR}), AF_INET)
                or $hostname = $ENV{REMOTE_ADDR};
            my $data = "$hostname - - $date";
            open LOG, ">>$logfile" or last;
            flock LOG, 2;
            foreach (keys %entries) {
              print LOG "$data $_\n" if ($ENV{HTTP_REFERER} =~ /$entries{$_}/);
            }
            close LOG;
    }
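
    The offsets above are hard-coded for US Central time. A possible alternative, sketched here under the assumption that the platform's strftime supports %z (common on Unix-like systems, not guaranteed everywhere) and that the locale yields English month names for %b, is to let POSIX build the whole Apache-style timestamp. The sub name apache_date is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Format an epoch time like the bracketed timestamp in an Apache access log,
# e.g. [05/Apr/2000:13:08:26 -0500]; %z supplies the local UTC offset,
# including any daylight saving adjustment.
sub apache_date {
    my ($time) = @_;
    return strftime('[%d/%b/%Y:%H:%M:%S %z]', localtime $time);
}

print apache_date($^T), "\n";
```

    This would replace both the $zulu_offset line and the hand-built $date string, and it keeps working if the server is moved to another time zone.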
