PerlMonks  

RE: Poor Man's Web Logger

by turnstep (Parson)
on Apr 06, 2000 at 00:14 UTC ( #6963=note )


in reply to Poor Man's Web Logger

Nice script! A few notes:

  • For better portability, consider making the "my ISP" IP numbers into variables that can be easily changed at the top, like $home and $logfile are.
  • Use the $^T variable instead of calling 'time' ($^T holds the time at which the script started, so you save a call).
  • You could also just say $date=scalar localtime; :)
  • Put the gethostbyaddr line before the open and flock. That way, the data can be dumped in very quickly and the file closed again. Even better, write all the data into one variable and have a single print LOG "$data\n"; line between the lock and the close.
  • If you can't get to your access logs, the die statement at the end of 'open PIC' does not really help much. On a similar note, you should check the return value of 'open LOG' and bypass the rest of the loop if it fails.
  • Some servers will report REMOTE_HOST as well as REMOTE_ADDR. If your server is lucky enough to do that, you can remove all the socket stuff!
  • Put the "Content-type" line at the start of the script, so the browser knows as early as possible that it is receiving an image.
  • You should also tell the browser the size, with a Content-Length header. This info is easily grabbed with -s $pic
  • For even better speed, consider hard-coding the image into the script itself. A small GIF can be squeezed down to as little as 35 bytes!
  • Output the gif, then do your log file. Not only does the client not have to wait for the log file writing, but you can simply die if 'open LOG' or 'flock LOG' does not work.
  • Using all of the above, I can squeeze it down to a 6 line script!
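Pulling those suggestions together, here is one way it might look. This is only a sketch: the log path, the log format, and the exact GIF bytes are illustrative, not taken from the original script (the hard-coded image here is a standard 1x1 transparent GIF, 43 bytes).

```perl
#!/usr/bin/perl -w
use strict;

# Illustrative path -- adjust for your own setup:
my $logfile = '/tmp/weblog.txt';

# A 1x1 transparent GIF, hard-coded so no file read is needed:
my $gif = "GIF89a\x01\x00\x01\x00\x80\x00\x00"       # header + screen descriptor
        . "\x00\x00\x00\xff\xff\xff"                 # 2-entry color table
        . "\x21\xf9\x04\x01\x00\x00\x00\x00"         # graphic control (transparency)
        . "\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00" # image descriptor
        . "\x02\x02\x44\x01\x00\x3b";                # image data + trailer

# Headers first, so the browser knows as early as possible what it is getting:
print "Content-type: image/gif\n";
print "Content-Length: ", length($gif), "\n\n";
print $gif;

# Log afterward; the client has already been served, so dying here is harmless.
# Build the whole line first, so the lock is held only for a single print:
my $data = join ' ', scalar localtime($^T),
                     ($ENV{REMOTE_HOST} || $ENV{REMOTE_ADDR} || '-'),
                     ($ENV{HTTP_REFERER} || '-');
open LOG, ">>$logfile" or die "Cannot open $logfile: $!\n";
flock LOG, 2 or die "Cannot flock $logfile: $!\n";
print LOG "$data\n";
close LOG;
```

Note the ordering: the image goes out before the log file is even opened, so a slow (or failed) disk never delays the visitor.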


RE: RE: Poor Man's Web Logger
by comatose (Monk) on Apr 06, 2000 at 21:53 UTC

    Thanks for the good feedback. I took some of your comments and incorporated them into a new version and expanded on some ideas as well.

    For example, the reason I combined the $date and $hostname into a single variable was so that I could put a different label on each entry. With the %entries hash, you can now set up different pattern matches and the corresponding log entries.
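The thread doesn't show the %entries code itself, so this is only a guess at its shape; the patterns and labels below are invented for illustration:

```perl
use strict;

# Hypothetical sketch of the %entries idea: substring pattern => log label.
my %entries = (
    '/gallery/' => 'GALLERY',
    '/resume'   => 'RESUME',
);

# Tag a hit with the first matching label, or a default.
sub label_for {
    my ($page) = @_;
    for my $pat (keys %entries) {
        return $entries{$pat} if index($page, $pat) >= 0;
    }
    return 'OTHER';
}
```

The label returned by label_for() would then be prepended to the log line, so a later analysis pass can split the log by section.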

    I left it doing the hostname lookup every time simply because every sane web server administrator has hostname lookups turned off, which leaves the REMOTE_HOST variable empty.

    If I were cruel, I'd have it send SERVER_ADMIN an email every time it got a REMOTE_HOST variable. :)

      >I left it doing the hostname lookup every time
      >simply because every sane web server administrator
      >has hostname lookups for access turned off,
      >leaving that variable empty.

      Well, I guess sanity is in the eye of the beholder! :)

      A question about this REMOTE_HOST stuff--generally, server admins disable the hostname translation because it slows down the server, correct? I agree that it's the "sane" thing to do--but in this case, aren't you sort of defeating the purpose by doing the hostname translation yourself?

      The whole point, I thought, was to disable runtime hostname translation, because it slows things down; you seem to agree, so why re-introduce this step? Why not just log IP addresses, then run a cron job later as part of your stat analysis to do the hostname translation?

      Just something to think about. If I'm missing the main issues, let me know.
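A minimal sketch of that deferred-lookup idea (the sub name is invented here); the Socket module supplies inet_aton and AF_INET, and a cron job would run this over the logged IPs:

```perl
use strict;
use Socket;

# Resolve a dotted-quad IP to a hostname after the fact,
# falling back to the raw IP when the lookup fails.
sub resolve_ip {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or return $ip;
    return gethostbyaddr($packed, AF_INET) || $ip;
}
```

Since this runs from cron rather than at request time, a slow DNS server costs the visitor nothing.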

        What you say is true to a certain extent. However, this logger is for when you don't have access to regular log files. My dialup ISP actually lets me do CGI in my home directory but doesn't let me have access to log files.

        And since my site gets about 4 or 5 visitors a day, the system hit is almost nil. If I were getting a visitor a minute, I might change it. Also as it is, it's only doing one lookup per page. That's a lot less than the standard number of lookups for a full page (html plus images).

        I think in this case, the server parses more than just his page, and more than just his domain, perhaps. This is just his way of "turning it back on" even though the sysadmin for the server that hosts his site has it turned off. Just a guess.

        I have sites that have it both ways. btrott has a good point, however: doing it later would also allow you to cache the answers, saving some (perhaps many) calls to gethostbyaddr. Unless it is a really, really busy page, however, it probably does not hurt too much to look it up each time.
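The caching idea might look something like this (%seen and cached_lookup are names made up for illustration):

```perl
use strict;
use Socket;

my %seen;  # IP => hostname, so each address is resolved only once

sub cached_lookup {
    my ($ip) = @_;
    unless (exists $seen{$ip}) {
        my $packed = inet_aton($ip);
        $seen{$ip} = ($packed && gethostbyaddr($packed, AF_INET)) || $ip;
    }
    return $seen{$ip};
}
```

On a log full of repeat visitors, most lines hit the hash instead of DNS.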
