http://www.perlmonks.org?node_id=671757


in reply to Re: web statistics
in thread web statistics

I second the recommendation of AWStats, which I prefer over webalizer.

It deserves mention that the stats available on the HTTP server are absolutely NOT accurate, and interpretation of them is very subjective. While these statistics packages can give you an indication of what's going on, it's a fuzzy indication at best. (Google Analytics is also quite fuzzy, so this caveat doesn't imply an endorsement of Google Analytics.)

Re^3: web statistics
by dsheroh (Monsignor) on Mar 04, 2008 at 17:00 UTC
    Define "absolutely NOT accurate". Barring server misconfiguration, disk/filesystem failure, or deliberate tampering with the logs, I don't see any way that the web server logs can be any less than absolute in their accuracy regarding which pages were served, at what time, and to which IP addresses.

    Referrers and user agents are the only things I can think of off the top of my head that go into my logs and are susceptible to spoofing by users[1], and those shouldn't significantly affect the accuracy of log-based analysis.

    Absolutely agreed that it's all in the interpretation, though.

    [1] OK, technically users could spoof their IP address as well, but that's a relatively sophisticated technique and they're not going to be able to see the returned page if they do it, so I'm comfortable with ignoring them for these purposes.
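    Here is a minimal Python sketch that splits one access-log line into its fields, to show which parts the server records itself and which parts come from the client. It assumes Apache's "combined" LogFormat (your server may be configured differently), and `parse_line` is a hypothetical helper, not part of AWStats, webalizer or any other package. The IP, timestamp, request and status are written by the server. The referrer and User-Agent are copied from headers the client chose to send.

```python
import re
from datetime import datetime

# Assumes Apache's "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one combined-format log line into a dict, or return None."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()

    # Recorded by the server itself: peer IP, time, request, status.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request line is "METHOD PATH PROTOCOL"; keep just the path.
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) >= 2 else None

    # Copied verbatim from client-sent headers, so the client controls
    # them; a missing header is logged as "-".
    for key in ("referrer", "agent"):
        if entry[key] == "-":
            entry[key] = None
    return entry
```

    For a line such as `203.0.113.7 - - [04/Mar/2008:17:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "-"`, `parse_line` returns the path `/index.html`, with `referrer` and `agent` both set to `None`.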

      Okay...

      "absolutely NOT accurate" is intended in this case to mean that HTTP server logs do not accurately communicate the specific information (e.g. "Unique visitors and number of hits per day") that the presumably pointy-haired boss has asked advait to provide.

      Your HTTP server doesn't serve every page view, as there are caching proxy servers out there on the internets. Your HTTP server doesn't know that my Firefox browser with HTTP_USER_AGENT suppressed is a web browser with a human behind it; it could just as easily be a bot, and it is likely to be categorized as such by most heuristic methods. Your HTTP server doesn't know whether hits from the TOR network are initiated by one user or one hundred users. In short, your HTTP server knows only what is requested, how it is requested, when it is requested, and by which IP address. It doesn't know unique visitors, page views in the sense of a human actually looking at the page, or any of the other stuff that makes marketdroids drool.
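      To illustrate how far log data falls short of "unique visitors", here is a hypothetical Python sketch of the naive per-day report: count hits and distinct IPs, and treat requests with no User-Agent as bots. The entry shape (`ip`, `day` and `agent` keys) and the name `naive_daily_stats` are assumptions made for this example. Every number it produces is subject to the caveats above: page views served from a cache never reach the log, one IP can stand for many people, and a human with a suppressed User-Agent is counted as a bot.

```python
from collections import defaultdict


def naive_daily_stats(entries):
    """Count hits, distinct client IPs and "bot" hits per day.

    Each entry is a dict with 'ip', 'day' and 'agent' keys (a hypothetical
    shape for this sketch). Distinct IPs are only a stand-in for unique
    visitors: proxies, NAT and TOR exit nodes put many people behind one
    IP, and one person may show up under several IPs.
    """
    hits = defaultdict(int)
    ips = defaultdict(set)
    bot_hits = defaultdict(int)

    for e in entries:
        day = e["day"]
        hits[day] += 1
        ips[day].add(e["ip"])
        # Crude heuristic: no User-Agent means "bot". This misfiles any
        # human whose browser suppresses the header.
        if not e["agent"]:
            bot_hits[day] += 1

    return {
        day: {
            "hits": hits[day],
            "distinct_ips": len(ips[day]),
            "bot_hits": bot_hits[day],
        }
        for day in hits
    }
```

      With two requests from one IP carrying a browser User-Agent and one request with no User-Agent from another IP, all on the same day, it reports 3 hits, 2 distinct IPs and 1 "bot" hit. The logs alone cannot say whether that was one person or many.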

      Thus, in context, my caveat stands.