http://www.perlmonks.org?node_id=671698


in reply to web statistics

Have a look at Google Analytics. You have to sign up with them and then put a snippet of their JavaScript on each of your web pages. This method cannot go back in time to give you previous stats. On the plus side, you don't need access to the server log files. It also has a bunch of features, like a map showing where your traffic is coming from.

Another option is to run an analysis program over the web server's log files. A couple you can use are Webalizer and AWStats. This method can go back as far as the log files do. To use it you need to install some software (it doesn't need to run on the web server itself, but you do need the log files). If you just want rough numbers before installing anything, a few lines of Perl will do, as in the sketch below.
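
For instance, here is a minimal sketch that tallies hits and unique client IPs per day. The log path and the common/combined log layout are assumptions about your setup; adjust the regex to match your format.

    #!/usr/bin/perl
    # Rough per-day hit and unique-IP counts from an Apache-style
    # access log. Assumes common/combined log format; the default
    # path below is just a guess -- pass your own as an argument.
    use strict;
    use warnings;

    my $log = shift @ARGV || '/var/log/apache2/access.log';
    open my $fh, '<', $log or die "Can't open $log: $!";

    my (%hits, %ips);
    while (my $line = <$fh>) {
        # e.g. 1.2.3.4 - - [04/Mar/2008:01:54:00 +0000] "GET / HTTP/1.1" ...
        my ($ip, $day) = $line =~ m{^(\S+) \S+ \S+ \[([^:]+):}
            or next;
        $hits{$day}++;
        $ips{$day}{$ip} = 1;
    }
    close $fh;

    # A lexical sort of DD/Mon/YYYY isn't chronological, but it's
    # good enough for a quick look.
    for my $day (sort keys %hits) {
        printf "%s  %6d hits  %5d unique IPs\n",
            $day, $hits{$day}, scalar keys %{ $ips{$day} };
    }

Bear in mind (as discussed below) that unique IPs are only a rough stand-in for unique visitors.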

Re^2: web statistics
by gloryhack (Deacon) on Mar 04, 2008 at 01:54 UTC
    I second the recommendation of AWStats, which I prefer over Webalizer.

    It deserves mention that the stats derived from HTTP server logs are absolutely NOT accurate, and interpretation of them is very subjective. While these statistics packages can give you an indication of what's going on, it's a fuzzy indication at best. (Google Analytics is also quite fuzzy, so this caveat doesn't imply an endorsement of Google Analytics.)

      Define "absolutely NOT accurate". Barring server misconfiguration, disk/filesystem failure, or deliberate tampering with the logs, I don't see any way that the web server logs can be any less than absolute in their accuracy regarding which pages were served, at what time, and to which IP addresses.

      Referrers and user agents are the only things I can think of off the top of my head which go into my logs and are susceptible to spoofing by users,[1] and those shouldn't significantly affect the accuracy of log-based analysis.
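
      Spoofing those takes no skill at all, for what it's worth. A minimal sketch with LWP::UserAgent; the URL and header values here are made up for illustration:

      #!/usr/bin/perl
      # Sketch: an HTTP client can claim any user agent or referrer it
      # likes, and that's what ends up in your logs.
      use strict;
      use warnings;
      use LWP::UserAgent;

      my $ua  = LWP::UserAgent->new;
      my $res = $ua->get(
          'http://example.com/some/page',            # hypothetical URL
          'User-Agent' => 'Definitely Not A Bot/1.0',
          'Referer'    => 'http://example.com/i-was-never-here',
      );
      print $res->status_line, "\n";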

      Absolutely agreed that it's all in the interpretation, though.

      [1] OK, technically users could spoof their IP address as well, but that's a relatively sophisticated technique and they're not going to be able to see the returned page if they do it, so I'm comfortable with ignoring them for these purposes.

        Okay...

        "absolutely NOT accurate" is intended in this case to mean that HTTP server logs do not accurately communicate the specific information (e.g. "Unique vistors and number of hits per day") that the presumably pointy-haired boss has asked advait to provide.

        Your HTTP server doesn't serve every page view, as there are caching proxy servers out there on the internets. Your HTTP server doesn't know that my Firefox browser with HTTP_USER_AGENT suppressed is a web browser with a human behind it -- it could just as easily be a bot, and is likely to be categorized as such by most heuristic methods. Your HTTP server doesn't know whether hits from the TOR network are initiated by one user or one hundred users. In short, your HTTP server knows only what is requested, how it is requested, when it is requested, and by which IP address. It doesn't know unique visitors, page views actually seen by human eyes, or any of that other stuff that makes marketdroids drool.
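
        To make the fuzziness concrete: the same log gives different "unique visitor" counts depending on what you treat as a visitor. A quick sketch, assuming combined log format on STDIN:

        #!/usr/bin/perl
        # Two common stand-ins for "a visitor" -- distinct IPs and
        # distinct IP+user-agent pairs -- usually disagree, and neither
        # is the true number: proxies and NAT merge many people into
        # one IP, while TOR and dynamic addresses split one person
        # across many.
        use strict;
        use warnings;

        my (%by_ip, %by_ip_ua);
        while (my $line = <>) {
            my ($ip) = $line =~ /^(\S+)/ or next;
            my ($ua) = $line =~ /"([^"]*)"\s*$/;   # last quoted field
            $ua = '-' unless defined $ua;
            $by_ip{$ip}++;
            $by_ip_ua{"$ip\t$ua"}++;
        }
        printf "distinct IPs:         %d\n", scalar keys %by_ip;
        printf "distinct IP+UA pairs: %d\n", scalar keys %by_ip_ua;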

        Thus, in context, my caveat stands.

Re^2: web statistics
by proceng (Scribe) on Mar 04, 2008 at 03:38 UTC
    Another option is to run an analysis program over the web server's log files.
    and
    Have a look at Google Analytics
    If your visitor is like me, FF+NoScript is configured to deny Google Analytics (as well as *.doubleclick.net and others of their ilk). Your logs will give you a much better indication of your visitors without requiring the user to allow JavaScript from other sites.
    My attitude on this is that if you want to track me as a visitor, that is fine. If others wish to base their income model on something that does not benefit me, they can stand in line. Therefore, if your site will not let me navigate without sending data to sites external to your company, you'd better have something I can't live without.
    Just my $0.01 (exchange rate is going to heck) ;-)