PerlMonks  

topweb - Squid access.log analyser

by grinder (Bishop)
on Sep 14, 2001 at 12:19 UTC (#112377=sourcecode)

Category: web stuff
Author/Contact Info grinder on perlmonks
Description: I've had a look at a number of analysis tools for Squid access logs, but I didn't find anything simple that met my needs -- I just wanted to know how much direct web traffic was pulled down from which sites.

See also topwebdiff - analyse the output of topweb.
#! /usr/bin/perl -w
#
# david landgren  24-apr-2001

use strict;

my %domain;
my $total_size;
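foreach my $file( @ARGV ) {
    # three-argument open with a lexical handle, so odd filenames are safe
    open my $fh, '<', $file or die "Cannot open $file for input: $!\n";
    while( <$fh> ) {
        chomp;
        # field 4 is the byte count, field 8 the hierarchy code and peer host
        my( $size, $command ) = (split)[4,8];
        next unless defined $command;   # skip short or malformed lines
        if( my( $dom ) = ( $command =~ /^DIRECT\/(.*)/ )) {
            $total_size += $size;
            $domain{$dom}{SIZE} += $size;
            $domain{$dom}{HITS}++;
        }
    }
    close $fh;
}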

my $count;
my $cum_percent = 0;
foreach my $d ( sort {$domain{$b}{SIZE} <=> $domain{$a}{SIZE}} keys %domain ) {
    ++$count;
    $cum_percent += (my $percent = $domain{$d}{SIZE}*100/$total_size);
    my $percent_rounded     = sprintf '%0.3f%%', $percent;
    my $cum_percent_rounded = sprintf '%0.3f%%', $cum_percent;
    print "$count\t$domain{$d}{HITS}\t$domain{$d}{SIZE}\t$percent_rounded\t$cum_percent_rounded\t$d\n";
}

=head1 NAME

topweb - Determine biggest targets of inbound HTTP traffic

=head1 SYNOPSIS

B<topweb> filespec [filespec...]

=head1 DESCRIPTION

Generate a snapshot of direct web traffic recorded by a Squid proxy.

Scan the Squid access logs specified on the command line looking for DIRECT
connections. Accumulate the number of hits and bytes transferred for each FQDN.
Sort and print the results based on bytes transferred. The goal is to see how
much real traffic is coming in due to cache misses.

=head1 OUTPUT

This program outputs a tab-delimited text file. The fields are as follows:

=over 4

=item *

rank -- from 1 to n, the rank in terms of bytes transferred for the domain.

=item *

hits -- the number of separate transfers logged.

=item *

bytes -- the total number of bytes transferred from the above hits.

=item *

percent -- the percentage that this site represents in terms of the total traffic.

=item *

cumulative percent -- the percentage that this site and all busier sites represent
in terms of the total traffic.

=item *

fqdn -- the fully qualified domain name of the host, or numeric IP address if the
address does not resolve.

=back

Here is a sample of the output, which indicates, among other things, that the four
most demanded sites in this data sample represent just over 10% of incoming traffic:

    1       25226   106606531       2.877%  2.877%  www.cadremploi.fr
    2       15996   104380579       2.817%  5.693%  mailv2.voila.fr
    3       24842   97149410        2.621%  8.315%  www.apec.asso.fr
    4       16861   81954034        2.211%  10.526% www.voila.fr

=head1 EXAMPLES

C</usr/local/bin/topweb /home/squid/logs/access.log* | head -25>

C</usr/local/bin/topweb /home/squid/logs/access.log* E<gt>topweb.yyyymmdd>

=head1 SEE ALSO

topwebdiff - A report tool to analyse the day to day changes of the output from topweb.

=head1 COPYRIGHT

Copyright (c) 2001 David Landgren.

This script is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=head1 AUTHOR

     David "grinder" Landgren
     grinder on perlmonks (http://www.perlmonks.org/)
     eval {join chr(64) => qw[landgren bpinet.com]}

=cut
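
For readers unfamiliar with the log layout, here is a minimal sketch of the field extraction the script relies on, assuming Squid's native access.log format (whitespace-separated, with the byte count in field 4 and the hierarchy code and peer host in field 8). The helper name parse_direct and the two log lines are made up for illustration; they are not part of topweb.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of topweb): return (bytes, host) for a
# DIRECT fetch, or an empty list for any other line.
sub parse_direct {
    my ($line) = @_;
    # Assumed native format, split on whitespace:
    #   0 time, 1 elapsed, 2 client, 3 code/status, 4 bytes,
    #   5 method, 6 URL, 7 ident, 8 hierarchy/peer, 9 content type
    my ($size, $hier) = (split ' ', $line)[4, 8];
    return unless defined $hier;                # short or malformed line
    return unless $hier =~ m{^DIRECT/(.*)};     # cache hit or sibling fetch
    return ($size, $1);
}

# Made-up log lines, for illustration only.
my @sample = (
    '1000000000.123    250 192.0.2.10 TCP_MISS/200 4512 GET http://www.example.com/ - DIRECT/www.example.com text/html',
    '1000000001.456      3 192.0.2.10 TCP_HIT/200 1200 GET http://www.example.com/logo.gif - NONE/- image/gif',
);

for my $line (@sample) {
    my ($size, $host) = parse_direct($line);
    print defined $host ? "$host\t$size\n" : "skipped\n";
}
```

Only the first line is counted, because the second was served from the cache and carries no DIRECT entry.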

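
The cumulative percent column makes questions such as "how many sites account for N% of the traffic?" easy to answer. The following is a rough sketch of one way to read topweb's output for that purpose; the helper name sites_for_share is hypothetical, and the rows are the four sample lines from the OUTPUT section of the POD.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a threshold (in percent) and lines of topweb
# output, return the rank at which the cumulative percent first reaches
# the threshold, or nothing if it is never reached.
sub sites_for_share {
    my ($threshold, @lines) = @_;
    for my $line (@lines) {
        my ($rank, $hits, $bytes, $pct, $cum, $fqdn) = split ' ', $line;
        next unless defined $fqdn;      # skip incomplete lines
        $cum =~ s/%$//;                 # "10.526%" -> "10.526"
        return $rank if $cum >= $threshold;
    }
    return;
}

# The four sample rows from the POD, tab-delimited as topweb prints them.
my @rows = (
    "1\t25226\t106606531\t2.877%\t2.877%\twww.cadremploi.fr",
    "2\t15996\t104380579\t2.817%\t5.693%\tmailv2.voila.fr",
    "3\t24842\t97149410\t2.621%\t8.315%\twww.apec.asso.fr",
    "4\t16861\t81954034\t2.211%\t10.526%\twww.voila.fr",
);

my $n = sites_for_share(10, @rows);
print defined $n ? "$n sites reach 10%\n" : "threshold not reached\n";
```

Because topweb already sorts by bytes transferred, a single pass that stops at the first qualifying row is enough.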
