#! /usr/bin/perl -w
#
# david landgren 14-may-2001
use strict;
my $first = shift or die "No first (current) file specified on command line.\n";
my $second = shift or die "No second (previous) file specified on command line.\n";
my %site;
# Index the older snapshot by FQDN (the last field on each line).
open IN, $second or die "Cannot open $second for input: $!\n";
while( <IN> ) {
    chomp;
    my @fields = split;
    $site{ $fields[-1] } = \@fields;
}
close IN;

# Walk the current snapshot and report each site's rank in both files.
open IN, $first or die "Cannot open $first for input: $!\n";
while( <IN> ) {
    chomp;
    my ($rank, @fields) = split;
    local $" = "\t";
    if( defined $site{$fields[-1]} ) {
        my $prev = $site{ $fields[-1] }->[0];
        # A positive difference means the site has climbed towards rank 1.
        my $diff = $prev - $rank;
        my $desc = 0 == $diff ? '=' : $diff < 0 ? $diff : "+$diff";
        print "$rank\t$prev\t$desc\t@fields\n";
    }
    else {
        print "$rank\t-\tnew\t@fields\n";
    }
}
close IN;

=head1 NAME

topwebdiff -- analyse the output of successive runs of topweb

=head1 SYNOPSIS

B<topwebdiff> filespec.recent filespec.older

=head1 DESCRIPTION

Take the output of two runs of topweb, and create a report that shows how
sites have evolved between the two snapshots. This helps pinpoint sites
that suddenly suck up a dramatic amount of bandwidth.

=head1 EXAMPLES

C<topwebdiff tw.yyyymmd1 tw.yyyymmd2>

The output is equivalent to the output of C<topweb tw.yyyymmd1> with the
addition of two columns in the second and third place:

=over 4

=item *

rank 2 -- the rank of the same FQDN from the file tw.yyyymmd2, or '-' if
the FQDN does not appear in the second file.

=item *

delta -- the change in rank from the second file (the older snapshot) in
comparison with the first file (the newer snapshot). This is '=' if the
rank is unchanged, and 'new' if the FQDN does not appear in the second file.

=back
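
The delta column can be sketched as follows. This is a minimal,
self-contained illustration rather than part of the script itself; the
helper name C<rank_delta> is hypothetical and does not exist in topwebdiff.

    use strict;
    use warnings;

    # Hypothetical helper: format the change in rank between two snapshots.
    # A positive delta means the site has climbed towards rank 1.
    sub rank_delta {
        my ($current, $previous) = @_;
        return 'new' unless defined $previous;  # absent from the older file
        my $diff = $previous - $current;
        return $diff == 0 ? '=' : $diff < 0 ? $diff : "+$diff";
    }

    print rank_delta(27, 55), "\n";    # +28
    print rank_delta(21, 20), "\n";    # -1
    print rank_delta(5, 5), "\n";      # =
    print rank_delta(9, undef), "\n";  # new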

An excerpt of the output from a sample data set is as follows. In this
example we see a site has jumped from 55th most visited site (in terms of
bytes transferred) to 27th.

    20  21   +1   5671  29919621  0.483%  25.064%  www.voyages-sncf.com
    21  20   -1   3532  27930698  0.451%  25.514%  www.jobpilot.fr
    22  24   +2  11842  27849740  0.449%  25.964%  www.societe.com
    23  22   -1   1807  25851714  0.417%  26.381%  pub21.ezboard.com
    24  23   -1   4560  24280781  0.392%  26.773%  www.google.fr
    25  26   +1   5326  24055482  0.388%  27.161%  www.wanadoo.fr
    26  27   +1   3075  23879164  0.385%  27.546%  perso.wanadoo.fr
    27  55  +28   3943  199970
    28  30   +2   2313  19803044  0.320%  28.188%  webmail.libertysurf.fr
    29  25   -4   1446  19699499  0.318%  28.506%  www.geocities.com
    30  28   -2    998  19288520  0.311%  28.817%  lw10fd.law10.hotmail.msn.com

Just how important this jump is has to be weighed against the number of
files used in generating the snapshot. In this instance, Squid is configured
to roll its logs over every 24 hours, and 10 logs are kept. This means that
the output from topweb (if run on all log files) will be a rolling 10-day
average.

=head1 COPYRIGHT

Copyright (c) 2001 David Landgren.

This script is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=head1 AUTHOR

David "grinder" Landgren

grinder on perlmonks (http://www.perlmonks.org/)

    eval {join chr(64) => qw[landgren bpinet.com]}

=cut