#! /usr/bin/perl -w
#
# david landgren 14-may-2001
use strict;
my $first  = shift or die "No first (current) file specified on command line.\n";
my $second = shift or die "No second (previous) file specified on command line.\n";
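# Index the older (previous) snapshot by FQDN, the last field on each line.
# Field 0 of each stored record is the site's rank in that snapshot.
my %site;
open my $older, '<', $second or die "Cannot open $second for input: $!\n";
while( <$older> ) {
    chomp;
    my @fields = split;
    next unless @fields;    # skip blank lines
    $site{ $fields[-1] } = \@fields;
}
close $older;

# Walk the newer (current) snapshot and compare each site's rank with
# its rank in the older one.
local $" = "\t";    # interpolate arrays with tab separators
open my $newer, '<', $first or die "Cannot open $first for input: $!\n";
while( <$newer> ) {
    chomp;
    my ($rank, @fields) = split;
    next unless @fields;    # skip blank lines
    if( defined $site{ $fields[-1] } ) {
        my $prev = $site{ $fields[-1] }->[0];
        my $diff = $prev - $rank;    # positive: the site has climbed
        my $desc = 0 == $diff ? '=' : $diff < 0 ? $diff : "+$diff";
        print "$rank\t$prev\t$desc\t@fields\n";
    }
    else {
        print "$rank\t-\tnew\t@fields\n";
    }
}
close $newer;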

=head1 NAME

topwebdiff -- analyse the output of successive runs of topweb

=head1 SYNOPSIS

B<topwebdiff> filespec.recent filespec.older

=head1 DESCRIPTION

Takes the output of two runs of topweb and creates a report that shows how
the rankings of sites have evolved between the two snapshots. This helps
pinpoint sites that suddenly suck up a dramatic amount of bandwidth.
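
=head1 EXAMPLES

C<topwebdiff tw.yyyymmd1 tw.yyyymmd2>

The output is equivalent to the output of C<topweb tw.yyyymmd1>, with the
addition of two columns in second and third place:

=over 4

=item *

rank 2 -- the rank of the same FQDN in the file tw.yyyymmd2, or C<-> if
the FQDN does not appear in the second file.

=item *

delta -- the change in rank between the second file (the older snapshot)
and the first file (the newer snapshot), computed as the older rank minus
the newer rank. A positive value such as C<+28> means the site has climbed
the ranking, a negative value means it has dropped, C<=> means its rank is
unchanged, and C<new> means it does not appear in the older file.

=back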

An excerpt of the output from a sample data set follows. In this example,
one site has jumped from the 55th most visited site (in terms of bytes
transferred) to the 27th.

    20  21   +1   5671  29919621  0.483%  25.064%  www.voyages-sncf.com
    21  20   -1   3532  27930698  0.451%  25.514%  www.jobpilot.fr
    22  24   +2  11842  27849740  0.449%  25.964%  www.societe.com
    23  22   -1   1807  25851714  0.417%  26.381%  pub21.ezboard.com
    24  23   -1   4560  24280781  0.392%  26.773%  www.google.fr
    25  26   +1   5326  24055482  0.388%  27.161%  www.wanadoo.fr
    26  27   +1   3075  23879164  0.385%  27.546%  perso.wanadoo.fr
    27  55  +28   3943  199970 ...
    28  30   +2   2313  19803044  0.320%  28.188%  webmail.libertysurf.fr
    29  25   -4   1446  19699499  0.318%  28.506%  www.geocities.com
    30  28   -2    998  19288520  0.311%  28.817%  lw10fd.law10.hotmail.msn.com

How significant such a jump is has to be weighed against the number of log
files used to generate each snapshot. In this instance, Squid is configured
to roll its logs over every 24 hours, and 10 logs are kept. This means that
the output from topweb (if run on all log files) is a rolling 10-day average.

=head1 COPYRIGHT

Copyright (c) 2001 David Landgren.

This script is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=head1 AUTHOR

David "grinder" Landgren

grinder on perlmonks (http://www.perlmonks.org/)

    eval {join chr(64) => qw[landgren bpinet.com]}

=cut