topwebdiff - analyse the output of topweb

by grinder (Bishop)
on Sep 14, 2001 at 12:26 UTC (#112378=sourcecode)

Category: web stuff
Author/Contact Info grinder on perlmonks
Description: To get the most out of topweb snapshots, generate one file per day, then run topwebdiff on two of them to pinpoint the changes in ranking.

See also topweb - Squid access.log analyser.
#! /usr/bin/perl -w
#
# david landgren  14-may-2001

use strict;

my $first  = shift or die "No first (current) file specified on command line.\n";
my $second = shift or die "No second (previous) file specified on command line.\n";

my %site;

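# Index the older snapshot by site name (the last field on each line),
# so that $prev below really is the previous rank.
open my $old, '<', $second or die "Cannot open $second for input: $!\n";
while( <$old> ) {
        chomp;
        my @fields = split;
        next unless @fields;    # skip blank lines
        $site{ $fields[-1] } = \@fields;
}
close $old;

# Walk the current snapshot and report each site's previous rank and delta.
open my $new, '<', $first or die "Cannot open $first for input: $!\n";
while( <$new> ) {
        chomp;
        my ($rank, @fields) = split;
        next unless @fields;    # skip blank lines
        local $" = "\t";
        if( defined $site{$fields[-1]} ) {
                my $prev = $site{ $fields[-1] }->[0];
                my $diff = $prev - $rank;    # positive: the site has moved up
                my $desc = 0 == $diff ? '=' : $diff < 0 ? $diff : "+$diff";
                print "$rank\t$prev\t$desc\t@fields\n";
        }
        else {
                print "$rank\t-\tnew\t@fields\n";
        }
}
close $new;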

=head1 NAME

topwebdiff -- analyse the output of successive runs of topweb

=head1 SYNOPSIS

B<topwebdiff> filespec.recent filespec.older

=head1 DESCRIPTION

Take the output of two runs of topweb, and create a report that shows how
sites have evolved between the two snapshots. This helps pinpoint sites
that suddenly suck up a dramatic amount of bandwidth.

=head1 EXAMPLES

C<topwebdiff tw.yyyymmd1 tw.yyyymmd2>

The output is equivalent to the output of C<topweb tw.yyyymmd1> with the
addition of two columns in the second and third place:

=over 4

=item *

rank 2 -- the rank of the same FQDN from the file tw.yyyymmd2, or '-' if
the FQDN does not appear in the second file.

=item *

delta -- the change in rank from the second file (the older snapshot) in
comparison with the first file (the newer snapshot): '+n' if the site has
climbed n places, '-n' if it has dropped n places, '=' if its rank is
unchanged, and 'new' if the FQDN does not appear in the second file.

=back

An excerpt of the output from a sample data set is as follows. In this
example we see that a site has jumped from 55th most visited site (in
terms of bytes transferred) to 27th.

 20 21 +1  5671  29919621  0.483%  25.064% www.voyages-sncf.com
 21 20 -1  3532  27930698  0.451%  25.514% www.jobpilot.fr
 22 24 +2  11842 27849740  0.449%  25.964% www.societe.com
 23 22 -1  1807  25851714  0.417%  26.381% pub21.ezboard.com
 24 23 -1  4560  24280781  0.392%  26.773% www.google.fr
 25 26 +1  5326  24055482  0.388%  27.161% www.wanadoo.fr
 26 27 +1  3075  23879164  0.385%  27.546% perso.wanadoo.fr
 27 55 +28 3943  199970
 28 30 +2  2313  19803044  0.320%  28.188% webmail.libertysurf.fr
 29 25 -4  1446  19699499  0.318%  28.506% www.geocities.com
 30 28 -2  998   19288520  0.311%  28.817% lw10fd.law10.hotmail.msn.com

Just how important this jump is has to be weighed against the number of
files used in generating the snapshot. In this instance, Squid is
configured to roll its logs over every 24 hours, and 10 logs are kept.
This means that the output from topweb (if run on all log files) will be
a rolling 10-day average.

=head1 COPYRIGHT

Copyright (c) 2001 David Landgren.

This script is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=head1 AUTHOR

     David "grinder" Landgren
     grinder on perlmonks (http://www.perlmonks.org/)
     eval {join chr(64) => qw[landgren bpinet.com]}

=cut
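
The rank comparison described in the POD (current rank, previous rank, delta) can be tried out without any log files. What follows is only a minimal sketch under stated assumptions: rank_changes is a hypothetical helper that is not part of topwebdiff, the sample lines and example hostnames are made up, and each line is assumed to carry the rank in its first field and the site name in its last, as the script expects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of topwebdiff: compare two snapshots held
# in memory. Assumes whitespace-separated lines with the rank first and
# the site name last.
sub rank_changes {
    my ($current, $previous) = @_;

    # Index the older snapshot: site name => rank.
    my %prev_rank;
    for my $line (@$previous) {
        my @fields = split ' ', $line;
        next unless @fields;                  # skip blank lines
        $prev_rank{ $fields[-1] } = $fields[0];
    }

    my @report;
    for my $line (@$current) {
        my ($rank, @fields) = split ' ', $line;
        next unless @fields;                  # skip blank or rank-only lines
        my $site = $fields[-1];
        if (defined(my $prev = $prev_rank{$site})) {
            my $diff = $prev - $rank;         # positive: the site moved up
            my $desc = $diff == 0 ? '=' : $diff < 0 ? $diff : "+$diff";
            push @report, [$rank, $prev, $desc, $site];
        }
        else {
            push @report, [$rank, '-', 'new', $site];
        }
    }
    return \@report;
}

# Made-up sample data, for illustration only.
my @current  = ('1 500 a.example', '2 400 c.example',
                '3 300 b.example', '4 100 d.example');
my @previous = ('1 450 a.example', '2 350 b.example', '3 200 c.example');

print join("\t", @$_), "\n" for @{ rank_changes(\@current, \@previous) };
```

Keeping the comparison in a function that returns rows, rather than printing as it goes, makes it easy to filter the result afterwards, for instance to keep only the sites that have climbed by more than a chosen number of places.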
