Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

(code) HTTP connectivity log, or nail up dial connection

by ybiC (Prior)
on Apr 23, 2002 at 01:08 UTC ( [id://161189]=sourcecode: print w/replies, xml ) Need Help??
Category: Networking Code? Web stuff?
Author/Contact Info ybiC
Description:

Fetch a random web page from a list of URLs.   Waits a random length of time between fetches.

Written with two purposes in mind:
  * log internet connectivity
  * nail up analog dial connection
Mostly the latter, so my ISP won't drop my connection during looong downloads.

There are a number of commandline options+arguments.   Perhaps the more interesting are

  • --verbose=1 which reports only delta of success/fail ala zeno from Internet Connection Uptime Logger
  • --errDelay=1 to cause almost-immediate retry on fetch failure
  • --daemonize which backgrounds and disconnects so vty can be closed
  • --logging to print progress, options, and versions to file

From a Perlish perspective, this has been an exercise in rand, IO::Tee and Proc::Daemon, plus Getopt::Long bobs that I ended up not using.   I started out "use"ing Proc::Daemon, but had to copy+tweak subs from it to allow logging-to-file while daemonized.

Thanks to tye, fletch, tachyon, lemming, chmrr and others for tips and suggestions.   And as always, constructive criticism is welcome and invited.

#!/usr/bin/perl -w

# nailup.pl
# pod at tail


$|++;                    # STDOUT hot
require 5;               # support following modules
use strict;              # avoid d'oh! bugs
use Getopt::Long;        # support commandline args, opts
use Pod::Usage;          # avoid redundant &Usage()
use LWP::UserAgent;      # virtual web client
use Carp;                # support daemonization option
use integer;             # whole numbers in calc outputs

my $nailup_VER  = '0.99.48';


######################################################################
+#####
# BEGIN CONFIG PARAMS
######################################################################
+#####
my $logFile       = './nailup.log';
my $errFile       = './nailup.err';
my $analogPidFile = '/var/run/ppp0.pid';
my $browser       =
  'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc1) Gecko/20020417';
my @sitelist = qw(
  http://www.whitehouse.gov
  http://www.microsoft.com
  http://www.crosswalk.com
  http://www.slashdot.org
  http://www.gateway.com
  http://www.google.com
  http://www.debian.org
  http://www.compaq.com
  http://www.novell.com
  http://www.msnbc.com
  http://java.sun.com
  http://www.sony.com
  http://www.dell.com
  http://www.nba.com
  http://www.ibm.com
  http://www.hp.com
);
######################################################################
+#####
# END CONFIG PARAMS
######################################################################
+#####


# Default values and other stuff not to be mucked with:
my $current_run = 0;


# Optional features, so optional modules:
my $havePOSIX;
my $haveIOTee;
BEGIN {
  # facilitate option to run as background process
  $havePOSIX = 0;
  eval { require POSIX };
  unless ($@) {
    POSIX->import();
    $havePOSIX = 1;
  }
  # facilitate option to log to file
  $haveIOTee = 0;
  eval { require IO::Tee };
  unless ($@) {
    IO::Tee->import();
    $haveIOTee = 1;
  }
}


# Setup options and arguments:
my $opt_verbose       = my $def_verbose       = 2;
my $opt_biggestDelay  = my $def_biggestDelay  = 900;
my $opt_smallestDelay = my $def_smallestDelay = 300;
my $opt_errDelay      = my $def_errDelay      = 1;
my $opt_runs          = my $def_runs          = 3;
my $opt_timeout       = my $def_timeout       = 120;

my (
  $opt_help,
  $opt_man,
  $opt_quiet,
  $opt_daemonize,
  $opt_analog,
  $opt_logging,
);
GetOptions(
  'help!'           => \$opt_help,
  'man!'            => \$opt_man,
  'verbose=i'       => \$opt_verbose,
  'quiet!'          => \$opt_quiet,
  'daemonize!'      => \$opt_daemonize,
  'runs=i'          => \$opt_runs,
  'biggestDelay=i'  => \$opt_biggestDelay,
  'smallestDelay=i' => \$opt_smallestDelay,
  'errDelay=i'      => \$opt_errDelay,
  'timeout=i'       => \$opt_timeout,
  'analog!'         => \$opt_analog,

  'logging!'        => \$opt_logging,

) or pod2usage(verbose => 1) && exit;


######################################################################
+#####
# BEGIN PROCESS OPTIONS, ARGUMENTS
######################################################################
+#####
#   order of following args/opts is important!
#   test well after any change or invite unexpected behavior!
#   and update the pod too!
#   you've been warned!
if ($opt_help)                           { pod2usage(verbose => 1); ex
+it; }
if ($opt_man)                            { pod2usage(verbose => 2); ex
+it; }
if ($opt_runs < -1)                      { pod2usage(verbose => 1); ex
+it; }
if ($opt_biggestDelay<$opt_smallestDelay){ pod2usage(verbose => 1); ex
+it; }
if ($opt_errDelay < 0)                   { pod2usage(verbose => 1); ex
+it; }
if ($opt_timeout < 1)                    { pod2usage(verbose => 1); ex
+it; }
if ($opt_quiet)                          { $opt_verbose = 0; }

my $out;
if ($opt_logging){
  unless ($haveIOTee == 1){
    print "\nSorry - logging requires CPAN module IO::Tee\n\n";
    exit;
  }
  if ($opt_verbose < 1){
    print "\nSorry - logging not allowed in quiet/silent mode\n\n";
    exit;
  }
  $out = new IO::Tee( \*STDOUT, new IO::File("> $logFile"),);
  new IO::Tee( \*STDERR, new IO::File("> $errFile"),);
  print $out
    "\n",
    "  Logging to $logFile\n",
    "  Catastrophic errors to $errFile\n",
    "\n",
    ;
  $out->flush if $opt_logging;
}else{ $out = \*STDOUT;}

if ($opt_analog){
  unless (-e $analogPidFile){
    print $out
      "\n",
      "You specified dial connection, but no PID file was found.\n",
      "Are you on a *nix host?\n",
      "Are you using an analog dial connection?\n",
      "Is your dialup connection established?\n",
      "\n",
      ;
    $out->flush if $opt_logging;
    exit;
  }
}

if($opt_verbose >= 3) {
  # singular/plural second(s) appropriately:
  my %plural = ( b => 's', s => 's', e => 's', t => 's',);
  $plural{b} = '' if $opt_biggestDelay  == 1;
  $plural{s} = '' if $opt_smallestDelay == 1;
  $plural{e} = '' if $opt_errDelay      == 1;
  $plural{t} = '' if $opt_timeout       == 1;

  my %pluralDef = ( b => 's', s => 's', e => 's', t => 's',);
  $pluralDef{b} = '' if $def_biggestDelay  == 1;
  $pluralDef{s} = '' if $def_smallestDelay == 1;
  $pluralDef{e} = '' if $def_errDelay      == 1;
  $pluralDef{t} = '' if $def_timeout       == 1;
  print $out <<EOF

$0

Current values
  number of runs      $opt_runs
  biggest delay       $opt_biggestDelay second$plural{b}
  smallest delay      $opt_smallestDelay second$plural{s}
  error delay         $opt_errDelay second$plural{e}
  timeout             $opt_timeout second$plural{t}

Defaults
  number of runs      $def_runs
  biggest delay       $def_biggestDelay second$pluralDef{b}
  smallest delay      $def_smallestDelay second$pluralDef{s}
  error delay         $def_errDelay second$pluralDef{e}
  timeout             $def_timeout second$pluralDef{t}

Versions
  LWP::UserAgent      $LWP::UserAgent::VERSION
  IO::Tee             $IO::Tee::VERSION
  POSIX               $POSIX::VERSION
  Pod::Usage          $Pod::Usage::VERSION
  Getopt::Long        $Getopt::Long::VERSION
  Perl                $]
  OS                  $^O
  $0    $nailup_VER

EOF
;
$out->flush if $opt_logging;
}

if ($opt_daemonize){ Daemonize(); }

######################################################################
+#####
# END PROCESS OPTIONS, ARGUMENTS
######################################################################
+#####


# first attempt logged regardless of success/failure
# success = 1, failure = 0
my $oldSuccess = 3;

# Call the Main Chunk o'Code:
# (run indefinately, or run specified number of times)
if ($opt_runs == -1) { randomFetch() while(1) }
if ($opt_runs  > -1) { randomFetch() for(1..$opt_runs) }
print $out "\n";


######################################################################
+#####
# Main Chunk o'Code:
#  Fetch random page then wait to doit again
#   yeah yeah, it's a wee bit looooong for a sub,
#   but is only reasonable way I could find to
#   support selecting betwixt while(...) and for(...)
######################################################################
+#####
sub randomFetch {
  # randomly choose page:
  for (my $i = @sitelist; -- $i;) {
    my $r = int rand (1 + $i);
    @sitelist [$i, $r] = @sitelist [$r, $i] unless $r == $i;
  }
  my $site = $sitelist[0];

  # try to fetch da page:
  my $hdrs = new HTTP::Headers(
    Accept     => 'text/HTML',
    user_agent => $browser,
  );
  my $url  = new URI($site);
  my $req  = new HTTP::Request('GET', $url, $hdrs);
  my $ua   = new LWP::UserAgent;
  # $ua->proxy('http', "http://host.dom.tld:port");
  $ua->timeout($opt_timeout);
  my $resp = $ua->request($req);

  # check da response:
  my $success = 1 if $resp->is_success;
  $success    = 0 if $resp->is_error;
  my $delay;
  ++$current_run;
  my $randomDelay = 
    rand($opt_biggestDelay-$opt_smallestDelay)+$opt_smallestDelay;

  # human-readable date+timestamp:
  my ($sec, $min, $hour, $mday, $mon, $year) = localtime(time);
  my $stamp = sprintf(
    "%04d-%02d-%02d %02d:%02d:%02d",
    $year+1900, $mon+1, $mday, $hour, $min, $sec
  );

  # and report da response:
  # tell a lot:
  if ($opt_verbose >= 2){
    if ($success == 1){
      print $out "  $stamp  $site  $success  ";
      if ($current_run < $opt_runs or $opt_runs == -1){
        print $out "Next loop in $randomDelay second";
        print $out 's' if $randomDelay != 1;
        print $out ".\n";
      }else{ print $out "\nRun complete.\n\n"; }
    }
    if ($success == 0){
      print $out "\nError  $stamp  $site\n", $resp->message, "\n";
      if ($current_run < $opt_runs or $opt_runs == -1){
        print $out "Next loop in $opt_errDelay second";
        print $out 's' if $opt_errDelay != 1;
        print $out ".\n\n";
      }else{ print $out "\nRun complete.\n\n"; }
    }
    $out->flush if $opt_logging;
  }
  # tell a little:
  if($opt_verbose == 1){
    if ($success != $oldSuccess){
      print $out "  $stamp  $site   $success\n";
      $out->flush if $opt_logging;
    }
  }
  # cone of silence:
  if($opt_verbose == 0){ }

  # how long can this go on?
  if ($current_run < $opt_runs or $opt_runs == -1){
    $oldSuccess = $success;
    sleep $opt_errDelay if $success == 0;
    sleep $randomDelay  if $success == 1;
  }
}

######################################################################
+#####
# Background and disconnect from controlling vty, so vty can be closed
+:
#  fork() backgrounds, but doesn't disconnect from controlling vty
#    hence stolen Proc::Daemon subs
#  Unix-family OSen only
#  $^O strings known to me:
#    aix      freebsd  netbsd   openbsd
#    cygwin   darwin   hpux     irix
#    linux    sco      solaris  sunos
#    dec_osf  macos    msdos    mswin32  os390
######################################################################
+#####
sub Daemonize {
  if ($^O =~ /(MSWin32|MSDOS|MacOS)/i){
    print $out
      "\nSorry, --daemon only works on *nix hosts.  ",
      "You appear to be running $^O.\n\n",
    ;
    $out->flush if $opt_logging;
    exit;
  }

  if ($havePOSIX != 1){
    print $out "\nSorry, --daemon requires the CPAN POSIX module.\n\n"
+;
    $out->flush if $opt_logging;
    exit;
  }
  
  if ($opt_verbose > 0){
    ### BUG ###
    ### next line prints output thrice
    ### print $out <<EOF
    print <<EOF

$0 launched and daemonizing.
Something like 'ps x|grep nailup.pl' for PID to kill program.

EOF
    ;
  }
  Init();
}

######################################################################
+#####
# Shamelessly copied from Proc::Daemon v0.02
#   Try to fork if at all possible.
#   Function will croak if unable to fork.
sub Fork {
  my($pid);
  FORK: {
    if (defined($pid = fork)) {

      return $pid;
    } elsif ($! =~ /No more process/) {
      sleep 5;
      redo FORK;
    } else { croak "Can't fork: $!"; }
  }
}

######################################################################
+#####
# Shamelessly copied from Proc::Daemon v0.02
#   Return the maximum number of possible file descriptors.
#   If sysconf() does not give us value, we punt with our own value.
sub OpenMax {
    my $openmax = POSIX::sysconf( &POSIX::_SC_OPEN_MAX );
    (!defined($openmax) || $openmax < 0) ? 64 : $openmax;
}

######################################################################
+#####
# Shamelessly copied from Proc::Daemon v0.02
#   Become a daemon.
sub Init {
  my $oldmode = shift || 0;
  my($pid, $sess_id, $i);

  ## Fork and exit parent
  if ($pid = Fork) { exit 0; }

  ## Detach ourselves from the terminal
  croak "Cannot detach from controlling terminal"
    unless $sess_id = POSIX::setsid();

  ## Prevent possibility of acquiring a controling terminal
  if (!$oldmode) {
    $SIG{'HUP'} = 'IGNORE';
    if ($pid = Fork) { exit 0; }
  }

  ### ybiC
  ### --logging
  # Change working directory
  # chdir "/";

  ## Clear file creation mask
  umask 0;

  ### ybiC
  ### --logging
  # Close open file descriptors
  # foreach $i (0 .. OpenMax) { POSIX::close($i); }

  ## Reopen stderr, stdout, stdin to /dev/null
  open(STDIN,  "+>/dev/null");
  open(STDOUT, "+>&STDIN");
  open(STDERR, "+>&STDIN");

  $oldmode ? $sess_id : 0;
}
### ybiC
### Name "main::init" used only once. Possible typo at line...
# *init = \&Init;

######################################################################
+#####


=head1 TITLE

nailup.pl

=head1 SYNOPSIS

 nailup.pl [options arguments]

=head1 DESCRIPTION

Randomly fetch a remote web page from a list of URLs.
Waits a random length of time between fetches.
Optionally log results to file, daemonize, and/or check for ppp0 PID f
+ile.

 Written with two purposes in mind:
 * log internet connectivity
 * nail up analog dial connection

=head1 ARGUMENTS

 -r=10   --runs=10           numbers of times to fetch a random page
                                                   (-1 means indefinat
+ely)
 -b=500  --biggestDelay=500  biggest wait between fetches        (seco
+nds)
 -s=60   --smallestDelay=60  smallest wait between fetches       (seco
+nds)
 -e=1    --errDelay=1        wait after failed fetch             (seco
+nds)
 -t=30   --timeout=30        wait for response this long         (seco
+nds)
 -v=0    --verbose=0  silent
                   1  $timestamp  $site  $success (on success/fail cha
+nge)
                   2  $timestamp  $site  $success  'next in N seconds'
                   3  same as 2 plus current+default params at head

                                                  (default verbosity i
+s 2)

 Longest delay must be greater than or equal to smallest delay.
 Any value entered must be a positive integer (except for --runs).
 All are optional, as default values exist.

=head1 OPTIONS

 -h   --help     brief help message          (overrides all options be
+low)
 -m   --man      full documentation          (overrides all options be
+low)
 -q   --quiet    silent; equivelent to -v=0        (overrides --verbos
+e=N)
 -d   --daemon   background+disconnect so can close vty w/o killing ru
+n
                           ('ps x|grep nailup.pl' or such for PID to k
+ill)
                                                        (UNIX-family o
+nly)
 -l   --logging  output to screen and logfile   (not allowed with --qu
+iet)
 -a   --analog   check for ppp0 PID file, abort if not present
                         (localhost using analog modem for net-connect
+ion)

Short options cannot be bundled.
For example, C<nailup.pl -adl> is *not* valid.
But you could do C<nailup.pl -a -d -l> for analog/daemon/log combinati
+on.

=head1 CONFIG PARAMS

The following parameters can be changed by editing in CONFIG PARAMS
near the top of the program code:

 $logFile
   filespec to write to if --logging
   default is ./nailup.log

 $errFile
   filespec to write catastropic errors to if --logging
   default is ./nailup.err

 $analogPidFile
   absolute path to ppp0 PID file for live dialup connection
   *must* be set if using --analog commandline option

 $browser
   What browser this program reports itself as to web servers
   Program will run fine with this unchanged

 @sitelist
   List of websites to fetch
   Can be specific page, or just site address
   Must include leading 'http://'
   Trailing slash optional
   Program will run fine with this unchanged

=head1 AUTHOR

ybiC

=head1 CREDITS

 Props to:
   tye and others for $fh tips,
   Fletch and tachyon for URI::URL tip,
   lemming for link to semi-complete $^O list,
     http://testers.cpan.org/search?request=by-config
   zeno for [id://57865] which inspired this code.
 Oh yeah, and to some guy named vroom

=head1 TESTED

 Perl 5.00503, Debian 2.2r6
 Full functionality
   Getopt::Long    2.01, 1.90
   Pod::Usage      1.14
   POSIX           1.02
   IO::Tee         0.64
   LWP::UserAgent  2.29, 2.262

 Perl 5.00601, Cygwin on Win2k
 Full functionality
   Getopt::Long    2.25
   Pod::Usage      1.14
   POSIX           1.03
   IO::Tee         0.64
   LWP::UserAgent  2.001

 Perl 5.00601, Win2k cmd.exe
 Daemonization not available
   Getopt::Long    2.25
   Pod::Usage      1.14
   POSIX           1.03
   IO::Tee         0.64
   LWP::UserAgent  1.77

=head1 BUGS

 --daemon initial message prints thrice if --logging and print $out
 Workaround fitted: print instead of print $out

 Daemonization works on Cygwin, but how to determine the PID ??

=head1 UPDATE

 2002-04-22   20:25 CDT
   Fix thai-pos
   Post to PerlMonks
   Test on Win32-ActivePerl and also Cygwin
   Enclose POSIX, IO::Tee in BEGIN{ eval... require... } blocks
     so can run on hosts without those modules
   --logging with --daemon
     copy+tweak code from Proc::Daemon v0.02
   Revise verbosity levels/results
   Option to specify analog dial connection
     (check for /var/run/ppp0.pid)

 2002-04-16   06:50 CDT
   Initial working code

=head1 TODO

 Squash above known bug(s) if any
 --sitelist arg to override default @sitelist
   trailing ' -- ' and what follows is passed to @ARGV
   sanity-check, untaint
 Enable bundling of short options eg; -lda
   Getopt::Long::Configure qw(bundling ignore_case_always);
   above syntax errs if long or short switches entered >8^(

=cut
Replies are listed 'Best First'.
Re: HTTP connectivity log, or nail up dial connection
by beebware (Pilgrim) on Apr 23, 2002 at 20:04 UTC
    What might be an idea to add a bit more "random element" to it, is actually parse the page you download and find out which links and images go to other remote sites. Download those images over 10/20 minutes or so, and then go to the links. This was at least the site you are downloading from may get a bit of money from any ads on the site: keeping it up and running... Just a "webmaster friendly" style thought...
    Download from somewhere like Yahoo or (quick plug)Beebware Directory(end plug) and then recursively parse it,and you'll have loads of 'random sites' to visit for hours on end: how will your ISP tell it's a bot and not you? You aren't visiting the same pages over and over again (which could be in their cache anyway)...

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://161189]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (6)
As of 2024-03-19 10:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found