http://www.perlmonks.org?node_id=161189
Category: Networking Code? Web stuff?
Author/Contact Info ybiC
Description:

Fetch a random web page from a list of URLs.   Waits a random length of time between fetches.

Written with two purposes in mind:
  * log internet connectivity
  * nail up analog dial connection
Mostly the latter, so my ISP won't drop my connection during looong downloads.

There are a number of commandline options+arguments.   Perhaps the more interesting are

  • --verbose=1 which reports only delta of success/fail ala zeno from Internet Connection Uptime Logger
  • --errDelay=1 to cause almost-immediate retry on fetch failure
  • --daemonize which backgrounds and disconnects so vty can be closed
  • --logging to print progress, options, and versions to file

From a Perlish perspective, this has been an exercise in rand, IO::Tee and Proc::Daemon, plus Getopt::Long bobs that I ended up not using.   I started out "use"ing Proc::Daemon, but had to copy+tweak subs from it to allow logging-to-file while daemonized.

Thanks to tye, fletch, tachyon, lemming, chmrr and others for tips and suggestions.   And as always, constructive criticism is welcome and invited.

#!/usr/bin/perl -w

# nailup.pl
# pod at tail


$|++;                    # STDOUT hot
require 5;               # support following modules
use strict;              # avoid d'oh! bugs
use Getopt::Long;        # support commandline args, opts
use Pod::Usage;          # avoid redundant &Usage()
use LWP::UserAgent;      # virtual web client
use Carp;                # support daemonization option
use integer;             # whole numbers in calc outputs

my $nailup_VER  = '0.99.48';


######################################################################
+#####
# BEGIN CONFIG PARAMS
######################################################################
+#####
my $logFile       = './nailup.log';
my $errFile       = './nailup.err';
my $analogPidFile = '/var/run/ppp0.pid';
my $browser       =
  'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc1) Gecko/20020417';
my @sitelist = qw(
  http://www.whitehouse.gov
  http://www.microsoft.com
  http://www.crosswalk.com
  http://www.slashdot.org
  http://www.gateway.com
  http://www.google.com
  http://www.debian.org
  http://www.compaq.com
  http://www.novell.com
  http://www.msnbc.com
  http://java.sun.com
  http://www.sony.com
  http://www.dell.com
  http://www.nba.com
  http://www.ibm.com
  http://www.hp.com
);
######################################################################
+#####
# END CONFIG PARAMS
######################################################################
+#####


# Default values and other stuff not to be mucked with:
my $current_run = 0;


# Optional features, so optional modules:
my $havePOSIX;
my $haveIOTee;
BEGIN {
  # facilitate option to run as background process
  $havePOSIX = 0;
  eval { require POSIX };
  unless ($@) {
    POSIX->import();
    $havePOSIX = 1;
  }
  # facilitate option to log to file
  $haveIOTee = 0;
  eval { require IO::Tee };
  unless ($@) {
    IO::Tee->import();
    $haveIOTee = 1;
  }
}


# Setup options and arguments:
my $opt_verbose       = my $def_verbose       = 2;
my $opt_biggestDelay  = my $def_biggestDelay  = 900;
my $opt_smallestDelay = my $def_smallestDelay = 300;
my $opt_errDelay      = my $def_errDelay      = 1;
my $opt_runs          = my $def_runs          = 3;
my $opt_timeout       = my $def_timeout       = 120;

my (
  $opt_help,
  $opt_man,
  $opt_quiet,
  $opt_daemonize,
  $opt_analog,
  $opt_logging,
);
GetOptions(
  'help!'           => \$opt_help,
  'man!'            => \$opt_man,
  'verbose=i'       => \$opt_verbose,
  'quiet!'          => \$opt_quiet,
  'daemonize!'      => \$opt_daemonize,
  'runs=i'          => \$opt_runs,
  'biggestDelay=i'  => \$opt_biggestDelay,
  'smallestDelay=i' => \$opt_smallestDelay,
  'errDelay=i'      => \$opt_errDelay,
  'timeout=i'       => \$opt_timeout,
  'analog!'         => \$opt_analog,

  'logging!'        => \$opt_logging,

) or pod2usage(verbose => 1) && exit;


######################################################################
+#####
# BEGIN PROCESS OPTIONS, ARGUMENTS
######################################################################
+#####
#   order of following args/opts is important!
#   test well after any change or invite unexpected behavior!
#   and update the pod too!
#   you've been warned!
if ($opt_help)                           { pod2usage(verbose => 1); ex
+it; }
if ($opt_man)                            { pod2usage(verbose => 2); ex
+it; }
if ($opt_runs < -1)                      { pod2usage(verbose => 1); ex
+it; }
if ($opt_biggestDelay<$opt_smallestDelay){ pod2usage(verbose => 1); ex
+it; }
if ($opt_errDelay < 0)                   { pod2usage(verbose => 1); ex
+it; }
if ($opt_timeout < 1)                    { pod2usage(verbose => 1); ex
+it; }
if ($opt_quiet)                          { $opt_verbose = 0; }

my $out;
if ($opt_logging){
  unless ($haveIOTee == 1){
    print "\nSorry - logging requires CPAN module IO::Tee\n\n";
    exit;
  }
  if ($opt_verbose < 1){
    print "\nSorry - logging not allowed in quiet/silent mode\n\n";
    exit;
  }
  $out = new IO::Tee( \*STDOUT, new IO::File("> $logFile"),);
  new IO::Tee( \*STDERR, new IO::File("> $errFile"),);
  print $out
    "\n",
    "  Logging to $logFile\n",
    "  Catastrophic errors to $errFile\n",
    "\n",
    ;
  $out->flush if $opt_logging;
}else{ $out = \*STDOUT;}

if ($opt_analog){
  unless (-e $analogPidFile){
    print $out
      "\n",
      "You specified dial connection, but no PID file was found.\n",
      "Are you on a *nix host?\n",
      "Are you using an analog dial connection?\n",
      "Is your dialup connection established?\n",
      "\n",
      ;
    $out->flush if $opt_logging;
    exit;
  }
}

if($opt_verbose >= 3) {
  # singular/plural second(s) appropriately:
  my %plural = ( b => 's', s => 's', e => 's', t => 's',);
  $plural{b} = '' if $opt_biggestDelay  == 1;
  $plural{s} = '' if $opt_smallestDelay == 1;
  $plural{e} = '' if $opt_errDelay      == 1;
  $plural{t} = '' if $opt_timeout       == 1;

  my %pluralDef = ( b => 's', s => 's', e => 's', t => 's',);
  $pluralDef{b} = '' if $def_biggestDelay  == 1;
  $pluralDef{s} = '' if $def_smallestDelay == 1;
  $pluralDef{e} = '' if $def_errDelay      == 1;
  $pluralDef{t} = '' if $def_timeout       == 1;
  print $out <<EOF

$0

Current values
  number of runs      $opt_runs
  biggest delay       $opt_biggestDelay second$plural{b}
  smallest delay      $opt_smallestDelay second$plural{s}
  error delay         $opt_errDelay second$plural{e}
  timeout             $opt_timeout second$plural{t}

Defaults
  number of runs      $def_runs
  biggest delay       $def_biggestDelay second$pluralDef{b}
  smallest delay      $def_smallestDelay second$pluralDef{s}
  error delay         $def_errDelay second$pluralDef{e}
  timeout             $def_timeout second$pluralDef{t}

Versions
  LWP::UserAgent      $LWP::UserAgent::VERSION
  IO::Tee             $IO::Tee::VERSION
  POSIX               $POSIX::VERSION
  Pod::Usage          $Pod::Usage::VERSION
  Getopt::Long        $Getopt::Long::VERSION
  Perl                $]
  OS                  $^O
  $0    $nailup_VER

EOF
;
$out->flush if $opt_logging;
}

if ($opt_daemonize){ Daemonize(); }

######################################################################
+#####
# END PROCESS OPTIONS, ARGUMENTS
######################################################################
+#####


# first attempt logged regardless of success/failure
# success = 1, failure = 0
my $oldSuccess = 3;

# Call the Main Chunk o'Code:
# (run indefinately, or run specified number of times)
if ($opt_runs == -1) { randomFetch() while(1) }
if ($opt_runs  > -1) { randomFetch() for(1..$opt_runs) }
print $out "\n";


######################################################################
+#####
# Main Chunk o'Code:
#  Fetch random page then wait to doit again
#   yeah yeah, it's a wee bit looooong for a sub,
#   but is only reasonable way I could find to
#   support selecting betwixt while(...) and for(...)
######################################################################
+#####
sub randomFetch {
  # randomly choose page:
  for (my $i = @sitelist; -- $i;) {
    my $r = int rand (1 + $i);
    @sitelist [$i, $r] = @sitelist [$r, $i] unless $r == $i;
  }
  my $site = $sitelist[0];

  # try to fetch da page:
  my $hdrs = new HTTP::Headers(
    Accept     => 'text/HTML',
    user_agent => $browser,
  );
  my $url  = new URI($site);
  my $req  = new HTTP::Request('GET', $url, $hdrs);
  my $ua   = new LWP::UserAgent;
  # $ua->proxy('http', "http://host.dom.tld:port");
  $ua->timeout($opt_timeout);
  my $resp = $ua->request($req);

  # check da response:
  my $success = 1 if $resp->is_success;
  $success    = 0 if $resp->is_error;
  my $delay;
  ++$current_run;
  my $randomDelay = 
    rand($opt_biggestDelay-$opt_smallestDelay)+$opt_smallestDelay;

  # human-readable date+timestamp:
  my ($sec, $min, $hour, $mday, $mon, $year) = localtime(time);
  my $stamp = sprintf(
    "%04d-%02d-%02d %02d:%02d:%02d",
    $year+1900, $mon+1, $mday, $hour, $min, $sec
  );

  # and report da response:
  # tell a lot:
  if ($opt_verbose >= 2){
    if ($success == 1){
      print $out "  $stamp  $site  $success  ";
      if ($current_run < $opt_runs or $opt_runs == -1){
        print $out "Next loop in $randomDelay second";
        print $out 's' if $randomDelay != 1;
        print $out ".\n";
      }else{ print $out "\nRun complete.\n\n"; }
    }
    if ($success == 0){
      print $out "\nError  $stamp  $site\n", $resp->message, "\n";
      if ($current_run < $opt_runs or $opt_runs == -1){
        print $out "Next loop in $opt_errDelay second";
        print $out 's' if $opt_errDelay != 1;
        print $out ".\n\n";
      }else{ print $out "\nRun complete.\n\n"; }
    }
    $out->flush if $opt_logging;
  }
  # tell a little:
  if($opt_verbose == 1){
    if ($success != $oldSuccess){
      print $out "  $stamp  $site   $success\n";
      $out->flush if $opt_logging;
    }
  }
  # cone of silence:
  if($opt_verbose == 0){ }

  # how long can this go on?
  if ($current_run < $opt_runs or $opt_runs == -1){
    $oldSuccess = $success;
    sleep $opt_errDelay if $success == 0;
    sleep $randomDelay  if $success == 1;
  }
}

######################################################################
+#####
# Background and disconnect from controlling vty, so vty can be closed
+:
#  fork() backgrounds, but doesn't disconnect from controlling vty
#    hence stolen Proc::Daemon subs
#  Unix-family OSen only
#  $^O strings known to me:
#    aix      freebsd  netbsd   openbsd
#    cygwin   darwin   hpux     irix
#    linux    sco      solaris  sunos
#    dec_osf  macos    msdos    mswin32  os390
######################################################################
+#####
sub Daemonize {
  if ($^O =~ /(MSWin32|MSDOS|MacOS)/i){
    print $out
      "\nSorry, --daemon only works on *nix hosts.  ",
      "You appear to be running $^O.\n\n",
    ;
    $out->flush if $opt_logging;
    exit;
  }

  if ($havePOSIX != 1){
    print $out "\nSorry, --daemon requires the CPAN POSIX module.\n\n"
+;
    $out->flush if $opt_logging;
    exit;
  }
  
  if ($opt_verbose > 0){
    ### BUG ###
    ### next line prints output thrice
    ### print $out <<EOF
    print <<EOF

$0 launched and daemonizing.
Something like 'ps x|grep nailup.pl' for PID to kill program.

EOF
    ;
  }
  Init();
}

######################################################################
+#####
# Shamelessly copied from Proc::Daemon v0.02
#   Try to fork if at all possible.
#   Function will croak if unable to fork.
sub Fork {
  my($pid);
  FORK: {
    if (defined($pid = fork)) {

      return $pid;
    } elsif ($! =~ /No more process/) {
      sleep 5;
      redo FORK;
    } else { croak "Can't fork: $!"; }
  }
}

######################################################################
+#####
# Shamelessly copied from Proc::Daemon v0.02
#   Return the maximum number of possible file descriptors.
#   If sysconf() does not give us value, we punt with our own value.
sub OpenMax {
    my $openmax = POSIX::sysconf( &POSIX::_SC_OPEN_MAX );
    (!defined($openmax) || $openmax < 0) ? 64 : $openmax;
}

######################################################################
+#####
# Shamelessly copied from Proc::Daemon v0.02
#   Become a daemon.
sub Init {
  my $oldmode = shift || 0;
  my($pid, $sess_id, $i);

  ## Fork and exit parent
  if ($pid = Fork) { exit 0; }

  ## Detach ourselves from the terminal
  croak "Cannot detach from controlling terminal"
    unless $sess_id = POSIX::setsid();

  ## Prevent possibility of acquiring a controling terminal
  if (!$oldmode) {
    $SIG{'HUP'} = 'IGNORE';
    if ($pid = Fork) { exit 0; }
  }

  ### ybiC
  ### --logging
  # Change working directory
  # chdir "/";

  ## Clear file creation mask
  umask 0;

  ### ybiC
  ### --logging
  # Close open file descriptors
  # foreach $i (0 .. OpenMax) { POSIX::close($i); }

  ## Reopen stderr, stdout, stdin to /dev/null
  open(STDIN,  "+>/dev/null");
  open(STDOUT, "+>&STDIN");
  open(STDERR, "+>&STDIN");

  $oldmode ? $sess_id : 0;
}
### ybiC
### Name "main::init" used only once. Possible typo at line...
# *init = \&Init;

######################################################################
+#####


=head1 TITLE

nailup.pl

=head1 SYNOPSIS

 nailup.pl [options arguments]

=head1 DESCRIPTION

Randomly fetch a remote web page from a list of URLs.
Waits a random length of time between fetches.
Optionally log results to file, daemonize, and/or check for ppp0 PID f
+ile.

 Written with two purposes in mind:
 * log internet connectivity
 * nail up analog dial connection

=head1 ARGUMENTS

 -r=10   --runs=10           numbers of times to fetch a random page
                                                   (-1 means indefinat
+ely)
 -b=500  --biggestDelay=500  biggest wait between fetches        (seco
+nds)
 -s=60   --smallestDelay=60  smallest wait between fetches       (seco
+nds)
 -e=1    --errDelay=1        wait after failed fetch             (seco
+nds)
 -t=30   --timeout=30        wait for response this long         (seco
+nds)
 -v=0    --verbose=0  silent
                   1  $timestamp  $site  $success (on success/fail cha
+nge)
                   2  $timestamp  $site  $success  'next in N seconds'
                   3  same as 2 plus current+default params at head

                                                  (default verbosity i
+s 2)

 Longest delay must be greater than or equal to smallest delay.
 Any value entered must be a positive integer (except for --runs).
 All are optional, as default values exist.

=head1 OPTIONS

 -h   --help     brief help message          (overrides all options be
+low)
 -m   --man      full documentation          (overrides all options be
+low)
 -q   --quiet    silent; equivelent to -v=0        (overrides --verbos
+e=N)
 -d   --daemon   background+disconnect so can close vty w/o killing ru
+n
                           ('ps x|grep nailup.pl' or such for PID to k
+ill)
                                                        (UNIX-family o
+nly)
 -l   --logging  output to screen and logfile   (not allowed with --qu
+iet)
 -a   --analog   check for ppp0 PID file, abort if not present
                         (localhost using analog modem for net-connect
+ion)

Short options cannot be bundled.
For example, C<nailup.pl -adl> is *not* valid.
But you could do C<nailup.pl -a -d -l> for analog/daemon/log combinati
+on.

=head1 CONFIG PARAMS

The following parameters can be changed by editing in CONFIG PARAMS
near the top of the program code:

 $logFile
   filespec to write to if --logging
   default is ./nailup.log

 $errFile
   filespec to write catastropic errors to if --logging
   default is ./nailup.err

 $analogPidFile
   absolute path to ppp0 PID file for live dialup connection
   *must* be set if using --analog commandline option

 $browser
   What browser this program reports itself as to web servers
   Program will run fine with this unchanged

 @sitelist
   List of websites to fetch
   Can be specific page, or just site address
   Must include leading 'http://'
   Trailing slash optional
   Program will run fine with this unchanged

=head1 AUTHOR

ybiC

=head1 CREDITS

 Props to:
   tye and others for $fh tips,
   Fletch and tachyon for URI::URL tip,
   lemming for link to semi-complete $^O list,
     http://testers.cpan.org/search?request=by-config
   zeno for [id://57865] which inspired this code.
 Oh yeah, and to some guy named vroom

=head1 TESTED

 Perl 5.00503, Debian 2.2r6
 Full functionality
   Getopt::Long    2.01, 1.90
   Pod::Usage      1.14
   POSIX           1.02
   IO::Tee         0.64
   LWP::UserAgent  2.29, 2.262

 Perl 5.00601, Cygwin on Win2k
 Full functionality
   Getopt::Long    2.25
   Pod::Usage      1.14
   POSIX           1.03
   IO::Tee         0.64
   LWP::UserAgent  2.001

 Perl 5.00601, Win2k cmd.exe
 Daemonization not available
   Getopt::Long    2.25
   Pod::Usage      1.14
   POSIX           1.03
   IO::Tee         0.64
   LWP::UserAgent  1.77

=head1 BUGS

 --daemon initial message prints thrice if --logging and print $out
 Workaround fitted: print instead of print $out

 Daemonization works on Cygwin, but how to determine the PID ??

=head1 UPDATE

 2002-04-22   20:25 CDT
   Fix thai-pos
   Post to PerlMonks
   Test on Win32-ActivePerl and also Cygwin
   Enclose POSIX, IO::Tee in BEGIN{ eval... require... } blocks
     so can run on hosts without those modules
   --logging with --daemon
     copy+tweak code from Proc::Daemon v0.02
   Revise verbosity levels/results
   Option to specify analog dial connection
     (check for /var/run/ppp0.pid)

 2002-04-16   06:50 CDT
   Initial working code

=head1 TODO

 Squash above known bug(s) if any
 --sitelist arg to override default @sitelist
   trailing ' -- ' and what follows is passed to @ARGV
   sanity-check, untaint
 Enable bundling of short options eg; -lda
   Getopt::Long::Configure qw(bundling ignore_case_always);
   above syntax errs if long or short switches entered >8^(

=cut