Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
This script will pull a list of the books you currently have checked out from the San Francisco Peninsula Library system. It will email you the list if any of the books are due within a user definable period of time.

The challenging part of this script is that the data we're interested in takes 5 clicks to get to. The original script I wrote scraped all five pages and was very fragile. I rewrote the code to get to the fifth page using WWW::Mechanize (written by petdance) and am very pleased with the result. The fetch_html() subroutine used to be several hundred lines long, now its about a dozen.

I discussed the Mechanize code in fetch_html at a recent Silicon Valley perl mongers lightning talk.

Thanks petdance, WWW::Mechanize was a big win here.

#!/usr/bin/perl use strict; use 5.006; use warnings; use WWW::Mechanize; use Time::Piece; use Mail::Mailer; use HTML::TableExtract; use Getopt::Long; use Pod::Usage; # Default auth info my $defaultuser = '12345678901234'; my $defaultpass = '1234'; my $defaultemail = 'you@youremail.com'; my $starturl = 'http://catalog.plsinfo.org/ipac20/ipac.jsp?profile +='; # Get command line options my %opt; GetOptions (\%opt, 'user=s', 'pass=s', 'outfile=s', 'email=s', 'daystowarn=i', 'maxtitlelen=i', 'send_email', 'print_output', 'save_output', 'help', ) or pod2usage(-verbose => 1) && exit; # Fill in some defaults $opt{daystowarn} ||= 3; $opt{user} ||= $defaultuser; $opt{pass} ||= $defaultpass; $opt{email} ||= $defaultemail; $opt{maxtitlelen} ||= 58; $opt{print_output} = 1 if !($opt{send_email} || $opt{save_output}); $opt{help} && pod2usage(-verbose => 1) && exit; # Generate the output text my $html = fetch_html($starturl); my $data = parse_data_from_html($html); my $sorted = sort_data_by_date($data); my $mindays = $sorted->[0][2]; my $output = build_output($sorted); # Do something with the output text save_output($output) if $opt{save_output}; print_output($output) if $opt{print_output}; email_output($output,$mindays) if $opt{send_email} and $mindays < $opt{daystowarn}; ######### SUBS ########## sub fetch_html { my $url = shift; my $agent = WWW::Mechanize->new(); $agent->quiet(1); $agent->get($url); error($agent,1) unless $agent->status eq 200; $agent->follow('Your Library Account'); $agent->form(1); $agent->field('sec1',$opt{user}); $agent->field('sec2',$opt{pass}); $agent->click('button'); error($agent,2) if $agent->content =~ /Login\s+failed/; $agent->follow('Checked Out'); $agent->follow('Due Date'); return $agent->content; } sub parse_data_from_html { my $html = shift; my $te = new HTML::TableExtract ( headers => ['Due Date'], subtables => 1, ); $te->parse($html); my (@dates,@titles); foreach my $ts ($te->table_states) { my $depth = ($ts->coords)[0]; if ($depth == 2) { my $title = ($ts->rows)[0]->[0]; $title =~ s|\s+/$||; $title = substr($title,0,$opt{maxtitlelen}) if length($title) > $opt{maxtitlelen}; push(@titles,$title); } else { @dates = map {$_->[0]} $ts->rows; } } my ($checkedout) = $html =~ /Checked\s+Out:\s*(\d+)/; $checkedout ||= 0; warn "Dates=" . @dates . "; Titles=" . @titles . "; Checkedout=$chec +kedout" if @titles != @dates or @titles != $checkedout; my @rv; push(@rv,[$titles[$_],$dates[$_]]) for 0..$#dates; return \@rv; } sub sort_data_by_date { my $data = shift; my @rv; my $today = Time::Piece->strptime(localtime()->ymd, "%Y-%m-%d"); for (@$data) { my ($title,$datestr) = @$_; my $date = Time::Piece->strptime($datestr, "%m/%d/%Y"); my $days = int(($date - $today)->days); push(@rv,[$title,$datestr,$days]); } @rv = sort {$a->[2] <=> $b->[2]} @rv; return \@rv; } sub build_output { my $sorted = shift; my $output = localtime() . "\n"; for (@$sorted) { my ($title,$datestr,$days) = @$_; $output .= sprintf "%2s days (%s) %s\n", $days, $datestr, $title; } return $output; } sub print_output { my $output = shift; print $output; } sub save_output { my $output = shift; warn "no output file specified" unless $opt{outfile}; if ($output && $opt{outfile}) { open(OUT,">$opt{outfile}") or die; print OUT $output; close(OUT); } } sub email_output { my $output = shift; my $mindays = shift; my $plural = abs($mindays) == 1 ? '' : 's'; my $subject; if ($mindays < 0) { $subject = "Library Books Are " . abs($mindays) . " day$plural LAT +E"; } elsif ($mindays == 0) { $subject = "Library Books Are Due Today"; } else { $subject = "Library Books Are Due in $mindays day$plural"; } my $mailer = Mail::Mailer->new; my %headers = (To => $opt{email}, Subject => $subject, ); $mailer->open(\%headers); print $mailer $output; $mailer->close; } sub error { my $agent = shift; my $err = shift; my $url = $agent->uri; if ($err == 1) { warn "ERROR (".$agent->status."): Can't get $url\n"; warn " Library website may be down\n"; } elsif ($err == 2) { warn "ERROR: Incorrect login information\n"; } exit; } =head1 NAME booksout.pl =head1 SYNOPSIS Print, save, or email a list of books checked out from the San Francisco Peninsula Library System. % booksout.pl --user=12345678901234 -pass=1234 =head1 DESCRIPTION Fetch and print a list of books checked out of the San Francisco Peninsula Library System =head1 ARGUMENTS --user=USER # 14 digit library barcode --pass=PASS # 4 digit library pin nuber (last part of phone nu +mber) --email=EMAIL # email to send booklist to --outfile=FILENAME # file to save booklist to --daystowarn=INT # warn this many days in advance --maxtitlelen=INT # maximum length of title to display in booklist =head1 OPTIONS --help # print this help info --send_email # email booklist if due in less than $daystowarn d +ays --save_output # save the booklist to $outfile --print_output # print booklist =head1 AUTHOR blakem =head1 CREDITS petdance wrote WWW::Mechanize, which made this script feasible =head1 BUGS None that I know of. =cut

-Blake


In reply to Get Booklist from San Francisco Peninsula Library System with WWW::Mechanize by blakem

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others exploiting the Monastery: (9)
    As of 2015-07-03 15:37 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









      Results (53 votes), past polls