Hello Wise Perl Monks:
Here I am again asking for your kind assistance.
For a home "non-commercial" project I am attempting to scrape data from here:
http://www.pro-football-reference.com/boxscores/
My code is below.
From the HTML returned by the website, I want to parse this table:
<table class="sortable stats_table float_left margin_right" id="game_info">
<tr class='thead'><th colspan=2>Game Info</th></tr><tr class="">
<td align="" ><b>Stadium</b></td>
<td align="" >Hubert H. Humphrey Metrodome (dome)</td>
</tr>
<tr class="">
<td align="" ><b>Start Time</b></td>
<td align="" >12:00pm</td>
</tr>
<tr class="">
<td align="" ><b>Surface</b></td>
<td align="" >astroturf</td>
</tr>
<tr class="">
<td align="" ><b>Weather</b></td>
<td align="" >72 degrees, no wind</td>
</tr>
<tr class="">
<td align="" ><b>Vegas Line</b></td>
<td align="" >San Francisco 49ers <a href='/play-index/tgl_finder.cgi?request=1&match=season&year_min=1985&year_max=1985&game_type=R&game_num_min=0&game_num_max=99&week_num_min=0&week_num_max=99&game_day_of_week=&game_time=&time_zone=&game_location=&game_result=&overtime=&league_id=&team_id=&opp_id=&conference_game=&division_game=&tm_is_playoff=&opp_is_playoff=&tm_is_winning=&opp_is_winning=&tm_scored_first=&tm_led=&tm_trailed=&c1stat=favored_by&c1comp=eq&c1val=11'>-11.0</a></td>
</tr>
<tr class="">
<td align="" ><b>Over/Under</b></td>
<td align="" >46.0 <b>(over)</b></td>
</tr>
</table>
When I do, I get this error:
Can't call method "content" on an undefined value at C:/Perl64/site/lib/HTML/FormatText/WithLinks/AndTables.pm line 217.
at C:/Perl64/site/lib/HTML/FormatText/WithLinks/AndTables.pm line 217
HTML::FormatText::WithLinks::AndTables::_format_tables('HTML::FormatText::WithLinks::AndTables=HASH(0x4325450)', 'HTML::TreeBuilder=HASH(0x4326a80)') called at C:/Perl64/site/lib/HTML/FormatText/WithLinks/AndTables.pm line 101
HTML::FormatText::WithLinks::AndTables::parse('HTML::FormatText::WithLinks::AndTables=HASH(0x4325450)', '<table class="sortable stats_table float_left margin_right" ...') called at C:/Perl64/site/lib/HTML/FormatText/WithLinks/AndTables.pm line 83
HTML::FormatText::WithLinks::AndTables::convert('HTML::FormatText::WithLinks::AndTables', '<table class="sortable stats_table float_left margin_right" ...') called at C:/Users/kbd0718/workspace/testPerl/testGetProFootballBox.pl line 82
I have gotten HTML::FormatText::WithLinks to work for a couple of other tables within websites, but in this case it fails. The Perl code in HTML::FormatText::WithLinks is beyond me, and I cannot debug through it. I am hoping that one of you wise monks would take a crack at it and either tell me what I am doing wrong or suggest a bug fix.
Many thanks for your kind assistance.
KD
use strict;
use warnings;
use Data::Dumper;
use HTML::FormatText::WithLinks::AndTables;
use IO::File;
use LWP::Simple;
my %teamCodes;
$teamCodes{"ATL"} = "atl"; ## Atlanta Falcons
$teamCodes{"CHI"} = "chi"; ## Chicago Bears
$teamCodes{"CIN"} = "cin"; ## Cincinnati Bengals
$teamCodes{"CLE"} = "cle"; ## Cleveland Browns
$teamCodes{"BUF"} = "buf"; ## Buffalo Bills
$teamCodes{"DAL"} = "dal"; ## Dallas Cowboys
$teamCodes{"DEN"} = "den"; ## Denver Broncos
$teamCodes{"DET"} = "det"; ## Detroit Lions
$teamCodes{"GNB"} = "gnb"; ## Green Bay Packers
$teamCodes{"HOO"} = "hoo|oti"; ## Houston Oilers
$teamCodes{"IND"} = "ind|clt"; ## Indianapolis Colts
$teamCodes{"NYJ"} = "nyj"; ## New York Jets
$teamCodes{"KAN"} = "kan"; ## Kansas City Chiefs
$teamCodes{"LAM"} = "lam|ram"; ## Los Angeles Rams
$teamCodes{"LAD"} = "lad|rai"; ## Los Angeles Raiders
$teamCodes{"MIA"} = "mia"; ## Miami Dolphins
$teamCodes{"MIN"} = "min" ; ## Minnesota Vikings
$teamCodes{"NYG"} = "nyg" ; ## New York Giants
$teamCodes{"NWE"} = "nwe" ; ## New England Patriots
$teamCodes{"NOR"} = "nor"; ## New Orleans Saints
$teamCodes{"PHI"} = "phi"; ## Philadelphia Eagles
$teamCodes{"PIT"} = "pit"; ## Pittsburgh Steelers
$teamCodes{"SEA"} = "sea"; ## Seattle Seahawks
$teamCodes{"SDG"} = "sdg"; ## San Diego Chargers
$teamCodes{"SFO"} = "sfo"; ## San Francisco 49ers
$teamCodes{"SLC"} = "slc|crd"; ## St. Louis Cardinals
$teamCodes{"TAM"} = "tam"; ## Tampa Bay Buccaneers
$teamCodes{"WAS"} = "was"; ## Washington Redskins
my $date1 = "198509080";
my $date2 = "198509090";
my $tKey;
my $link ;
my $abbriv;
my $urlBase = "http://www.pro-football-reference.com/boxscores/";
my $webPageText ;
my @teamCode ;
my $delimiter = quotemeta("|" );
my $startGameInfo ;
my $startGameInfoTbl;
my $endGameInfoTbl;
my $gameInfoTbl ;
while ( ($tKey, $abbriv) = each %teamCodes) {
@teamCode = split( /$delimiter/, $abbriv ) ;
print "$teamCode[0] \n";
}
while ( ($tKey, $abbriv) = each %teamCodes) {
    @teamCode = split( /$delimiter/, $abbriv ) ;
    $link = $urlBase . $date1 . $teamCode[0] . ".htm" ;
    print $link;
    $webPageText = get( $link );
    ## get() returns undef on failure, so skip this team instead of searching an undefined page
    if ( !defined $webPageText || index( $webPageText, "File Not Found") != -1 ) {
        print " failed on retrieve of web page\n";
        next;
    }
    print "\n$webPageText\n\n";
    ## index() returns -1 (a true value) when the text is absent, so compare against -1
    $startGameInfo = index( $webPageText, "Game Info");
    next if $startGameInfo == -1;
    $startGameInfoTbl = rindex($webPageText, "<table class=", $startGameInfo );
    $endGameInfoTbl   = index ( $webPageText, "</table>", $startGameInfo );
    next if $startGameInfoTbl == -1 || $endGameInfoTbl == -1;
    ## keep everything up to and including the closing </table> tag
    $gameInfoTbl = substr($webPageText, $startGameInfoTbl, $endGameInfoTbl - $startGameInfoTbl + length("</table>"));
    print $gameInfoTbl;
    my $converted = HTML::FormatText::WithLinks::AndTables->convert( $gameInfoTbl );
    my @lines = split /\n+/, $converted;
    my $arraySize = @lines;
    print "\narray size = $arraySize\n";
}
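
For anyone who wants to check the table-extraction step separately from the formatter, here is a minimal sketch of it as a standalone sub using only core Perl. The sub name extract_table_around and the sample $page string are made up for illustration and are not part of my script. It assumes the table can be located by a marker string such as "Game Info". It only verifies that the fragment handed to convert() is a complete <table>...</table> block, so it may well not explain the error above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return the
# <table>...</table> fragment surrounding $marker, or undef if not found.
sub extract_table_around {
    my ($html, $marker) = @_;

    # index() returns -1 when the marker is absent; -1 is true in
    # boolean context, so it must be compared explicitly.
    my $marker_pos = index($html, $marker);
    return if $marker_pos == -1;

    # Nearest "<table" before the marker, first "</table>" after it.
    my $start = rindex($html, '<table', $marker_pos);
    my $end   = index($html, '</table>', $marker_pos);
    return if $start == -1 || $end == -1;

    # Include the closing tag itself in the returned fragment.
    return substr($html, $start, $end - $start + length('</table>'));
}

# Made-up sample page, just to exercise the sub.
my $page = '<p>intro</p><table id="game_info"><tr><th>Game Info</th></tr></table><p>end</p>';
my $fragment = extract_table_around($page, 'Game Info');
print defined $fragment ? "$fragment\n" : "no table found\n";
```

Printing the fragment before passing it to convert() at least shows whether the formatter is being given the table I think it is.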