Why doesn't my scraper work?

jdlev has asked for the wisdom of the Perl Monks concerning the following question:

I'm just trying to get the program to find a table and return something if it does. Even though there are tons of tables on the page, it's still returning nothing when it's run? Any tips on where I might have screwed up my code?

use LWP::Simple;
use HTML::TableExtract;

my $html_file = get("http://www.cbssports.com/nfl/injuries/pup");
die "Couldn't Get HTML File!" unless defined $html_file;
#print $html_file;

# Try every depth/count pair and report any table found there
for (my $depth = 0; $depth < 100; $depth++)
   {
      for (my $count = 0; $count < 100; $count++)
         {
               my $te = HTML::TableExtract->new( depth => $depth, count => $count )
                  or die "Unable To Extract Table";
               $te->parse($html_file) or die "Unable to parse string";

                foreach my $ts ($te->tables)
                {
                  print "Table found at depth $depth, count $count\n";
                  foreach my $row ($ts->rows)
                  {
                     print @$row;
                  }
                }
            #print "Depth = " . $depth . " Count = " . $count . "\n";
         }
   }

 #print "Injured Players Have Been Deleted From Database \n \n";

I love it when a program comes together - jdhannibal


Replies are listed 'Best First'.
Re: Why doesn't my scraper work? by Old_Gray_Bear (Bishop) on Nov 20, 2013 at 00:01 UTC
Take a look at the CBS Sports API. It is better to use the authorized tools to get your data than to ~~subvert the TOS~~ scrape the site. Nota bene: CBS provides a lot of developer tools for building your own apps for the Fantasy Leagues. You might want to start with the "Create Applications" tab and go from there. Update: I did a little wandering through the CBS Sports site and found the Terms of Service document. The second and fifth bullet points address web scraping. It boils down to "Don't Do It". ---- I Go Back to Sleep, Now. OGB
Re: Why doesn't my scraper work? by talexb (Chancellor) on Nov 19, 2013 at 22:09 UTC
My best guess is that there's some JavaScript involved, which makes scraping a lot more complicated. You should also keep in mind that scraping a site like http://www.cbssports.com might be against their Terms Of Use. If there's an API that you can use instead, all the better. Alex / talexb / Toronto Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
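One way to test that guess is to check whether the fetched string contains any table markup at all before blaming the extraction step. The sketch below is only an illustration: the inline HTML is a hypothetical stand-in for whatever get() returned, and the player name in it is made up. It relies on HTML::TableExtract matching every table when no depth, count, or header constraints are given, so a single parse replaces the nested loops.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the string returned by get();
# replace it with the real fetch result when diagnosing.
my $html_file = <<'HTML';
<html><body>
<table><tr><td>Player</td><td>Status</td></tr>
<tr><td>Example Name</td><td>PUP</td></tr></table>
</body></html>
HTML

# Step 1: is there any <table> markup in the raw string?
# If this is zero, the tables are probably added after page load.
my $tag_count = () = $html_file =~ /<table\b/gi;
print "Found $tag_count <table> tag(s) in the fetched HTML\n";

# Step 2: with no constraints every table matches, so parse once.
my $te = HTML::TableExtract->new();
$te->parse($html_file);

my @found;
for my $ts ($te->tables) {
    my ($depth, $count) = $ts->coords;   # position of this table
    push @found, "$depth,$count";
    print "Table found at depth $depth, count $count\n";
    for my $row ($ts->rows) {
        # Empty cells come back undefined, so substitute ''.
        print join(' | ', map { defined $_ ? $_ : '' } @$row), "\n";
    }
}
```

If step 1 reports zero tags for the real page, no choice of depth or count will help, and the problem lies in how the page was fetched.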
Re^2: Why doesn't my scraper work? by jdlev (Scribe) on Nov 19, 2013 at 22:25 UTC
Is there a way for HTML::TableExtract to look up a table by an attribute such as "class = etc"? I've tried that before, and it doesn't seem to like looking for a class name. I love it when a program comes together - jdhannibal
Re^3: Why doesn't my scraper work? by tangent (Parson) on Nov 19, 2013 at 23:00 UTC
"it seems it doesn't like looking for a class name"
In what way does it not like it? As long as you initialise the module with the attributes you want, it should not have a problem:

my $te = HTML::TableExtract->new( attribs => { class => 'class-name' } );
$te->parse($html_string);
for my $ts ($te->tables) {
    print "Table with class 'class-name' found\n";
}
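A self-contained way to try the attribs approach without touching the network is to run it against an inline string. In this minimal sketch the two-table HTML, the class names "nav" and "data", and the cell contents are all invented for illustration; only the attribs option itself comes from the reply above.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical page with two tables; only one has class "data".
my $html_string = <<'HTML';
<table class="nav"><tr><td>Home</td></tr></table>
<table class="data"><tr><td>Example Name</td><td>PUP</td></tr></table>
HTML

# Restrict matching to tables whose class attribute is "data".
my $te = HTML::TableExtract->new( attribs => { class => 'data' } );
$te->parse($html_string);

my @matched = $te->tables;
print scalar(@matched), " table(s) with class 'data'\n";

for my $ts (@matched) {
    for my $row ($ts->rows) {
        # Empty cells come back undefined, so substitute ''.
        print join(' | ', map { defined $_ ? $_ : '' } @$row), "\n";
    }
}
```

If this works on the inline string but not on the real page, the class value in the fetched HTML is worth comparing character for character with the one passed to attribs.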
Re^3: Why doesn't my scraper work? by jdlev (Scribe) on Nov 19, 2013 at 22:58 UTC
For others wondering: I figured it out by using WWW::Mechanize instead of LWP::Simple to fetch the original data. With WWW::Mechanize it's at least saving the full HTML of the page now. I love it when a program comes together - jdhannibal
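The fix described above might look roughly like the following sketch. The thread does not show the final code, so the structure here is an assumption: table_coords is a hypothetical helper, and the URL is taken from the command line so the parsing part can be tried without a network connection. Why WWW::Mechanize returned the full page when LWP::Simple did not is not established in the thread.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TableExtract;

# Hypothetical helper: returns "depth,count" for every table in the HTML.
sub table_coords {
    my ($html) = @_;
    my $te = HTML::TableExtract->new();   # no constraints: all tables match
    $te->parse($html);
    return map { join ',', $_->coords } $te->tables;
}

# Fetch only when a URL is given as the first command-line argument.
if (my $url = shift @ARGV) {
    # autocheck => 1 makes get() die on HTTP errors.
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);
    my $html_file = $mech->content;

    print "Fetched ", length($html_file), " characters\n";
    print "Table found at depth,count $_\n" for table_coords($html_file);
}
```

Saved under a name of your choosing (say, fetch_tables.pl), it would be run with the page URL as its only argument, for example the http://www.cbssports.com/nfl/injuries/pup address from the original post.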

