PerlMonks  

Mystery of the disappearing table data?!

by jdlev (Scribe)
on Aug 22, 2013 at 23:04 UTC (#1050565, perlquestion)
jdlev has asked for the wisdom of the Perl Monks concerning the following question:

I've been wasting most of today futzing with HTML::TableExtract in an effort to get some stats from my fantasy football site.

At first I thought it was something I had coded wrong, because every time I ran HTML::TableExtract, it only returned the headers. But something funky is going on with the actual website.

I use Google Chrome, and when I click "inspect element" and the web dev tools pop up at the bottom, I can see all the statistical information, and the table with all the players and their stats is neatly filled out. When I run the data scraper, it doesn't get anything but the headers. And when I open the page in 'view source', the markup has the format for the data rows in the table, but absolutely no records!

Here's the web page, though I doubt you'll be able to see anything b/c you probably have to login: http://www.draftstreet.com/nfl/salary-cap.aspx?game_id=1793187&game_reg_id=5629100

Why on earth would the data just disappear when I run view source, and not show up at all when I run my data scraper? More importantly, how am I supposed to get the freakin stats if they're invisible to my scraping program?!

I love it when a program comes together - jdhannibal

Re: Mystery of the disappearing table data?!
by runrig (Abbot) on Aug 22, 2013 at 23:47 UTC

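    Maybe the table data does not exist in the initial page load and is filled in afterwards by JavaScript/Ajax calls. That would explain the symptom: the browser's inspector shows the live DOM after scripts have run, while 'view source' and your scraper see only the raw HTML the server sent. You may have to inspect the actual requests with Firefox/Firebug or similar.

    Or use a library that can execute the JavaScript, such as WWW::Mechanize::Firefox or WWW::Selenium.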
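
    If the rows do turn out to arrive through an Ajax call, one option is to request that endpoint directly and skip the HTML table altogether. What follows is only a rough sketch using the core modules HTTP::Tiny and JSON::PP. The endpoint URL, the JSON layout ({"players":[{"name":...,"points":...}]}) and the helper names rows_from_json and fetch_rows are all made up for illustration; the real site's response will almost certainly look different, and a login cookie may be required as well.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Turn a JSON payload into a list of rows (array refs of name, points).
# ASSUMPTION: the payload looks like {"players":[{"name":...,"points":...}]}.
# This layout is hypothetical; check the real response in the browser's
# network panel and adjust the keys accordingly.
sub rows_from_json {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    return map { [ $_->{name}, $_->{points} ] } @{ $data->{players} || [] };
}

# Fetch a JSON endpoint and return its rows; dies on an HTTP failure.
sub fetch_rows {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get($url);
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return rows_from_json( $res->{content} );
}

# Offline demonstration with an invented payload (no network needed).
my $sample = '{"players":[{"name":"Player A","points":12.5},'
           . '{"name":"Player B","points":8}]}';
print join( "\t", @$_ ), "\n" for rows_from_json($sample);

# Against a live endpoint it would be called like this (hypothetical URL):
# my @rows = fetch_rows('http://www.example.com/hypothetical/players.json');
```

    The parsing is kept separate from the fetching so that it can be checked against a response saved from the browser before any live requests are made.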

Re: Mystery of the disappearing table data?!
by McA (Curate) on Aug 23, 2013 at 04:55 UTC

    Maybe the site reacts to the User-Agent header of your client, classifying your scraper as a bot and refusing to let it gather the information. Try setting the User-Agent header to that of a "real" browser.
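
    As a minimal sketch of that idea, assuming the core module HTTP::Tiny is acceptable as the client: the browser string below is just an example value, and browser_like_client is a name invented here. Whether the site actually filters on this header is a guess.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# ASSUMPTION: an example desktop-browser User-Agent string. Any current
# browser's string would serve; this particular value is illustrative only.
my $BROWSER_UA = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
               . '(KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36';

# Build an HTTP client that sends the browser-like User-Agent header
# with every request it makes.
sub browser_like_client {
    return HTTP::Tiny->new(
        agent   => $BROWSER_UA,
        timeout => 30,
    );
}

my $http = browser_like_client();
print "Sending User-Agent: ", $http->agent, "\n";

# A real request would then look like this (placeholder URL):
# my $res = $http->get('http://www.example.com/');
# print length( $res->{content} ), " bytes received\n" if $res->{success};
```

    If the scraper returns the same header-only table with this client as it did before, the User-Agent is probably not the cause.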

    McA
