PerlMonks  

Re: Why doesn't my scraper work?

by talexb (Canon)
on Nov 19, 2013 at 22:09 UTC


in reply to Why doesn't my scraper work?

My best guess is that there's some JavaScript involved, which makes things a lot more complicated when scraping.

You should also keep in mind that scraping a site like http://www.cbssports.com might be against their Terms Of Use. If there's an API that you can use instead, all the better.

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.


Re^2: Why doesn't my scraper work?
by jdlev (Scribe) on Nov 19, 2013 at 22:25 UTC
    Is there a way for HTML::TableExtract to look up a table by an attribute such as class="etc"? I've tried that before and it seems it doesn't like looking for a class name.
    I love it when a program comes together - jdhannibal
      For others wondering, I figured it out by using WWW::Mechanize as opposed to LWP::Simple when fetching the original data. It's at least saving the full code from the page now.
      I love it when a program comes together - jdhannibal
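      For anyone landing here later, a minimal sketch of fetching a page with WWW::Mechanize might look like the following. The helper name fetch_html and the command-line usage are assumptions for illustration, not anything posted in this thread. Note that WWW::Mechanize does not execute JavaScript, so content built by scripts in the browser would still be missing from what it returns.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# fetch_html is a hypothetical helper name, not part of WWW::Mechanize.
sub fetch_html {
    my ($url) = @_;

    # autocheck => 1 makes a failed request die with an error message.
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);

    # content() returns the body of the page that was just fetched.
    return $mech->content;
}

# Only fetch when a URL is given on the command line.
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print fetch_html( $ARGV[0] );
}
```

      Run it as "perl fetch.pl URL > page.html" (the script name is arbitrary) and check whether the saved file contains the table you are after before handing it to a parser.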
      "it seems it doesn't like looking for a class name"
      In what way does it not like it? As long as you initialise the module with the attributes you want it should not have a problem:
      use HTML::TableExtract;

      # Only tables carrying this class attribute are kept.
      my $te = HTML::TableExtract->new( attribs => { class => 'class-name' } );
      $te->parse($html_string);
      for my $ts ($te->tables) {
          print "Table with class 'class-name' found\n";
      }
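      To check the attribute matching without touching the network, here is a small self-contained sketch that goes one step further and prints the rows. The sample HTML, the class names 'stats' and 'nav', and the helper name rows_with_class are all made up for illustration; swap in the class from the real page.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up sample page: two tables, only one has the class we want.
my $html_string = <<'HTML';
<html><body>
<table class="nav"><tr><td>Home</td><td>Scores</td></tr></table>
<table class="stats">
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>Smith</td><td>21</td></tr>
  <tr><td>Jones</td><td>17</td></tr>
</table>
</body></html>
HTML

# rows_with_class is a hypothetical helper: it returns the rows (array refs)
# of every table whose class attribute equals $class.
sub rows_with_class {
    my ( $html, $class ) = @_;
    my $te = HTML::TableExtract->new( attribs => { class => $class } );
    $te->parse($html);
    return map { $_->rows } $te->tables;
}

# Print each matching row as tab-separated text; empty cells may be undef.
for my $row ( rows_with_class( $html_string, 'stats' ) ) {
    print join( "\t", map { defined $_ ? $_ : '' } @$row ), "\n";
}
```

      If this prints rows for the sample but nothing for the real page, the table is probably not in the HTML that was fetched, or its class attribute differs from the string being passed in.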
