
Re: Why doesn't my scraper work?

by talexb (Canon)
on Nov 19, 2013 at 22:09 UTC ( #1063400=note )


in reply to Why doesn't my scraper work?

My best guess is that there's some JavaScript involved, which makes scraping a lot more complicated: if the page builds its content in the browser, the HTML you fetch won't contain it.

You should also keep in mind that scraping a site like http://www.cbssports.com might be against their Terms Of Use. If there's an API that you can use instead, all the better.

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.


Re^2: Why doesn't my scraper work?
by jdlev (Scribe) on Nov 19, 2013 at 22:25 UTC
    Is there a way for HTML::TableExtract to look up a table by its class attribute (e.g. class="etc")? I've tried that before, and it doesn't seem to like looking for a class name.
    I love it when a program comes together - jdhannibal
      For anyone else wondering: I figured it out by fetching the original page with WWW::Mechanize instead of LWP::Simple. With WWW::Mechanize it is at least saving the full source of the page now.
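      For anyone who wants to try the same switch, here is a minimal sketch rather than my exact script. The sub names fetch_html and tables_with_class are made up for illustration, and the URL and class name are whatever you pass on the command line. As far as I can tell, HTML::TableExtract compares the class value as an exact string, so treat that as an assumption and check it against the page source.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TableExtract;

# Fetch a page and return its HTML as a string.
# (fetch_html is a hypothetical helper name, not part of either module.)
sub fetch_html {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );    # die on HTTP errors
    $mech->get($url);
    return $mech->content;
}

# Return the tables whose class attribute matches $class.
# (tables_with_class is a hypothetical helper name.)
sub tables_with_class {
    my ( $html, $class ) = @_;
    my $te = HTML::TableExtract->new( attribs => { class => $class } );
    $te->parse($html);
    return $te->tables;
}

# Usage: perl scrape.pl URL CLASS
if ( @ARGV == 2 ) {
    my ( $url, $class ) = @ARGV;
    for my $ts ( tables_with_class( fetch_html($url), $class ) ) {
        for my $row ( $ts->rows ) {
            # Empty cells come back undefined; print them as empty strings.
            print join( "\t", map { defined $_ ? $_ : '' } @$row ), "\n";
        }
    }
}
```

      Run it with a URL and a class name as the two arguments. If nothing prints, the table is probably being built by JavaScript after the page loads, in which case neither module will see it.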
      I love it when a program comes together - jdhannibal
      it seems it doesn't like looking for a class name
      In what way does it not like it? As long as you initialise the module with the attributes you want it should not have a problem:
      my $te = HTML::TableExtract->new( attribs => { class => 'class-name' } );
      $te->parse($html_string);
      for my $ts ($te->tables) {
          print "Table with class 'class-name' found\n";
      }
