PerlMonks  

Re: How to scrape an HTTPS website that has JavaScript

by tomhukins (Curate)
on Aug 23, 2004 at 16:31 UTC (#385141)


in reply to How to scrape an HTTPS website that has JavaScript

One approach I have used successfully is to let HTTP::Recorder generate WWW::Mechanize scripts for the actions you want to take, which means your Web browser handles the JavaScript for you. There's a Perl.com article on HTTP::Recorder.

Re^2: How to scrape an HTTPS website that has JavaScript
by Limbic~Region (Chancellor) on Aug 23, 2004 at 16:55 UTC
    tomhukins,
    You have successfully gotten HTTP::Recorder to do JavaScript? You might want to let the author know, since the docs for even the latest development release state that it won't record JavaScript actions.

    Cheers - L~R

      It was a few months ago, but I recall successfully using HTTP::Recorder together with HTTP::Proxy on JavaScript-enabled sites. HTTP::Proxy traps all HTTP requests, regardless of whether they come from plain HTML hyperlinks or are initiated by JavaScript. Granted, this only works when the JavaScript does something simple, but in my experience it usually does. If the JavaScript does anything complex, then you're right: you'll have to either run a JavaScript interpreter within Perl or rewrite the algorithm in Perl.
      I did let the author know. She said that it depends on what your JavaScript does. However, that limitation matters more if you are using HTTP::Recorder to generate QA scripts that test the JavaScript itself. I don't think there is anything JavaScript could do that would matter when scraping a site and that would not be captured by an HTTP proxy.
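
For anyone following along, the recording setup described above can be sketched roughly as follows. This is a minimal sketch modelled on the HTTP::Recorder synopsis, not a tested recipe for any particular site. The port number and the log file path are assumptions chosen for illustration, and you would still need to point your browser's HTTP proxy setting at the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTTP::Proxy;
use HTTP::Recorder;

# Assumption: port 8080 is free on this machine; pick any port you like
# and configure your browser to use localhost:<port> as its HTTP proxy.
my $proxy = HTTP::Proxy->new( port => 8080 );

# The recorder acts as the proxy's user agent and logs each request
# that passes through it as WWW::Mechanize code.
my $agent = HTTP::Recorder->new;

# Assumption: a hypothetical path for the generated script.
$agent->file('/tmp/recorded_mech_script');

$proxy->agent($agent);

# Blocks and serves requests until the process is stopped.
$proxy->start;
```

With the proxy running, you browse the site as usual, the browser executes the JavaScript, and the requests that result are what get written to the log file. Whether the recorded script replays cleanly still depends on what the JavaScript does, as discussed above.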
