http://www.perlmonks.org?node_id=385141


in reply to How to scrape an HTTPS website that has JavaScript

One approach I have used successfully is to let HTTP::Recorder generate WWW::Mechanize scripts for the actions you want to take, which means your Web browser handles the JavaScript for you. There's a Perl.com article on HTTP::Recorder.
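A minimal sketch of that setup, assuming the interface shown in the HTTP::Recorder synopsis: HTTP::Recorder is installed as the user agent of an HTTP::Proxy, you point your browser at the proxy, and your clicks and form submissions are written out as WWW::Mechanize code. The helper name make_recording_proxy, port 8080 and the output path /tmp/recorded.pl are hypothetical choices for illustration. Check the module POD before relying on the details.

```perl
use 5.010;
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Recorder;

# Hypothetical helper: build (but do not start) a proxy that records
# browser actions as WWW::Mechanize code.
sub make_recording_proxy {
    my (%args) = @_;
    my $proxy = HTTP::Proxy->new( port => $args{port} // 8080 );

    # HTTP::Recorder is an LWP::UserAgent subclass. The proxy uses it to
    # fetch each page, and it writes a corresponding Mechanize call to
    # its output file.
    my $recorder = HTTP::Recorder->new;
    $recorder->file( $args{file} // '/tmp/recorded.pl' );  # assumed path

    $proxy->agent($recorder);
    return $proxy;
}
```

To use it, run something like `make_recording_proxy( port => 8080 )->start;` (start() blocks), set your browser's HTTP proxy to localhost:8080, do the task by hand, then read the generated script from the output file.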

Re^2: How to scrape an HTTPS website that has JavaScript
by Limbic~Region (Chancellor) on Aug 23, 2004 at 16:55 UTC
    tomhukins,
    You have successfully gotten HTTP::Recorder to handle JavaScript? You might want to let the author know, since the documentation for even the latest development release states that it won't record JavaScript actions.

    Cheers - L~R

      It was a few months ago, but I recall successfully using HTTP::Recorder, which runs on top of HTTP::Proxy, with JavaScript-enabled sites. HTTP::Proxy sees every HTTP request the browser sends, whether it comes from a plain HTML hyperlink or is initiated by JavaScript. Granted, this only works when the JavaScript does something simple, but in my experience it usually does. If the JavaScript does anything complex, then you're right: you'll have to either run a JavaScript interpreter within Perl or rewrite the algorithm in Perl.
      I did let the author know. She said that it depends on what your JavaScript does, but that this matters mainly when you use HTTP::Recorder to generate QA scripts that test your JavaScript. When scraping a site, I don't think JavaScript can do anything that matters which an HTTP proxy would not capture.
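To illustrate why the proxy sees JavaScript-initiated traffic, here is a minimal sketch (separate from HTTP::Recorder) that logs the method and URL of every request passing through HTTP::Proxy, using a request header filter. An XMLHttpRequest or a form submitted by script reaches the proxy as an ordinary HTTP request, so it appears in the log just like a clicked link. The helper names request_log_line and make_logging_proxy and the port are hypothetical.

```perl
use 5.010;
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

# Hypothetical helper: format one request as "METHOD URL".
sub request_log_line {
    my ($request) = @_;
    return sprintf '%s %s', $request->method, $request->uri;
}

# Hypothetical helper: build (but do not start) a proxy that prints
# every request it forwards, however the browser initiated it.
sub make_logging_proxy {
    my (%args) = @_;
    my $proxy = HTTP::Proxy->new( port => $args{port} // 8080 );
    $proxy->push_filter(
        request => HTTP::Proxy::HeaderFilter::simple->new(
            sub {
                my ( $self, $headers, $request ) = @_;
                print STDERR request_log_line($request), "\n";
            }
        )
    );
    return $proxy;
}
```

Start it with `make_logging_proxy( port => 8080 )->start;`, point the browser at it and use the site. The log lines give you the URLs that a WWW::Mechanize script would need to request.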