http://www.perlmonks.org?node_id=1028565



I'm updating this page so that anyone who reads it later will know the resolution. WWW::Mechanize::Firefox appears to solve my problem. Here is the script I used:

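#!/usr/bin/perl -w
use strict;
use WWW::Mechanize::Firefox;

# File that will receive the page as rendered by Firefox.
my $doc_filename = "/home/user1/doc.txt";
open(my $doc_file, '>', $doc_filename) || die "Cannot open $doc_filename: $!";

# WWW::Mechanize::Firefox drives a running Firefox through the MozRepl add-on.
my $mech = WWW::Mechanize::Firefox->new(activate => 1);
$mech->get("<your_URL_here>");

printf("title: %s\n", $mech->title());
# If document() does not print as HTML on your setup, $mech->content() also returns the page HTML.
printf {$doc_file} "%s\n", $mech->document();
close($doc_file) || die "Cannot close $doc_filename: $!";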

After running the above script, the generated doc.txt file contains all of the HTML inserted by the JavaScript. I obviously can't guarantee this will work on every page, but it could at least be a starting point for anyone who finds this thread while searching for a way to scrape a page whose content is generated by JavaScript.