Re^2: how to access HTML within a javascript

by Special_K (Beadle)
on Apr 13, 2013 at 23:11 UTC

in reply to Re: how to access HTML within a javascript
in thread how to access HTML within a javascript

I'm updating this thread so that anyone who reads it later will know the resolution. WWW::Mechanize::Firefox appears to solve my problem. Here is the script I used:

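    #!/usr/bin/perl -w
    use strict;
    use WWW::Mechanize::Firefox;

    # Output file for the rendered page
    my $doc_filename = "/home/user1/doc.txt";
    open(my $doc_fh, '>', $doc_filename) || die "Cannot open $doc_filename: $!";

    # Drives a running Firefox instance (needs the MozRepl add-on)
    my $mech = WWW::Mechanize::Firefox->new(activate => 1);
    $mech->get("<your_URL_here>");

    printf("title: %s\n", $mech->title());
    # If document() does not give you HTML text, try $mech->content() instead
    printf($doc_fh "%s\n", $mech->document());
    close($doc_fh);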

After running the above script, the generated doc.txt file contains all of the HTML inserted by the JavaScript. I can't guarantee this will work on every page, but it could at least be a starting point for anyone who finds this thread while searching for a way to scrape a page that builds its content with JavaScript.
