Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re^2: how to access HTML within a javascript

by Special_K (Beadle)
on Apr 13, 2013 at 23:11 UTC ( #1028565=note: print w/ replies, xml ) Need Help??


in reply to Re: how to access HTML within a javascript
in thread how to access HTML within a javascript

I'm updating this page so anyone who reads it later will know the resolution. It turns out that WWW::Mechanize::Firefox appears to solve my problem. Here is the script I used:

#!/usr/bin/perl -w use strict; use WWW::Mechanize::Firefox; my $doc_filename = "/home/user1/doc.txt"; open(DOC_FILE, ">$content_filename") || die "$!"; my $mech = WWW::Mechanize::Firefox->new(activate => 1); $mech->get("<your_URL_here>"); printf("title: %s\n", $mech->title()); printf(DOC_FILE "%s\n", $mech->document());

After running the above script, the generated doc.txt file contains all html inserted by the javascript. I obviously can't guarantee this will work on every page, but it could at least be a starting point for anyone who finds this thread while searching for a way to scrape a page containing javascript.


Comment on Re^2: how to access HTML within a javascript
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1028565]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (14)
As of 2014-12-19 16:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (88 votes), past polls