PerlMonks  

Re: how to access HTML within a javascript

by sundialsvc4 (Abbot)
on Mar 20, 2013 at 17:32 UTC ( [id://1024576]=note )


in reply to how to access HTML within a javascript

If this is the file that I remember, it’s a library for more easily modifying the DOM (Document Object Model), which is the data structure that the browser builds while parsing the HTML.

The key realization here is that the DOM built from the HTML is only the initial state. Most JavaScript programs work by altering the DOM: they can create, remove, or alter any node in the DOM tree, and the browser’s display will follow suit. The DOM structure that you actually see therefore has no “source” to be viewed. It is the output of a (JavaScript) computer program.
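To make that concrete, here is a minimal sketch of a script changing the DOM after the page has loaded. The helper name `addNotice` and the inserted text are made up for illustration, and it assumes the script is included in the page before parsing finishes. It is not taken from the library being discussed.

```javascript
// Hypothetical helper: appends a <p> that exists only in the live DOM,
// never in the HTML that the server sent.
function addNotice(doc, text) {
  const p = doc.createElement("p");
  p.textContent = text;
  doc.body.appendChild(p);
  return p;
}

// In a browser, run it against the real document once the HTML is parsed.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    addNotice(document, "inserted by script");
    // The serialized live DOM now includes the new <p>;
    // the browser's "View Source" still shows only the original HTML.
    console.log(document.documentElement.outerHTML);
  });
}
```

A scraper that only downloads the HTML file never sees that paragraph, which is why a tool has to run the JavaScript (or drive a real browser) to capture it.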

Re^2: how to access HTML within a javascript
by Special_K (Monk) on Apr 13, 2013 at 23:11 UTC

    I'm updating this page so anyone who reads it later will know the resolution. It turns out that WWW::Mechanize::Firefox appears to solve my problem. Here is the script I used:

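    #!/usr/bin/perl -w
    use strict;
    use WWW::Mechanize::Firefox;

    # File that will receive the document, including script-inserted HTML
    my $doc_filename = "/home/user1/doc.txt";
    open(my $doc_fh, '>', $doc_filename) || die "Cannot open $doc_filename: $!";

    # activate => 1 makes the tab the active tab in Firefox
    my $mech = WWW::Mechanize::Firefox->new(activate => 1);
    $mech->get("<your_URL_here>");

    printf("title: %s\n", $mech->title());
    printf($doc_fh "%s\n", $mech->document());
    close($doc_fh);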

    After running the above script, the generated doc.txt file contains all of the HTML inserted by the JavaScript. I obviously can't guarantee this will work on every page, but it could at least be a starting point for anyone who finds this thread while searching for a way to scrape a page containing JavaScript.
