Re: how to access HTML within a javascript

by sundialsvc4 (Abbot)
on Mar 20, 2013 at 17:32 UTC

in reply to how to access HTML within a javascript

If this is the file that I remember, it’s a library that makes it easier to modify the DOM (Document Object Model), which is the data structure that the browser initially builds while parsing the HTML.

The key realization here is that the initial state of the DOM is only the initial state. Most JavaScript programs work by altering the DOM. They can create, remove, or alter any of the nodes in the DOM tree ... all sorts of wonderful and marvelous things ... and the browser’s display will follow suit. The DOM structure that you actually see has no “source” to be viewed. It is an output of a (JavaScript) computer program.
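
To make that concrete, here is a rough sketch of the idea rather than anything taken from the file in question. It uses a toy tree of plain objects in place of the real DOM so that it runs outside a browser; `makeNode` and `serialize` are hypothetical helpers invented for the illustration. In a real page the script would call things like `document.createElement` and `appendChild` instead.

```javascript
// A toy stand-in for the DOM: plain objects with a tag, text, and children.
// (Assumption: these helpers are illustrative only, not a browser API.)
function makeNode(tag, text = "") {
  return { tag, text, children: [] };
}

// Turn the current state of the tree back into markup. This is roughly
// what an "inspect element" view reflects, as opposed to "view source".
function serialize(node) {
  const inner = node.text + node.children.map(serialize).join("");
  return `<${node.tag}>${inner}</${node.tag}>`;
}

// Initial state: what the parser would build from the HTML as delivered.
const body = makeNode("body");
body.children.push(makeNode("p", "loading"));
const initialMarkup = serialize(body);

// A script then alters the tree: drop the placeholder and add a list.
body.children = [];
const list = makeNode("ul");
for (const item of ["one", "two"]) {
  list.children.push(makeNode("li", item));
}
body.children.push(list);
const currentMarkup = serialize(body);

console.log(initialMarkup);
console.log(currentMarkup);
```

The two printed strings differ, and only the first corresponds to anything that was ever sent over the wire. That is why fetching the raw HTML is not enough when the page builds its content with JavaScript.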

Re^2: how to access HTML within a javascript
on Apr 13, 2013 at 23:11 UTC

    I'm updating this page so anyone who reads it later will know the resolution. It turns out that WWW::Mechanize::Firefox appears to solve my problem. Here is the script I used:

    #!/usr/bin/perl -w
    use strict;
    use WWW::Mechanize::Firefox;

    # Output file for the rendered page.
    my $doc_filename = "/home/user1/doc.txt";
    open(DOC_FILE, ">$doc_filename") || die "$!";

    # Drive a running Firefox instance (requires the MozRepl add-on).
    my $mech = WWW::Mechanize::Firefox->new(activate => 1);
    $mech->get("<your_URL_here>");

    printf("title: %s\n", $mech->title());
    printf(DOC_FILE "%s\n", $mech->document());

    close(DOC_FILE);

    After running the above script, the generated doc.txt file contains all of the HTML inserted by the JavaScript. I obviously can't guarantee this will work on every page, but it could at least be a starting point for anyone who finds this thread while searching for a way to scrape a page containing JavaScript.

