Re: how to access HTML within a javascript


	PerlMonks

by sundialsvc4 (Abbot)

on Mar 20, 2013 at 17:32 UTC

in reply to how to access HTML within a javascript

If this is the file that I remember, it’s a library that makes it easier to modify the DOM (Document Object Model), which is the data structure the browser builds while parsing the HTML.

The key realization here is that the initial state of the DOM is only the initial state. Most JavaScript programs work by altering the DOM: they can create, remove, or alter any node in the DOM tree, and the browser’s display follows suit. The DOM structure that you actually see therefore has no “source” to be viewed; it is the output of a (JavaScript) computer program.
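To make that concrete, here is a minimal sketch of a script that adds a node which never appears in the page's HTML source. The function name addGreeting and the inserted text are made up for illustration; createElement, textContent, and appendChild are standard DOM calls.

```javascript
// Hypothetical helper: appends a paragraph that exists only in the live DOM,
// not in the HTML the server sent. `doc` is expected to be the browser's
// `document` object.
function addGreeting(doc) {
  const p = doc.createElement("p");
  p.textContent = "Inserted by JavaScript";
  doc.body.appendChild(p);
  return p;
}

// In a browser, run it against the live document once the HTML has been parsed.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => addGreeting(document));
}
```

"View source" on such a page would still show only the original HTML; the new paragraph is visible only by inspecting the live DOM, which is why a scraper has to run the JavaScript to see it.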

Re^2: how to access HTML within a javascript
by Special_K (Monk) on Apr 13, 2013 at 23:11 UTC

I'm updating this page so anyone who reads it later will know the resolution. It turns out that WWW::Mechanize::Firefox appears to solve my problem. Here is the script I used:

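#!/usr/bin/perl -w
use strict;
use WWW::Mechanize::Firefox;

my $doc_filename = "/home/user1/doc.txt";

# Open the output file for writing (three-argument open, lexical filehandle).
open(my $doc_fh, ">", $doc_filename) || die "$!";

# Load the page in Firefox so that its JavaScript actually runs.
my $mech = WWW::Mechanize::Firefox->new(activate => 1);
$mech->get("<your_URL_here>");
printf("title: %s\n", $mech->title());

# Write out the document as it stands after the JavaScript has modified the DOM.
printf($doc_fh "%s\n", $mech->document());
close($doc_fh);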

After running the above script, the generated doc.txt file contains all the HTML inserted by the JavaScript. I obviously can't guarantee this will work on every page, but it could at least be a starting point for anyone who finds this thread while searching for a way to scrape a page containing JavaScript.
