<?xml version="1.0" encoding="windows-1252"?>
<node id="981470" title="Re^3: Timing web page download." created="2012-07-12 14:32:28" updated="2012-07-12 14:32:28">
<type id="11">
note</type>
<author id="354935">
Sinistral</author>
<data>
<field name="doctext">
&lt;p&gt;The most likely candidate among npm packages is probably &lt;a href="http://search.npmjs.org/#/jscrape"&gt;jscrape&lt;/a&gt;, which combines &lt;a href="http://search.npmjs.org/#/jsdom"&gt;jsdom&lt;/a&gt;, &lt;a href="https://new.npmjs.org/package/request"&gt;request&lt;/a&gt;, and jQuery.  The reason I recommended JavaScript / Node as an option is your own description of the problem:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;This works more-or-less the way I intended, there are two problems though - since the list of links is dynamic, and partly created using javascript, I had to use the browser to create that list.&lt;/p&gt;

&lt;p&gt;I need a way of parsing web page, and getting a list of all its component, and this is my first problem.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;If you are dealing with pages that use JavaScript to load resources dynamically, then you need something that can interpret that JavaScript as a browser would; a plain HTML parser only sees the resources named in the static markup.&lt;/p&gt;

&lt;p&gt;As a completely different approach, you might want to check out &lt;a href="http://seleniumhq.org/"&gt;Selenium&lt;/a&gt;, which drives a real browser.&lt;/p&gt;</field>
<field name="root_node">
981041</field>
<field name="parent_node">
981391</field>
</data>
</node>
