Scraping AJAX?

Dave_COS has asked for the wisdom of the Perl Monks concerning the following question:

Ok, I'm stumped.

I'm trying to extract data from a field on a dynamic web page. For an example, please go to the Dojo Demo.

In this example, I'd like to extract 'Alexander' from the second row.

How the heck do I do this?

I've tried:

WWW::Mechanize -- no joy, it doesn't get the dynamic data.
WWW::Selenium -- can't figure out how to extract data (or even if I can)
Selenium::Remote::Driver -- can't figure out how to extract data (or even if I can)
WWW::HtmlUnit -- can't figure out how to extract data (or even if I can)

ARRGGGH! Has anyone done this, or can anyone point me in the right direction? I will be eternally grateful...

Dave

Replies are listed 'Best First'.
Re: Scraping AJAX? by kelchris (Novice) on Jan 26, 2012 at 16:40 UTC
Use LWP::UserAgent or Mojo::UserAgent. The datagrid is passed in JSON format from: http://dojotoolkit.org/documentation/tutorials/1.6/datagrid/demo/hof-batting.json Parse that JSON data using JSON::XS.
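A minimal sketch of that suggestion, not a tested scraper. Assumptions: the feed is either a top-level array of rows or an object with an `items` array, and the column wanted is stored under a field named `last`; neither is confirmed in this thread. The sub names `fetch_json_text` and `grid_cell` are made up for illustration. JSON::PP (a core module in modern Perls) stands in for JSON::XS here because it exports the same `decode_json`.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # JSON::XS exports the same decode_json

# URL taken from the reply above.
my $url = 'http://dojotoolkit.org/documentation/tutorials/1.6/datagrid/demo/hof-batting.json';

# Fetch the raw JSON text. LWP::UserAgent is loaded lazily so the parsing
# code below can be tried without network access.
sub fetch_json_text {
    my ($target) = @_;
    require LWP::UserAgent;
    my $res = LWP::UserAgent->new(timeout => 30)->get($target);
    die 'GET failed: ' . $res->status_line . "\n" unless $res->is_success;
    # Undo any Content-Encoding but keep the bytes undecoded,
    # because decode_json expects UTF-8 bytes.
    return $res->decoded_content(charset => 'none');
}

# Return one field of one row (row index is 0-based).
# ASSUMPTION: rows live in a top-level array or under an "items" key.
sub grid_cell {
    my ($json_text, $row, $field) = @_;
    my $data  = decode_json($json_text);
    my $items = ref $data eq 'HASH' ? $data->{items} : $data;
    die "no row $row\n"
        unless ref $items eq 'ARRAY' && $row >= 0 && $row < @$items;
    return $items->[$row]{$field};
}

# Hypothetical sample in the assumed layout, so this runs offline.
my $sample = '{"items":[{"first":"Hank","last":"Aaron"},'
           . '{"first":"Pete","last":"Alexander"}]}';
print grid_cell($sample, 1, 'last'), "\n";   # prints "Alexander"

# Against the live feed (untested here):
# print grid_cell(fetch_json_text($url), 1, 'last'), "\n";
```

If the real feed uses different key names, dump the decoded structure once with Data::Dumper and adjust the `$field` argument accordingly.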
Re: Scraping AJAX? by Anonymous Monk on Jan 26, 2012 at 16:31 UTC
Ok, third time today (Problems with WWW:Mechanize and a form, WWW::Mechanize "Input not found"): the solution to every scraping problem is explained in Re^5: can't get WWW::Mechanize to sign in on JustAnswer, or Web Testing with HTTP::Recorder, or WWW::Mechanize::Firefox.
Re: Scraping AJAX? by Anonymous Monk on Jan 26, 2012 at 16:57 UTC
"can't figure out how to extract data"

Learn XPath!

    use WWW::Mechanize::Firefox qw();
    use HTML::TreeBuilder::LibXML qw();

    my $mech = WWW::Mechanize::Firefox->new;
    $mech->get('http://dojotoolkit.org/documentation/tutorials/1.6/datagrid/demo/datagrid-simple.html');

    my $tree = HTML::TreeBuilder::LibXML->new;
    $tree->parse($mech->content);
    $tree->eof;

    my $name = $tree->findvalue('/html/body/div/div[2]/div/div/div/div/div[2]/table/tbody/tr/td[2]');
Re^2: Scraping AJAX? by Dave_COS (Initiate) on Jan 26, 2012 at 17:14 UTC
I do know XPath. More details: I'm trying to scrape a BlackBerry Administration Service webpage, which is behind a firewall; otherwise I would have given that link. I'm trying to get the element //div[@id='dojox_grid__View_5']/div/div/div/div/table/tbody/tr/td[20]/span, which Chrome "Inspect element" sees, and the Selenium IDE sets every time -- but when I try to run against WWW::Mechanize it shows NO content, and the Selenium packages state that the element does not exist. Forgive me if the solution is obvious, but I have read the other posts -- and tried the code -- and have had no success.
Re^3: Scraping AJAX? by kelchris (Novice) on Jan 26, 2012 at 17:40 UTC
That's because you need to get() the JSON data, which contains the actual table you want to scrape. The content you are getting is only the page that contains the JS functions; the AJAX link you can use to get the actual data should also be in there. Once you have that link, get() it, then parse the data from there.
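One rough way to look for that AJAX link in the page source is to scan it for quoted strings ending in `.json`. This is only a sketch: the sub name `find_json_urls` and the page fragment are hypothetical, and a real page may build its URL at run time or use a different extension, in which case this finds nothing.

```perl
use strict;
use warnings;

# Return each distinct quoted string in the page source that looks like a
# JSON endpoint (ends in .json, optionally followed by a query string).
sub find_json_urls {
    my ($html) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
        $html =~ /["']([^"'\s]+\.json(?:\?[^"'\s]*)?)["']/g;
}

# Hypothetical page fragment; the real page source may differ.
my $page = <<'HTML';
<script>
  var store = new dojo.data.ItemFileWriteStore({ url: "hof-batting.json" });
</script>
HTML

print "$_\n" for find_json_urls($page);   # prints "hof-batting.json"
```

A relative result like the one here has to be resolved against the page's own URL before you get() it; the URI module's `new_abs` does that.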
Re^4: Scraping AJAX? by Dave_COS (Initiate) on Jan 26, 2012 at 19:03 UTC
Re^5: Scraping AJAX? by Corion (Patriarch) on Jan 26, 2012 at 22:31 UTC
Re: Scraping AJAX? by Anonymous Monk on Jan 26, 2012 at 16:46 UTC
Gtk2::WebKit::Mechanize, Win32::IE::Mechanize, WWW::Mechanize::Firefox, WWW::Scripter, Gtk3::WebKit
Re: Scraping AJAX? by ajinkyagadewar (Novice) on Apr 16, 2012 at 07:25 UTC
When you hit the URL for the Dojo Demo, the page you get is the HTML page. Once the page is loaded, a call goes to the URL 'http://dojotoolkit.org/documentation/tutorials/1.6/datagrid/demo/hof-batting.json' to pull the JSON data.

    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://dojotoolkit.org/documentation/tutorials/1.6/datagrid/demo/hof-batting.json');

To extract data from the content there are alternatives:

1. Using a simple regular expression, you can pull the required data.
2. Convert the JSON to an array of hashes and pull the required data.

Let me know if it helps.
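The two alternatives can be compared side by side. This is a sketch on a hypothetical sample, assuming the rows sit in an `items` array (or a top-level array) and that the wanted column is a field named `last`; the sub names are invented for the example. JSON::PP is used as a core stand-in for any JSON decoder.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical sample; with WWW::Mechanize the text would come from
# $mech->content after the get() shown above.
my $json = '{"items":[{"first":"Hank","last":"Aaron"},'
         . '{"first":"Pete","last":"Alexander"}]}';

# Alternative 1: regular expression.
# Collects every "last" value in document order.
sub last_names_by_regex {
    my ($text) = @_;
    my @names = $text =~ /"last"\s*:\s*"([^"]*)"/g;
    return @names;
}

# Alternative 2: decode to an array of hashes and read the field from each.
sub last_names_by_decode {
    my ($text) = @_;
    my $data = decode_json($text);
    my $rows = ref $data eq 'HASH' ? $data->{items} : $data;
    return map { $_->{last} } @$rows;
}

my @by_regex  = last_names_by_regex($json);
my @by_decode = last_names_by_decode($json);
print "regex:  $by_regex[1]\n";    # second row
print "decode: $by_decode[1]\n";   # second row
```

The regex version is shorter but breaks on escaped quotes inside values or on a `last` key nested somewhere else in the document; decoding first avoids both problems.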

