Project Help: Mechanize::Firefox - Scraping Websites with Javascript

jdlev has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys, I've been really struggling with this and could use some help. It's frustrating, because I can use Google's dev tools and jQuery to go in and pick out the variables I need. That said, I'm open to any suggestions on how to get this done, preferably using Perl, but that's not strictly necessary. Here's where I'm at.

Based on suggestions I've received, I decided to have a go with Mechanize::Firefox. So far, I've been able to get my program to open Firefox and go to the correct page. The problems come when I need to execute some actions. Once it gets to the page, here's what I'd like it to do:

From the HTML source code, I've found that there is a JavaScript object variable that I'd like to parse. It appears to be set up as a hex with keys. Since it is rendered into the HTML, I assume I should be able to parse it and pull out the data I need? Here is the code from the site:

      <script type="text/javascript">
          Realtime.setPusherDelay(0); var myContests = [{"n":"Fantasy League","a":20.0000,"id":334455"},
</script>

So basically, how would I go in there and pull the 'n:' variable or the 'id' variable? TIA :)

I love it when a program comes together - jdhannibal


Re: Project Help: Mechanize::Firefox - Scraping Websites with Javascript by Corion (Patriarch) on Sep 23, 2014 at 11:57 UTC
The code you've shown is not valid JavaScript, so you can't get at it in a convenient way. If what you've shown is actually a mangled version of what would otherwise be valid JavaScript, the easiest way is to pull the value over into Perl space: `my $myContests= $mech->eval_in_page('myContests'); print $myContests->{n}; print $myContests->{id};`
Re^2: Project Help: Mechanize::Firefox - Scraping Websites with Javascript by jdlev (Scribe) on Sep 23, 2014 at 15:08 UTC
I keep getting the error "Use of uninitialized value" when I run this code. So I'm not sure if the variable "packagedContests" is invalid or if it has something to do with how the page is loading? `use warnings; use WWW::Mechanize::Firefox; use DBI; use Data::Dumper; my $url = ("https://www.draftkings.com/contest-lobby"); my $mech = WWW::Mechanize::Firefox->new(); $mech->get( $url ) or die("Can't Get Web Page!"); $contest = $mech->eval_in_page('packagedContests'); print $contest->{n}; print $contest->{id};` I love it when a program comes together - jdhannibal
Re^3: Project Help: Mechanize::Firefox - Scraping Websites with Javascript by Corion (Patriarch) on Sep 23, 2014 at 15:28 UTC
I'm sorry - I should read the documentation more closely. The code to fetch the variable should be: `my ($contest, $type) = $mech->eval_in_page( 'packagedContests' );` But you should also `use strict;` at the top of your program. This would have immediately shown the wrong use of `->eval_in_page`, as you get a number back instead of a proper reference.
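
The difference between the two calls can be tried without Firefox. The following is only a sketch: `fake_eval_in_page` is a hypothetical stand-in that returns a (value, type) list the way the corrected call expects, and the sample data is a cleaned-up guess at the page's array, not the real thing. How the real `eval_in_page` behaves in scalar context depends on its implementation; the stub shows just one way a list-returning sub can hand back a number.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $mech->eval_in_page: returns (value, type).
# The sample data is an assumption, not taken from the real page.
sub fake_eval_in_page {
    my @result = ( [ { n => 'Fantasy League', id => 334455 } ], 'object' );
    return @result;
}

# Scalar context: an array returned in scalar context yields its
# element count, so $wrong holds a plain number, not a reference.
my $wrong = fake_eval_in_page();
print "scalar context gave: $wrong\n";

# With strict refs, treating that number as a hash reference dies
# instead of quietly producing undef and a warning.
my $ok  = eval { my $n = $wrong->{n}; 1 };
my $err = $@;
print "strict caught it: $err" unless $ok;

# List context: the first element is the data itself. The page
# variable is an array of hashes, so each entry is indexed separately.
my ( $contests, $type ) = fake_eval_in_page();
print "$_->{n} (id $_->{id})\n" for @$contests;
```

Note that because the JavaScript value is an array of objects, the fields are reached as `$contests->[0]{n}` rather than `$contests->{n}`.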
Re^4: Project Help: Mechanize::Firefox - Scraping Websites with Javascript by jdlev (Scribe) on Sep 23, 2014 at 15:43 UTC
Re^5: Project Help: Mechanize::Firefox - Scraping Websites with Javascript by Corion (Patriarch) on Sep 23, 2014 at 16:51 UTC
Some notes below your chosen depth have not been shown here
Re: Project Help: Mechanize::Firefox - Scraping Websites with Javascript by LanX (Saint) on Sep 23, 2014 at 11:57 UTC
Don't know what a "hex of keys" is, but this module allows you to inject JavaScript code into the page and execute it. So myContests could be evaluated and analyzed (if still in page scope!). Alternatively, you could request the DOM node of this script tag and parse the text with a Perl regex. Cheers Rolf (addicted to the Perl Programming Language and ☆☆☆☆ :)
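
The regex route might look roughly like this. It is a minimal sketch under several assumptions: the `$html` string is a hypothetical, repaired copy of the snippet from the question (the real page source would come from something like `$mech->content`), the array literal is assumed to be valid JSON, and it is assumed to end at the first `];`. The core module JSON::PP does the decoding.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical page source; a repaired version of the snippet in the
# question, since the original fragment is truncated and not valid JSON.
my $html = <<'HTML';
<script type="text/javascript">
    Realtime.setPusherDelay(0); var myContests = [{"n":"Fantasy League","a":20.0000,"id":334455}];
</script>
HTML

# Capture the array literal assigned to myContests, non-greedy up to
# the first "]" that is followed by a semicolon.
my ($json) = $html =~ /var\s+myContests\s*=\s*(\[.*?\])\s*;/s
    or die "myContests not found in page source\n";

# Decode the captured text into a Perl array of hashes.
my $contests = JSON::PP->new->decode($json);

for my $c (@$contests) {
    print "name: $c->{n}, id: $c->{id}\n";
}
```

This breaks if a string inside the data happens to contain `];`, or if the site emits JavaScript that is not strict JSON (unquoted keys, trailing commas), so evaluating the variable in the page remains the more robust option when it is in scope.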

