PerlMonks  

Project Help: Mechanize::Firefox - Scraping Websites with Javascript

by jdlev (Scribe)
on Sep 23, 2014 at 11:49 UTC ( [id://1101624]=perlquestion )

jdlev has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I've been really struggling with this and could use some help. It's frustrating, because with Google's dev tools and jQuery I can go in and pick out the variables I need. That said, I'm open to any suggestions on how to get this done, preferably using Perl, but that's not strictly necessary. Here's where I'm at.

Based on suggestions I've received, I decided to give WWW::Mechanize::Firefox a go. So far, I've been able to get my program to open Firefox and go to the correct page. The problems come when I need to execute some actions. Once it gets to the page, here's what I'd like it to do:

From the HTML source code, I've found that there is a JavaScript object variable that I'd like to parse. It appears to be set up as a hex with keys. Since it is rendered into the HTML, I assume I should be able to parse it and pull the data I need? Here is the code from the site:

<script type="text/javascript">
Realtime.setPusherDelay(0);
var myContests = [{"n":"Fantasy League","a":20.0000,"id":334455"},
</script>

So basically, how would I go in there and pull the 'n:' variable or the 'id' variable? TIA :)

I love it when a program comes together - jdhannibal

Replies are listed 'Best First'.
Re: Project Help: Mechanize::Firefox - Scraping Websites with Javascript
by Corion (Patriarch) on Sep 23, 2014 at 11:57 UTC

    The code you've shown is not valid Javascript, so you can't get at it in a convenient way.

    If what you've shown actually is a mangled version of what otherwise would be valid Javascript, the easiest way is to pull over the value to Perl space:

    my $myContests= $mech->eval_in_page('myContests');
    print $myContests->{n};
    print $myContests->{id};

      I keep getting the error "Use of uninitialized value" when I run this code. So I'm not sure if the variable "packagedContests" is invalid or if it has something to do with how the page is loading?

      use warnings;
      use WWW::Mechanize::Firefox;
      use DBI;
      use Data::Dumper;

      my $url = ("https://www.draftkings.com/contest-lobby");
      my $mech = WWW::Mechanize::Firefox->new();
      $mech->get( $url ) or die("Can't Get Web Page!");
      $contest = $mech->eval_in_page('packagedContests');
      print $contest->{n};
      print $contest->{id};

      I love it when a program comes together - jdhannibal

        I'm sorry - I should read the documentation closer. The code to fetch the variable should be:

        my ($contest, $type) = $mech->eval_in_page( 'packagedContests' );

        But you should also use strict; at the top of your program. This would have immediately shown the wrong use of ->eval_in_page, as you get a number back instead of a proper reference.
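
        Putting the pieces of this sub-thread together, a corrected version of the script might look like the sketch below. It assumes Firefox is running with the MozRepl extension that WWW::Mechanize::Firefox needs, and that packagedContests is a global array of objects shaped like the myContests snippet in the question. Neither assumption has been checked against the live site.

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $url  = 'https://www.draftkings.com/contest-lobby';
my $mech = WWW::Mechanize::Firefox->new();
$mech->get($url);

# eval_in_page returns a list (value, type), so assign in list context.
# Assigning to a single scalar does not give you the value itself.
my ($contests, $type) = $mech->eval_in_page('packagedContests');
die "packagedContests came back undefined\n" unless defined $contests;

# ASSUMPTION: the variable is an array of objects, as the snippet in the
# question suggests, so index into the array before asking for a key.
my $first = $contests->[0];
print "name: $first->{n}\n";
print "id:   $first->{id}\n";
```

        If the page fills in the variable only after it has finished loading, the call may need to be delayed until then; that depends on the site and is not covered here.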

Re: Project Help: Mechanize::Firefox - Scraping Websites with Javascript
by LanX (Saint) on Sep 23, 2014 at 11:57 UTC
    Don't know what a "hex of keys" is, but this module allows you to inject JavaScript code into the page and execute it.

    So myContests could be evaluated and analyzed (if still in page scope!)

    Alternatively you could request the DOM node of this script-tag and parse the text with a Perl regex.

    Cheers Rolf

    (addicted to the Perl Programming Language and ☆☆☆☆ :)
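
    The regex route suggested above can be sketched without a browser at all. The example below is only an illustration: the $html string is a hypothetical, repaired version of the snippet from the question (the stray quote removed and the array closed), and JSON::PP is used to decode the captured text, which works only if the array literal happens to be valid JSON.

```perl
use strict;
use warnings;
use JSON::PP;

# HYPOTHETICAL page source: a repaired, complete version of the snippet
# from the question. In a real script this text could come from
# $mech->content instead.
my $html = <<'HTML';
<script type="text/javascript">
Realtime.setPusherDelay(0);
var myContests = [{"n":"Fantasy League","a":20.0000,"id":334455}];
</script>
HTML

# Capture the array literal assigned to myContests. The non-greedy match
# stops at the first "]" followed by ";", so this assumes that sequence
# does not occur inside any string value.
my ($json) = $html =~ /var\s+myContests\s*=\s*(\[.*?\])\s*;/s
    or die "myContests not found in page source\n";

# Decode the captured text into a Perl array of hashes.
my $contests = decode_json($json);

for my $contest (@$contests) {
    print "$contest->{n}\t$contest->{id}\n";
}
```

    This approach breaks as soon as the site emits something that is JavaScript but not JSON (unquoted keys, single quotes, trailing commas), in which case evaluating the variable inside the page is the more robust option.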
