PerlMonks  

Suggest a suitable module?

by jdetloff (Acolyte)
on Jan 17, 2010 at 05:48 UTC ( [id://817822]=perlquestion )

jdetloff has asked for the wisdom of the Perl Monks concerning the following question:

So, I've tried a couple of modules, and I'm not having any luck completing my task with either.

I'm trying to make a script that navigates pages, submits data to both HTML and JavaScript forms, and returns the HTML of the current page.

I tried WWW::Mechanize, but it doesn't support JavaScript. Then I tried Win32::IE::Mechanize, but discovered something strange:

When navigating the site normally, clicking a link on the menu opens the requested page in another frame of the browser. Doing so with WWW::Mechanize gave you direct access to the document. When I use Win32::IE::Mechanize to access the menu frame and then follow the links, the page just refreshes to the main page. I tried typing URLs straight into the browser and got the same results.

I have to assume this has something to do with the fact that Win32::IE::Mechanize is using a browser instead of making a more direct request to the server? Do you have other ideas about why this could happen? Can you think of any way to make the script access the documents I need?

If not, can any of you suggest a web automation tool that can follow links, parse JavaScript, return a page's HTML or URL, and fill in JavaScript and HTML forms without using a browser?

Thanks for any help!

***EDIT***

Oh, sorry, here's some code:

    #!/usr/bin/perl -w
    use lib "C:/Program Files/Perl Express/Scripts";
    use WWW::Mechanize;
    use LWP::Simple;

    package ClanBot;

    #Takes a username and a password, creates a Mechanize object
    #and uses the Mechanize object to log in to KOL. It returns the
    #Mechanize object if login was successful, otherwise it displays
    #"Wait 60 seconds" and dies.
    #This sub works with either WWW::Mechanize or Win32::IE::Mechanize
    sub login
    {
        my(@login) = @_;
        my $mech = WWW::Mechanize->new();
        $mech->get( "http://www5.kingdomofloathing.com" );
        $mech->submit_form(
            form_number => 1,
            fields => {
                loginname => $login[0],
                password  => $login[1],
            }
        );
        $url = $mech->uri();
        if($url =~ m/game.php/i)
        {
            return $mech;
        }
        else
        {
            print "Wait 60 seconds";
            die;
        }
        return 0;
    }

    #Takes a reference to a mech object, calls back until it is at
    #game.php, then navigates links to the clan_detailedroster.php
    #This works with WWW::Mechanize, but whether I'm following links or
    #using ->get(url) with Win32::IE::Mechanize it always refreshes to
    #game.php immediately
    sub toroster
    {
        $page = $_[0];
        $url = $$page->uri();
        while(!($$page->uri() =~ m/game.php/))
        {
            $$page->back();
        }
        $$page->follow_link(n=>1);
        $$page->follow_link(n=>7);
        $$page->follow_link(n=>7);
        $$page->follow_link(n=>3);
        return 1;
    }

Also, here's an example of some JavaScript from the page I'd like to automate:

    <!--
    function addlist()
    {
        which=0;
        if(document.getElementById("item11").innerHTML.length<1) which=11;
        if(document.getElementById("item10").innerHTML.length<1) which=10;
        if(document.getElementById("item9").innerHTML.length<1) which=9;
        if(document.getElementById("item8").innerHTML.length<1) which=8;
        if(document.getElementById("item7").innerHTML.length<1) which=7;
        if(document.getElementById("item6").innerHTML.length<1) which=6;
        if(document.getElementById("item5").innerHTML.length<1) which=5;
        if(document.getElementById("item4").innerHTML.length<1) which=4;
        if(document.getElementById("item3").innerHTML.length<1) which=3;
        if(document.getElementById("item2").innerHTML.length<1) which=2;
        if (which>0)
        {
            var text="<b>Send: <input type=text size=2 value=1 name=howmany"+which+"> </b>";
            text=text+"<select name=whichitem"+which+"><option value=0>-select an item-</option><option value=71>wooden stakes (1)</option></select>";
            document.getElementById("item"+which).innerHTML=text;
        }
        else
        {
            alert('Eleven is enough.');
        }
    }

It adds another drop-down menu to a page that allows you to send items to people. Win32::IE::Mechanize could use this page if it would navigate to it, which it won't. WWW::Mechanize can't use it, but it can navigate there.
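
Since addlist() only injects extra `howmanyN` / `whichitemN` inputs into the page, one possible workaround is to skip the JavaScript and build those fields by hand before posting them with WWW::Mechanize. The following is a minimal sketch, not a tested solution for this site. The helper name `build_item_fields` is made up for illustration, and it assumes that slot 1 exists in the static HTML with the same naming scheme (the script only creates slots 2 to 11).

```perl
use strict;
use warnings;

# Build the form fields that addlist() would have created in the browser.
# Each argument is an array ref: [ item_id, quantity ].
# ASSUMPTION: slot 1 is named howmany1/whichitem1 like the generated slots.
sub build_item_fields {
    my @items = @_;

    # The page itself refuses more than eleven slots.
    die "Eleven is enough.\n" if @items > 11;

    my %fields;
    my $slot = 1;
    for my $item (@items) {
        my ($id, $qty) = @$item;
        $fields{"whichitem$slot"} = $id;
        $fields{"howmany$slot"}   = $qty;
        $slot++;
    }
    return %fields;
}

# 71 is the "wooden stakes" option value from the sample markup.
my %fields = build_item_fields([71, 3]);
print "$_=$fields{$_}\n" for sort keys %fields;

# With a logged-in WWW::Mechanize object, the fields could then be sent
# directly. $send_url and %other_fields are placeholders for whatever the
# real form's action URL and remaining inputs turn out to be:
#   $mech->post($send_url, { %other_fields, %fields });
```

Whether the server accepts such a request depends on what else the real form submits, so it is worth inspecting the page source for hidden inputs first.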

Replies are listed 'Best First'.
Re: Suggest a suitable module?
by holli (Abbot) on Jan 17, 2010 at 12:44 UTC
    Have a look at Selenium. It's "by definition" a website testing tool, but can actually be used to automate all kinds of magic with modern websites. And there are Perl bindings for it.


    holli

    You can lead your users to water, but alas, you cannot drown them.
Re: Suggest a suitable module?
by Corion (Patriarch) on Jan 17, 2010 at 09:39 UTC

      Sorry, I kind of just assumed it wouldn't work because it too uses a browser. A quick test confirmed this.

        You must be doing something wrong then, because I use it to automate Javascript-heavy websites. Maybe if you showed the code you've been using to "test" this, I could show you your error. Most likely, you've been using repeated calls to ->get, which are akin to pasting a URL into the browser, instead of using ->click() on the appropriate DOM elements. But then again, comparing what gets sent over the wire using one method or the other would even allow you to replicate the whole scenario using WWW::Mechanize, and it seems you haven't looked at what actually gets sent to determine the cause of the difference. Use Wireshark or the Mozilla Live HTTP Headers extension to inspect what gets sent to the server by the browser.
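
On the WWW::Mechanize side, the same comparison can be made without an external sniffer, because WWW::Mechanize is a subclass of LWP::UserAgent and inherits its `add_handler` hooks. The sketch below records each request and response so the log can be compared against what the browser sends. The helper name `trace_traffic` is made up for illustration, and Referer is only one example of a header that might differ.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Install handlers that log every request and response passing through $ua.
# $log is an array ref that receives one line per event.
sub trace_traffic {
    my ($ua, $log) = @_;

    $ua->add_handler(request_send => sub {
        my ($request) = @_;
        push @$log, "> " . $request->method . " " . $request->uri;
        push @$log, "> Referer: " . ($request->header('Referer') // '(none)');
        return;    # returning nothing lets the request go out as usual
    });

    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        push @$log, "< " . $response->status_line;
        return;
    });

    return $ua;
}

my @log;
my $ua = trace_traffic(LWP::UserAgent->new, \@log);

# Uncomment to trace a real request (needs network access):
#   $ua->get('http://www5.kingdomofloathing.com/');
#   print "$_\n" for @log;
```

Passing a `WWW::Mechanize->new` object instead of `LWP::UserAgent->new` should work the same way, which makes it possible to log the `follow_link` calls from the original script.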
