PerlMonks  

WWW::Mechanize vs. JSP Web Control

by uni_j (Acolyte)
on Jun 03, 2009 at 17:06 UTC (#768080=perlquestion)

uni_j has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys, I'm scraping this site and am looking for code examples to help me figure out how to retrieve the information inside of this JSP control (it's a lot harder than with regular text!). I don't seem to be finding anything with the HTTP headers, and I've Super Searched and googled. I look forward to an updated webspidering book with modern AJAX/JSP control scenarios to learn from ;D Here's my code so far:
    use WWW::Mechanize;
    use HTML::TokeParser;

    my $mech = WWW::Mechanize->new();
    my $stream = HTML::TokeParser->new(\$html);
    $url = "http://students.yale.edu/oci/search.jsp";
    $mech->get($url);
    $html = $mech->content();
    while (my $token = $stream->get_token()) {
        if ($token->[1] eq "options") {
            print $stream->get_text . "\n";
        }
    }

Replies are listed 'Best First'.
Re: WWW::Mechanize vs. JSP Web Control
by whakka (Hermit) on Jun 03, 2009 at 18:21 UTC
    As mentioned, you simply need to figure out what the server is doing. Live HTTP Headers is a good way.

    The following just selects Accounting courses:

    use WWW::Mechanize;

    my $ua = WWW::Mechanize->new;

    # POST the search form exactly as the browser submits it.
    my $res = $ua->post(
        'http://students.yale.edu/oci/resultWindow.jsp',
        Content => 'term=200901&GUPgroup=A&CourseNumber=&ProgramSubject=ACCT'
                 . '&InstructorName=&timeRangeFrom=08&timeRangeTo=21'
                 . '&ExactWordPhrase=&ycRules=new&distributionGroupOperator=AND'
                 . '&Submit.x=145&Submit.y=7',
    );

    # The response uses frames; follow them down to the result list.
    $ua->follow_link( tag => 'frame', name => 'resultFrame' );
    $ua->follow_link( tag => 'frame', name => 'resultList' );
    print $ua->content;
    Now you can parse the html to get at the information; it's the second table in the body.
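One possible way to pull out "the second table" is sketched below with HTML::TokeParser, which the original question already uses. The helper name `table_rows` and the sample HTML are made up for illustration; the real markup of the result page is not shown in this thread, so treat this as a starting point rather than a tested scraper. Nested tables are not handled.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the rows of the Nth <table> (1-based)
# as array refs of trimmed cell text. Does not handle nested tables.
sub table_rows {
    my ($html, $wanted) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse HTML";

    # Skip forward to the opening tag of the wanted table.
    for (1 .. $wanted) {
        $p->get_tag('table') or return;
    }

    my @rows;
    while (my $tag = $p->get_tag('tr', 'td', 'th', '/table')) {
        my $name = $tag->[0];
        last if $name eq '/table';
        if ($name eq 'tr') { push @rows, []; next; }
        push @rows, [] unless @rows;    # cell without an enclosing <tr>
        push @{ $rows[-1] }, $p->get_trimmed_text("/$name");
    }
    return @rows;
}

# Invented sample standing in for $ua->content.
my $html = <<'HTML';
<html><body>
<table><tr><td>search form</td></tr></table>
<table>
  <tr><th>Course</th><th>Title</th></tr>
  <tr><td>ACCT 000</td><td>Example course</td></tr>
</table>
</body></html>
HTML

for my $row (table_rows($html, 2)) {
    print join(' | ', @$row), "\n";
}
```

With live data you would pass `$ua->content` instead of the sample string.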
      This is really helpful, thanks :) I'm actually trying to get the data from the combobox in the JSP control (Program/subject), but its HTML is rendered afterwards (so my Perl gets a big blank void :P). If posting all the header information verbatim will do the trick, that's the trick I'll do. Thanks whakka, great reply!
Re: WWW::Mechanize vs. JSP Web Control
by perrin (Chancellor) on Jun 03, 2009 at 17:51 UTC
    JSP, JavaScript, AJAX -- none of it matters at all. Browsers and servers can only speak via HTTP. Mech can send anything a browser can. Use a tool like Firebug to see the exact request and headers that your browser is sending, and then send the same thing with Mech. That's all you have to do.
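A rough sketch of "send the same thing": build the request with HTTP::Request::Common (installed alongside LWP and WWW::Mechanize), print it, and compare it line by line with what Firebug shows before sending it. The form fields are the ones from the POST body quoted elsewhere in this thread; the Referer value is an assumption, so copy whatever headers your browser actually sends.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Form fields, in the order they appear in the browser's POST body.
my @form = (
    term                      => '200901',
    GUPgroup                  => 'A',
    CourseNumber              => '',
    ProgramSubject            => 'ACCT',
    InstructorName            => '',
    timeRangeFrom             => '08',
    timeRangeTo               => '21',
    ExactWordPhrase           => '',
    ycRules                   => 'new',
    distributionGroupOperator => 'AND',
    'Submit.x'                => 145,
    'Submit.y'                => 7,
);

# Build (but do not send) the request. The Referer header is assumed.
my $req = POST 'http://students.yale.edu/oci/resultWindow.jsp', \@form,
    Referer => 'http://students.yale.edu/oci/search.jsp';

# Dump the request so it can be compared with the browser's.
print $req->as_string;

# To actually send it with Mech (needs network access):
#   use WWW::Mechanize;
#   my $mech = WWW::Mechanize->new;
#   my $res  = $mech->request($req);
```

Building the request separately from sending it makes mismatches with the browser's request easy to spot.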
Re: WWW::Mechanize vs. JSP Web Control
by Anonymous Monk on Jun 03, 2009 at 17:35 UTC
    the information inside of this JSP control

    You aren't dealing with JSP; JSP is the back end. From the client side you're always just dealing with HTTP requests and responses, the same as with any CGI.

    I don't seem to be finding anything with the HTTP headers

    What are you looking for? Maybe you should copy each and every header...

    I look forward to an updated webspidering book with modern AJAX/JSP control scenarios to learn from

    That would be a short book (or the same old book).

    Add this to your code

    #!/usr/bin/perl --
    use strict;
    use warnings;
    and you'll get an error because you're using $html before it is populated. Also, there are no options in the HTML (they are created by JavaScript).
      Maybe you can give me an example ;)
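A minimal sketch of the ordering fix described in the reply above: declare variables under `strict`, get the HTML first, and only then hand it to the parser. It also matches start tags named `option` rather than `options`. The helper name `option_texts` and the sample markup are invented for illustration. As noted above, the real page builds its options with JavaScript, so the page as fetched by Mech would yield nothing here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the visible text of every <option> element.
sub option_texts {
    my ($html) = @_;

    # Create the parser only after $html holds the document.
    my $stream = HTML::TokeParser->new(\$html) or die "Cannot parse HTML";

    my @texts;
    while (my $token = $stream->get_token) {
        # Start tags arrive as ['S', $tag, $attr, $attrseq, $text].
        next unless $token->[0] eq 'S' && $token->[1] eq 'option';
        push @texts, $stream->get_trimmed_text('/option');
    }
    return @texts;
}

# Invented sample standing in for $mech->content. In the real script:
#   my $mech = WWW::Mechanize->new;
#   $mech->get($url);
#   my $html = $mech->content;
my $html = '<select name="ProgramSubject">'
         . '<option value="ACCT">Accounting</option>'
         . '<option value="XXXX">Made-up subject</option>'
         . '</select>';

print "$_\n" for option_texts($html);
```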
