PerlMonks  

WWW::Mech and ASPX

by uni_j (Acolyte)
on May 28, 2009 at 19:56 UTC ( #766737=perlquestion )

uni_j has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! This script is supposed to load the HTML at a URL into a variable so that I can use HTML::TokeParser to extract all the options from it. The problem is that the rendered HTML source I see in a browser (when I'm debugging and checking out the site) isn't the same as what my Perl script gets. What my script seems to receive (as you can see in the HTML file it outputs) is just simple script tags that load the ASPX modules. I'm looking for a way to get the HTML I need from this URL so the rest of my script can run. Should I use another library (since WWW::Mechanize doesn't do JavaScript), or is there a clever solution that I'm forgetting?
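use strict;
use warnings;
use HTML::TokeParser;
use WWW::Mechanize;

my $url = "https://catalog.amherst.edu/amherst/frmCourseSearch.aspx?FormTableName=CourseSearch";

# Fetch the page and keep the raw HTML
my $mech = WWW::Mechanize->new();
$mech->get($url);
my $html = $mech->content();

# Append the fetched HTML to a file for inspection
open my $out, '>>', 'html.html' or die "Cannot open html.html: $!";
print {$out} $html;
close $out;

# Build the parser only after $html has been populated
my $stream = HTML::TokeParser->new(\$html);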

Replies are listed 'Best First'.
Re: WWW::Mech and ASPX
by ikegami (Pope) on May 28, 2009 at 20:17 UTC
    I think the document you want to visit has a referer check. You could solve this by starting your visit at the parent page and navigating to the page you want using ->follow_link or similar.
      I don't think follow_link will work in this situation. I read the documentation and it doesn't seem to be the right tool for this job. Do you have a code example of how you would use it in my situation?

        I don't think follow_link will work in this situation

        Why not?
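For reference, the follow_link approach suggested above could look roughly like the sketch below. It is only an illustration under assumptions: the helper name fetch_via_parent is made up, the parent page URL is not given anywhere in this thread (so it is taken from the command line), and the link pattern simply matches the frmCourseSearch.aspx file name from the question. WWW::Mechanize sends the previously visited page as the Referer when it follows a link, which is what a referer check would look for.

```perl
use strict;
use warnings;

# Visit $start_url first, then follow the first link whose URL matches
# $link_re, so the request for the target page carries a Referer header.
# $mech is expected to be a WWW::Mechanize object (anything offering the
# same get/follow_link/content methods will also work).
sub fetch_via_parent {
    my ($mech, $start_url, $link_re) = @_;

    $mech->get($start_url);

    # follow_link returns undef when no matching link exists
    # (and dies on its own if autocheck is enabled).
    my $response = $mech->follow_link( url_regex => $link_re );
    die "No link matching $link_re on $start_url\n" unless $response;

    return $mech->content;
}

# Run as: perl script.pl PARENT_PAGE_URL
# The parent URL is an assumption the caller has to supply.
if (@ARGV) {
    require WWW::Mechanize;    # CPAN module, loaded only when needed
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    print fetch_via_parent( $mech, $ARGV[0], qr/frmCourseSearch\.aspx/i );
}
```

If the page builds its content with JavaScript rather than checking the referer, this will still return only the script tags, so it is worth comparing the saved output against the browser's source after trying it.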
Re: WWW::Mech and ASPX
by perrin (Chancellor) on May 29, 2009 at 04:25 UTC
    Compare the headers sent by mech to the ones sent by your browser. Change the mech ones until they match.
      Live headers shows the same stuff; there's no URL that I seem to be missing.
        All the headers are exactly the same? You must have done a lot of work on them already then. Are you sure? I'm not talking about the URL, I'm talking about the headers, like User-Agent, Cookie, etc.
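To make the header comparison concrete, here is a small sketch that diffs two raw header dumps, one copied from the browser and one from the script. The function names parse_headers and diff_headers are invented for this example. One way to obtain the script's side, assuming a WWW::Mechanize object in $mech, is an LWP handler such as $mech->add_handler( request_send => sub { print shift->as_string; return } ), which prints each outgoing request.

```perl
use strict;
use warnings;

# Parse a raw header dump ("Name: value" per line) into a hash keyed by
# lower-cased header name. Lines that are not headers, such as the
# request line or blank lines, are skipped.
sub parse_headers {
    my ($raw) = @_;
    my %headers;
    for my $line ( split /\r?\n/, $raw ) {
        next unless $line =~ /^([\w-]+):\s*(.*?)\s*$/;
        $headers{ lc $1 } = $2;
    }
    return \%headers;
}

# Compare two parsed header sets and return one report line per header
# that is missing on one side or has a different value, sorted by name.
sub diff_headers {
    my ($browser, $mech) = @_;
    my %names = map { $_ => 1 } keys %$browser, keys %$mech;

    my @report;
    for my $name ( sort keys %names ) {
        my ($want, $got) = ( $browser->{$name}, $mech->{$name} );
        if    ( !defined $got )  { push @report, "$name: browser only" }
        elsif ( !defined $want ) { push @report, "$name: mech only" }
        elsif ( $want ne $got )  { push @report, "$name: differs" }
    }
    return @report;
}

# Usage, with $browser_dump and $mech_dump holding the two raw dumps:
#   print "$_\n" for diff_headers( parse_headers($browser_dump),
#                                  parse_headers($mech_dump) );
```

Anything reported as differing can then be adjusted on the Mechanize side, for example with $mech->agent_alias('Windows Mozilla') for User-Agent or $mech->add_header( Name => 'value' ) for other headers, until the report comes back empty.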

Node Type: perlquestion [id://766737]
Approved by Corion