PerlMonks  

Establish a session to call URL with Perl

by rodms (Initiate)
on Jul 06, 2013 at 17:12 UTC

rodms has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to mine data from a webpage with the WWW::Mechanize Perl module. However, I first need to establish a connection so that the webpage will allow me to access the data. In a browser, I can establish this connection by clicking a particular href link. Is there a way to do this with Perl? Thank you very much.

Re: Establish a session to call URL with Perl
by sundialsvc4 (Abbot) on Jul 06, 2013 at 18:52 UTC

    You simply need to tell Mech to do whatever “clicking on that hyperlink” would do. This can be easy, or it can be hard. :-)

    It’s easy when the hyperlink is a simple href="somewhere": all you need to do is tell Mech to follow_link(). If it is a button or an image, you can click() it. (Or, if you know what the link is and that it will never change, you can simply send Mech to its destination with get(). However, that approach creates a dependency between your script and the server’s code: if they change their code, your script will break.)
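    A minimal sketch of the first two approaches. The helper names open_session_by_link and open_session_by_button, the link text and the button name are hypothetical placeholders, not anything taken from the OP's site; the calls themselves (get, follow_link, click) are standard WWW::Mechanize methods.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: load the start page, then "click" the plain
# <a href="..."> link whose visible text is exactly $link_text.
sub open_session_by_link {
    my ( $start_url, $link_text ) = @_;

    # autocheck => 1: every failed request dies with the HTTP status.
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($start_url);

    # With autocheck on, follow_link() also dies if no link matches.
    $mech->follow_link( text => $link_text );

    # Any cookie the site set along the way stays in $mech's cookie jar,
    # so later get() calls on the same object reuse the session.
    return $mech;
}

# Hypothetical helper: the "link" is really a submit button (or an
# image button) in a form, so click it by its name attribute instead.
sub open_session_by_button {
    my ( $start_url, $button_name ) = @_;

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($start_url);

    # click() presses the named button in the current form
    # (the first form on the page unless you selected another one).
    $mech->click($button_name);

    return $mech;
}
```

    Usage would be along the lines of my $mech = open_session_by_link( 'https://example.com/start', 'Connect' ); followed by $mech->get(...) for the data page on the same object (the URL and link text are placeholders). The third option is simply $mech->get($target_url), with the fragility described above.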

    Things get harder when the designer of the page chooses instead to use JavaScript, with an onclick="" handler. WWW::Mechanize does not run JavaScript, and WWW::Mechanize::FAQ has a specific section discussing the implications. Most of the time, a cursory examination of the JavaScript code will reveal what it actually sends back to the host, and you can send that request directly.
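    A hedged sketch of "sending that request directly". Suppose, hypothetically, that reading the onclick handler shows it just POSTs a field named action with the value connect to some endpoint; the helper name send_like_onclick, the endpoint and the field are illustrations only. The post() call is inherited from LWP::UserAgent.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: replay, by hand, the request that the page's
# JavaScript would have sent. Mech cannot execute the JavaScript itself.
sub send_like_onclick {
    my ( $mech, $endpoint, %fields ) = @_;

    # post() comes from LWP::UserAgent; it goes through Mech's request()
    # method, so the response becomes Mech's current page and any
    # Set-Cookie header lands in its cookie jar.
    my $response = $mech->post( $endpoint, \%fields );

    # Only reached when the caller turned autocheck off; with autocheck on,
    # a failed request would already have died inside post().
    die "Request to $endpoint failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response;
}
```

    For example: my $mech = WWW::Mechanize->new( autocheck => 0 ); send_like_onclick( $mech, 'https://example.com/session', action => 'connect' ); after which $mech carries whatever session cookie the server handed back.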

    A successful login will typically work by sending you back a cookie, along with a redirect. Mech can store the cookie and follow the redirect. But remember to correctly handle the case where the login is not correct, or where some kind of server error is thrown, even if you are using “known good” credentials in your program.
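    A minimal sketch of a login step that keeps those failure cases apart. The form field names username/password and the failure text "Invalid login" are hypothetical placeholders; the real ones come from the target site's HTML. The Mech object is assumed to be created with autocheck => 0, so that the explicit checks run instead of Mech dying on its own.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical login routine. Expects a WWW::Mechanize object created with
# autocheck => 0, so that failures reach the checks below.
sub log_in {
    my ( $mech, $login_url, $user, $pass ) = @_;

    $mech->get($login_url);
    die "Could not load login page: ", $mech->status, "\n"
        unless $mech->success;

    # with_fields picks the first form containing both fields, fills them
    # in and submits it. Mech keeps any cookie the response sets and
    # follows a redirect after the submit.
    $mech->submit_form(
        with_fields => { username => $user, password => $pass },
    );

    # A server error is not the same thing as a rejected password.
    die "Server error during login: ", $mech->status, "\n"
        unless $mech->success;
    die "Login rejected for $user\n"
        if $mech->content =~ /Invalid login/i;    # placeholder failure text

    return 1;
}
```

    Usage: my $mech = WWW::Mechanize->new( autocheck => 0 ); log_in( $mech, 'https://example.com/login', 'me', 'secret' ); then keep using the same $mech for the data pages, so the session cookie goes along.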

Re: Establish a session to call URL with Perl
by Cody Fendant (Hermit) on Jul 07, 2013 at 03:18 UTC
    Do you mean "establish a connection" or do you mean "log in to the site"?

    Generally speaking, you go to a website with WWW::Mechanize just by using get('url'), so if you know the URL you want to go to, say it's http://google.com, you just do $mech->get('http://google.com').
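    The same idea as a small, self-contained sketch; the helper name page_title is hypothetical, and the URL is just the example from the paragraph above.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch a page whose URL you already know and return
# its <title>. A plain WWW::Mechanize object has autocheck on by default,
# so get() dies on a 404, 500 and so on instead of failing quietly.
sub page_title {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new();
    $mech->get($url);
    return $mech->title;    # undef if the response is not HTML
}
```

    Usage: print page_title('http://google.com'), "\n";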

    If you mean something else, please be more specific.
Re: Establish a session to call URL with Perl
by sundialsvc4 (Abbot) on Jul 07, 2013 at 15:09 UTC

    I expect that the OP does mean “logging in.” A generalized solution to this problem should consider that any site can “log you off” unexpectedly for any reason ... which is one of the reasons why Anonymous Monk has so many postings here. You’ll need to determine how your target site tracks the logged-in status: it could be a cookie, it could be a GET parameter on the URL, or it could be both. You also need to learn how it tells you that you are “not logged in,” as opposed to “incorrect login” or “I’m just having a bad hair day today.”

    Probably the easiest way to do this (anyway, the most common way I’ve seen it done) is to write a subclass of WWW::Mechanize, i.e. a class that says use base 'WWW::Mechanize';. WWW::Mechanize, in turn, is based on LWP::UserAgent, so you have quite a large number of methods to work with. Write a public subroutine that sends a request to the site and interprets its results, particularly its error results. This subroutine should either be aware of whether you are logged in or not, or be able to transparently handle the business of getting logged in before issuing the request. It will be very specific to the particular site being talked to, and it will insulate the rest of the code from any “niggling details.”
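    A minimal sketch of such a class, under loud assumptions: the class name SiteAgent, its methods set_login() and fetch(), the form field names, and the rule "a password field on the page means we have been logged out" are all hypothetical and must be adapted to the real site. It uses use parent, the modern equivalent of use base.

```perl
package SiteAgent;    # hypothetical site-specific subclass
use strict;
use warnings;
use parent 'WWW::Mechanize';    # modern spelling of: use base 'WWW::Mechanize';

# Remember where and how to log in. LWP::UserAgent->new can warn about
# options it does not recognise, so these are set in a separate method.
sub set_login {
    my ( $self, %args ) = @_;    # url => ..., username => ..., password => ...
    $self->{siteagent_login} = {%args};
    return $self;
}

# Assumption: this site shows its login form (a password field) whenever
# you are not logged in. Replace this test with whatever your site does.
sub _looks_logged_out {
    my ($self) = @_;
    return ( $self->content // '' ) =~ /type\s*=\s*["']?password/i;
}

sub _log_in {
    my ($self) = @_;
    my $login = $self->{siteagent_login}
        or die "call set_login() before fetch()\n";

    $self->get( $login->{url} );
    die "Could not load login page: ", $self->status, "\n"
        unless $self->success;

    # Field names are placeholders; read the real login form to find them.
    $self->submit_form(
        with_fields => {
            username => $login->{username},
            password => $login->{password},
        },
    );
    die "Server error during login: ", $self->status, "\n"
        unless $self->success;
    die "Login rejected\n" if $self->_looks_logged_out;
    return 1;
}

# Public entry point: fetch $url, logging in first (at most once) if the
# site has logged us out, and turn every failure into an exception.
sub fetch {
    my ( $self, $url ) = @_;

    $self->get($url);
    if ( $self->success && $self->_looks_logged_out ) {
        $self->_log_in;
        $self->get($url);    # retry the original request once
    }

    die "Request for $url failed: ", $self->status, "\n"
        unless $self->success;
    die "Still logged out after logging in\n" if $self->_looks_logged_out;
    return $self->content;
}

package main;
```

    Usage: my $agent = SiteAgent->new( autocheck => 0 ); $agent->set_login( url => 'https://example.com/login', username => 'me', password => 'secret' ); my $html = $agent->fetch('https://example.com/data'); WWW::Mechanize only enables autocheck by default for the base class itself, so it is already off in a subclass; passing autocheck => 0 just makes that explicit, and it is what lets fetch() inspect failures itself.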

      ..which is one of the reasons why Anonymous Monk has so many postings here.

      Except that it isn't

Node Type: perlquestion [id://1042931]
Approved by Perlbotics