PerlMonks  

Not So Basic Password/Username grab of Web Page

by cdherold (Monk)
on Mar 01, 2003 at 18:53 UTC (#239739)

cdherold has asked for the wisdom of the Perl Monks concerning the following question:

I now know how to grab a webpage that's user/pass protected using

    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new;
    my $req = HTTP::Request->new(GET => $url);
    $req->authorization_basic('user', 'pass');
    # as_string returns the status line and headers along with the body
    my $body = $ua->request($req)->as_string;
but for a webpage like the NYTimes it's not that straightforward, I am finding. Before accessing an article there you have to enter your user/pass on this page:

http://www.nytimes.com/auth/login?URI=http://www.nytimes.com/aponline/international/AP-Turkey-US-Iraq.html

Any ideas on how to get through this and automatically grab the following page?

thanks monks

cdherold

Replies are listed 'Best First'.
Re: Not So Basic Password/Username grab of Web Page
by Aristotle (Chancellor) on Mar 01, 2003 at 19:00 UTC
    Authorization apparently happens via a cookie. You need to set up a cookie jar with your LWP object, submit a form with the account details to the login page, and only then access the actual page. The LWP object will store the cookie sent back by the server along with the response into the jar, and will then send it along with further requests.
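
The steps described above can be sketched roughly as follows. This is a minimal sketch, not tested against the NYTimes site: the sub names `build_agent` and `fetch_protected` are made up for illustration, and the form field names in the usage comment are placeholders, since the thread does not say which fields the login form expects. Look at the login page's HTML source to find the real ones.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Build a user agent that keeps cookies between requests.
sub build_agent {
    my $ua = LWP::UserAgent->new;
    $ua->cookie_jar(HTTP::Cookies->new);    # in-memory cookie jar
    # LWP follows redirects only for GET and HEAD by default;
    # login forms often answer a POST with a redirect.
    push @{ $ua->requests_redirectable }, 'POST';
    return $ua;
}

# Post the login form, then request the page that needs the cookie.
# Returns the body of the target page, or dies with the HTTP status line.
sub fetch_protected {
    my ($ua, $login_url, $form, $target_url) = @_;

    my $login = $ua->post($login_url, $form);
    die "Login failed: ", $login->status_line, "\n"
        unless $login->is_success;

    # The cookie set by the login response is sent along automatically.
    my $page = $ua->get($target_url);
    die "Fetch failed: ", $page->status_line, "\n"
        unless $page->is_success;

    return $page->content;
}

# Hypothetical usage; the field names are placeholders, not the real ones:
# my $html = fetch_protected(
#     build_agent(),
#     'http://www.nytimes.com/auth/login',
#     { user_field => 'user', pass_field => 'pass' },
#     'http://www.nytimes.com/aponline/international/AP-Turkey-US-Iraq.html',
# );
```

Note that a successful HTTP status on the login request does not prove the login itself worked. Many sites return 200 with an error page, so it may be worth checking the returned content or the cookie jar as well.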

    Makeshifts last the longest.

Re: Not So Basic Password/Username grab of Web Page
by dws (Chancellor) on Mar 01, 2003 at 19:33 UTC
    Any ideas on how to get through this and automatically grab the following page?

    For screen-scraping that requires that you first post a form, and then issue a second request using the cookie you get from the first response, you can adapt the code snippet in Sending SMS msgs to AT&T phone / Coping with forms that want cookies. Replace the initial GET with a POST of your name and password.

Re: Not So Basic Password/Username grab of Web Page
by maksl (Pilgrim) on Mar 01, 2003 at 21:08 UTC
Re: Not So Basic Password/Username grab of Web Page
by pg (Canon) on Mar 01, 2003 at 21:05 UTC
    There are some more complicated cases out there. If you are interested, try Yahoo email.

    The page that Yahoo email uses to authenticate you sends your password encrypted, for security. To make it more complex, the server first sends you a challenge, which is a string, and your password is encrypted together with that challenge.

    By doing this, they are trying to make sure nobody can hijack your session; at the least, the way to hijack it becomes much less obvious.

    You can study Yahoo's authentication page and try to do the same encryption in Perl, which Perl obviously has no problem with ;-).

    It is MD5.
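
A challenge-based MD5 login of the kind described here could look something like the sketch below. It uses the core `Digest::MD5` module. The sub name `challenge_response` is made up for illustration, and the way the password and challenge are combined (hash the password, append the challenge, hash again) is an assumption: the post does not spell out the exact recipe, so check the login page's script before relying on it.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Combine a password with a server-supplied challenge string.
# ASSUMPTION: the recipe md5(md5(password) . challenge) is illustrative;
# the real page may combine the two values differently.
sub challenge_response {
    my ($password, $challenge) = @_;
    return md5_hex( md5_hex($password) . $challenge );
}

# Hypothetical usage: $challenge would be scraped from the login page.
# my $response = challenge_response('pass', $challenge);
```

Because the challenge changes with each login attempt, the value sent over the wire changes too, which is what makes a captured response harder to reuse.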

    You can write site-specific programs, but it would be quite difficult to write a generic program that deals with all web sites without the ability to execute scripts in other languages.

    However, it is still fun to play with yahoo as your next step.
