PerlMonks  

Re^3: How do I access a password protected site and access data?

by marto (Archbishop)
on Jun 29, 2005 at 08:35 UTC (#470924)


in reply to Re^2: How do I access a password protected site and access data?
in thread How do I access a password protected site and access data?

Hi,

Looking back at this question a day later, one thing is still unclear to me. Does the page you are trying to access have a login form where you enter a username and password, or does the browser display a pop-up window asking for credentials?

Either way, I have used WWW::Mechanize in the past for screen scraping and data processing of sites. Its documentation includes examples showing how easy it is to process forms (login or otherwise), follow links and return page content for processing.
I have done the same thing using other methods, but in my experience WWW::Mechanize is easier to work with.

It is also worthwhile reading merlyn's column Web scraping with WWW::Mechanize (Apr 03), which I found very informative.

Hope this helps

Martin

Replies are listed 'Best First'.
Re^4: How do I access a password protected site and access data?
by jaydon (Novice) on Jul 08, 2005 at 17:45 UTC

    OK, my install failed at the 'make test' run indicating 1 test failed. Tail of the output is:

    Failed Test       Stat Wstat Total Fail  Failed  List of Failed
    -------------------------------------------------------------------------------
    t/link-relative.t  255 65280     6    8 133.33%  3-6
    9 tests and 16 subtests skipped.
    Failed 1/49 test scripts, 97.96% okay. 4/577 subtests failed, 99.31% okay.

    *** Error code 11
    make: Fatal error: Command failed for target `test_dynamic'

    The failed test is called link-relative. Presumably it tests the handling of relative links, which my code does not use. Do you think I can safely ignore this failure and go ahead with installing the WWW::Mechanize module?

      Hi,

      Have you tried ignoring the error and installing the module?
      It may be worthwhile to read the great tutorial A Guide to Installing Modules, written by tachyon.

      Hope this helps,

      Martin
        Hi,

        Yes, I ignored it and installed, but a test using the script that you so kindly provided resulted in a strange error:

        "No such field 'username' at /location/of/perl/modules/WWW/Mechanize.pm line 1169"

        So, ignoring the 'make test' error was probably not a good idea. Thanks for the link to the tutorial. BTW, I got my script working using LWP, so I'm going to postpone investigating Mechanize until later.

        To get it to work, I had to specify the content-type as "application/x-www-form-urlencoded" in the header and send the username and password as the request content.
        A friend of mine suggested this, as he thought the server might be ignoring the login info when it was sent as form data.

        Thanks for all your help! I have learnt a lot trying to write this script!
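
The LWP approach described in this reply can be sketched roughly as follows. This is a minimal sketch, not jaydon's actual script: the URL and the field names `username` and `password` are placeholders borrowed from the Mechanize example in this thread, and the cookie jar and redirect handling are assumptions about what a typical session-based login needs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;
use URI;

# Build a POST whose body is the URL-encoded login fields.
# The field names are placeholders; use the names from the real form.
sub build_login_request {
    my ($url, $user, $pass) = @_;

    # Let URI do the escaping of the key=value pairs.
    my $encoder = URI->new('http:');
    $encoder->query_form( username => $user, password => $pass );

    my $req = HTTP::Request->new( POST => $url );
    $req->content_type('application/x-www-form-urlencoded');
    $req->content( $encoder->query );
    return $req;
}

# Send the request with a cookie jar, so the session cookie set by the
# initial GET is returned with the POST (assumed to be what the server wants).
sub send_login {
    my ($url, $user, $pass) = @_;

    my $ua = LWP::UserAgent->new( cookie_jar => HTTP::Cookies->new );
    push @{ $ua->requests_redirectable }, 'POST';   # follow a redirect after login

    my $first = $ua->get($url);                     # collect the session cookie
    die $first->status_line unless $first->is_success;

    my $res = $ua->request( build_login_request($url, $user, $pass) );
    die $res->status_line unless $res->is_success;
    return $res->decoded_content;
}

# Offline demonstration: print the request that would be sent.
print build_login_request(
    'http://www.yourdomain.com/login.asp', 'MyUserID', 'Fak3Pa55w0rd'
)->as_string;
```

Calling `send_login($url, $user, $pass)` performs the GET and the POST against a live server; the script as written only prints the request, so it can be checked without network access.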

Re^4: How do I access a password protected site and access data?
by jaydon (Novice) on Jun 29, 2005 at 18:32 UTC

    Hello,

    Thank you for your continued interest in my problem. The initial URL takes you to a login page where you are prompted to enter the userID and password. It is not a pop-up window.

    I don't know whether that makes a difference when using LWP. This morning, however, I tried to work out what was going on by looking at the response headers. It looks like the session cookie returned after the initial GET request is being sent back in the header of the POST request. However, the server then sends back a second session cookie, and therein lies the problem: this probably means that the server is not receiving the userID, password and session ID sent with the POST.

    I have to temporarily stop working on this but will get back and try installing WWW::Mechanize and see. Will keep you posted.

      Hi,

      To get you started I have provided a little example:
          #!/usr/bin/perl
          use strict;
          use WWW::Mechanize;

          my $targeturl = "http://www.yourdomain.com/login.asp";

          my $mech = WWW::Mechanize->new();
          $mech->agent_alias( 'Windows IE 6' );
          $mech->get($targeturl);
          $mech->success or die $mech->response->status_line;

          $mech->form_number(1); # if the login form was the first form on the page
          $mech->set_fields(
              username => "MyUserID",
              password => "Fak3Pa55w0rd"
          );
          $mech->submit();

          print $mech->content(); # print content

      I set up an ASP page with a form on it to process the login. The above example logs in and prints out the content of the page following the login.

      Hope this helps

      Martin
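
The field names given to `set_fields` must match the `name` attributes of the inputs in the selected form; when they do not, Mechanize dies with a "No such field" error like the one reported elsewhere in this thread. One way to check is to list the inputs of every form on the page. The sketch below does this offline with HTML::Form on a made-up page; the HTML, the form layout and the input names `userID` and `passwd` are all hypothetical. With a live Mechanize object, `$mech->forms` returns the same kind of HTML::Form objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical login page: a search form first, then the login form,
# whose inputs are named userID and passwd rather than username/password.
my $html = <<'HTML';
<html><body>
<form name="search" action="/search.asp" method="get">
  <input type="text" name="q">
</form>
<form name="login" action="/login.asp" method="post">
  <input type="text" name="userID">
  <input type="password" name="passwd">
  <input type="submit" name="go" value="Log in">
</form>
</body></html>
HTML

# Return one "N: name name ..." summary line per form on the page,
# where N is the number to pass to form_number().
sub describe_forms {
    my ($html, $base_uri) = @_;
    my @forms = HTML::Form->parse($html, $base_uri);
    my @lines;
    for my $i (0 .. $#forms) {
        # Skip inputs that have no name attribute.
        my @names = grep { defined } map { $_->name } $forms[$i]->inputs;
        push @lines, sprintf '%d: %s', $i + 1, join ' ', @names;
    }
    return @lines;
}

print "$_\n" for describe_forms($html, 'http://www.yourdomain.com/login.asp');
```

On a page laid out like this one, the login form is the second form, so the example above would need `form_number(2)` and the keys `userID` and `passwd` in `set_fields`.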
