PerlMonks  

Authentication with WWW::Mechanize

by cdherold (Monk)
on Sep 19, 2005 at 18:36 UTC ( #493235=perlquestion )
cdherold has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Now that I am feeling relatively comfortable about sending my user/pass data over the net to the https server that I am logging onto, I am at Stage 3 of my quest to screen scrape my brokerage account ... getting past the embattlements that are trying to keep my WWW::Mechanize agent out.

So once again, I come to the Monks for direction.

I will start with the current state of my code ...

    use strict;
    use LWP::UserAgent;
    use WWW::Mechanize;
    use HTML::TokeParser;
    use HTTP::Cookies;
    use HTTP::Request;
    use Data::Dumper;

    my $user = 'MyUserName';
    my $pass = 'MyPassword';
    my $dv_data = '';
    my $output = '';

    # Set up cookie jar
    my $cookie = HTTP::Cookies->new(file => 'cookie',autosave => 1,);
    my $mech = WWW::Mechanize->new(cookie_jar => $cookie, autocheck => 1,);

    my $uri = URI->new( 'https://wwws.izone.com/apps/LogIn' );
    $mech->get( $uri );
    die $mech->response->status_line unless $mech->success;

    $output = $mech->content;
    for ($output =~ /name=\"DV_DATA\" type=\"hidden\" VALUE=\"(.*?)\">/smi){
        $dv_data = $1;
    }

    $mech->form_name( 'li' );
    $mech->set_fields(
        USERID   => $user,
        PASSWORD => $pass,
        DV_DATA  => $dv_data
    );
    $mech -> submit();
    print $mech->content;

Please assume that I am not fully clear on most concepts.

In the code above, I pulled out a value called DV_DATA from the webpage when I first pull it down. This is a "hidden" input variable that appears to be time-based and assigned when the user enters the log-in page. I then include it as an input. I am not sure this is correct.
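
One way to check this offline is to parse a stand-in login page with HTML::Form, the module WWW::Mechanize relies on for form handling. In the sketch below the HTML, the example.invalid URL, and the DV_DATA value are assumptions made up for illustration; only the form name li and the field names come from the code above. If the real page behaves the same way, the hidden value already travels with the form and may not need to be copied by hand.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the real login page; only the form name
# and the field names are taken from the post.
my $html = <<'HTML';
<form name="li" method="POST" action="/apps/LogIn">
  <input name="USERID" type="text">
  <input name="PASSWORD" type="password">
  <input name="DV_DATA" type="hidden" VALUE="abc123">
  <input type="submit" value="Log In">
</form>
HTML

# Parse the page; the base URL is a placeholder.
my ($form) = HTML::Form->parse( $html, 'https://example.invalid/apps/LogIn' );

# The hidden value is already part of the parsed form.
my $dv_data = $form->value('DV_DATA');

# Fill in only the visible fields.
$form->value( USERID   => 'MyUserName' );
$form->value( PASSWORD => 'MyPassword' );

# Build (but do not send) the request the form would submit.
my $request = $form->make_request;
print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

The printed request body should list DV_DATA next to USERID and PASSWORD even though the script never set it.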

With the cookies, I am not sure what is going on, but I've been reading around and what I put in seems to be the general consensus for acceptable code to include a cookie jar in WWW::Mechanize. I know I need a cookie jar, but other than that I do not know what I should be doing with it.

When I run the code, it outputs these two Not Found statements ...

    Not Found
    The requested URL /cgi-bin/apps/u/Home was not found on this server.
    Apache/2.0.50 (Fedora) Server at www.server.com Port 80

    Not Found
    The requested URL /cgi-bin/apps/u/EquityTrade was not found on this server.
    Apache/2.0.50 (Fedora) Server at www.server.com Port 8

Has anybody seen any of this before? Am I going in the right direction? What are the things that I am forgetting to consider?

Thanks Again Monks,

Chris Herold

Re: Authentication with WWW::Mechanize
by petdance (Parson) on Sep 19, 2005 at 20:01 UTC
    From the FAQ:
    =head2 How do I get Mech to handle authentication?

        my $agent = WWW::Mechanize->new();
        my @args = (
            Authorization => "Basic " .
                MIME::Base64::encode( USER . ':' . PASS )
        );

        $agent->credentials( ADDRESS, REALM, USER, PASS );
        $agent->get( URL, @args );

    xoxo,
    Andy

      That only works for sites using HTTP authentication.

      Unfortunately, the site the OP is trying to log into isn't one of them.

      Instead it uses forms-based authentication, for which use of form_name, set_fields and submit (or some variation thereof) would be a more appropriate -- and less futile :-) -- way to proceed.
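
A minimal sketch of that forms-based route, using submit_form to do the select, fill, and submit steps in one call. The form name li and the field names USERID and PASSWORD come from the original post; the helper name login_with_form and the command-line arguments are hypothetical, and nothing here has been tried against the real site.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Log in through an HTML form. $login_url, $user and $pass are supplied
# by the caller; the form name 'li' and the field names come from the
# original post.
sub login_with_form {
    my ( $mech, $login_url, $user, $pass ) = @_;

    $mech->get($login_url);

    # submit_form selects the form, fills the fields and submits in one
    # call. Hidden inputs keep the values the server put in the page.
    $mech->submit_form(
        form_name => 'li',
        fields    => {
            USERID   => $user,
            PASSWORD => $pass,
        },
    );

    return $mech->uri;    # where the server sent us after the login
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my ( $url, $user, $pass ) = @ARGV;

    # autocheck makes failed requests die; cookie_jar => {} keeps
    # session cookies in memory for this run.
    my $mech = WWW::Mechanize->new( autocheck => 1, cookie_jar => {} );
    print login_with_form( $mech, $url, $user, $pass ), "\n";
}
```

Printing the final URI shows at a glance whether the login landed on the expected host. Passing a password on the command line is only for the sake of a short example.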

          --k.


      Update: s/OP/OP is trying to log into/;

      Please forgive my ignorance, but I may need a little hand holding here if you have the time.

      What are ADDRESS and REALM in credentials?

      Why do I need to have the user and pass in both the authorization of the @args as well as in the credentials?

      Will this work for logging into https even though it is 'basic'?
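
For what it is worth, in LWP's credentials method the first argument is the server's host and port (for example "www.example.com:443") and the second is the realm string the server announces in its WWW-Authenticate header. The hand-built Authorization header carries the same user and password, base64-encoded, so the two lines in the FAQ snippet are two routes to the same header. The sketch below shows what that header contains; the user name and password are placeholder values, not anything from this thread. Basic authentication is sent the same way over https, where the TLS layer does the encrypting.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Build the value of a Basic Authorization header.
# The empty second argument stops encode_base64 from appending a newline.
sub basic_auth_header {
    my ( $user, $pass ) = @_;
    return 'Basic ' . encode_base64( "$user:$pass", '' );
}

# Example credentials are placeholders, not anything from this thread.
my $header = basic_auth_header( 'Aladdin', 'open sesame' );
print "Authorization: $header\n";
```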

Re: Authentication with WWW::Mechanize
by cbrandtbuffalo (Deacon) on Sep 19, 2005 at 20:06 UTC
    Is there a reason you are creating a URI object to pass into the Mechanize get method? I think it just wants a simple URL. Try putting the https string directly in the get call like this:
    $mech->get('https://wwws.izone.com/apps/LogIn');
Re: Authentication with WWW::Mechanize
by Kanji (Parson) on Sep 19, 2005 at 20:15 UTC

    Your error messages are a little puzzling:-

    How are www.omniomix.com and wwws.izone.com related?

    Assuming that isn't the result of a discrepancy between the code pasted here and the code run to get those errors, you may want to enable debugging by adding...

    use LWP::Debug qw( + );

    ...to your script and then re-running it.

    This should tell you exactly how submitting a form at izone.com is landing you at omniomix.com, and probably provide some vital insight to your problem.
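
If LWP::Debug is not available (it is marked deprecated in newer libwww-perl releases, so this depends on the installed version), the redirects method of HTTP::Response gives a similar view of how one request ended up on another host. The sketch below builds a fake two-hop chain offline; the helper name redirect_chain and both URLs are made up for illustration. With a live agent the same helper could be called as redirect_chain( $mech->response ).

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# List every hop that led to a response, oldest first, as "code URL".
sub redirect_chain {
    my ($response) = @_;
    return map { $_->code . ' ' . $_->request->uri }
        ( $response->redirects, $response );
}

# Offline stand-in for a login POST that gets redirected to another host.
my $first = HTTP::Response->new( 302, 'Found' );
$first->request( HTTP::Request->new( POST => 'https://login.example/apps/LogIn' ) );

my $final = HTTP::Response->new( 404, 'Not Found' );
$final->request( HTTP::Request->new( GET => 'http://other.example/cgi-bin/apps/u/Home' ) );
$final->previous($first);

my @chain = redirect_chain($final);
print "$_\n" for @chain;
```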

        --k.


Re: Authentication with WWW::Mechanize
by CountZero (Bishop) on Sep 19, 2005 at 21:13 UTC
    You may have noticed that the log-in page has some JavaScript running:
    <SCRIPT LANGUAGE="JavaScript"> <!-- if (window!= top) top.location.href=location.href; var form; function init(){ form = document.li; if (form.USERID.value == "") form.USERID.focus(); } var scount = 0; function submitForm(){ scount++; if (scount==1) form.submit(); else return false; } // --> </SCRIPT>
    and that clicking the submit button actually runs this code rather than do the usual CGI-submit. I'm not into JavaScript, but perhaps it can explain WWW::Mechanize not working.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      All that the code in submitForm() is trying to do is prevent double-submits. It does the form.submit() only once, the first time called. (Returning false from an onsubmit handler says don't do a submit - they coulda structured this as just returning true or false, I think)
