PerlMonks  

Hello Monks, I wanted to write a basic how-to on using WWW::Mechanize, as suggested in Tutorial Quest. I will give a basic overview of how to log in to a website. One DON'T that I will state right off the bat to save future frustration: WWW::Mechanize DOES NOT SUPPORT JAVASCRIPT.

One of my first tasks at my job was to write a crawler that logged in to a website and downloaded some account information. I will provide that portion here.

Two other tools make working with Mechanize much easier: Firebug (or some other web page inspector) and Live HTTP Headers. For this project I really only needed Firebug. You need it to find the names and values of the form fields and links on the page you are trying to access.

You can also have Mechanize identify itself as a common browser by passing one of its predefined alias names to agent_alias. In this example I did not set it, but you can do so like this: $mech->agent_alias($alias);
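As a small sketch of the agent_alias call mentioned above (the alias 'Windows Mozilla' is just one of the names the module documents, and the variable names are my own), this lists the available aliases and shows the User-Agent string that results from picking one:

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Ask the installed WWW::Mechanize which alias names it knows about.
my @aliases = WWW::Mechanize->known_agent_aliases();
print "Known aliases: ", join( ', ', @aliases ), "\n";

# Pick one of them; every later request will carry its User-Agent string.
$mech->agent_alias('Windows Mozilla');

# agent() (inherited from LWP::UserAgent) returns the current User-Agent.
my $user_agent = $mech->agent;
print "User-Agent is now: $user_agent\n";
```

The exact User-Agent string depends on the installed version of the module, so it is worth printing it once to see what the site will receive.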
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
my $url  = "https://homepage.com";

# Fetch the home page, then follow the link to the login page
$mech->get($url);
$mech->follow_link( url => 'https://account.login.page.com' );

# Report whether the last request succeeded
if ( $mech->success() ) {
    print "Successful Connection\n";
}
else {
    print "Not a successful connection\n";
}
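You will notice that I used an if statement to verify that the request was successful. The $mech->success method is very useful for knowing whether the last request went through OK. From what I have learned so far, it is good practice to give yourself some kind of verification that what you did worked. Note that recent versions of WWW::Mechanize turn autocheck on by default, so a failed request dies before success() is ever reached; pass autocheck => 0 to new() if you would rather test success() yourself. You can also check the result by printing the page: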
print $mech->content;
or
$mech->dump_text;
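Here is a minimal sketch of those checks that runs without a network connection. The page is a made-up stand-in written to a temporary file (the HTML, the title, and the variable names are assumptions for illustration), and autocheck is turned off so that success() is what reports a failure:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical stand-in page, saved locally so the example needs no network.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><head><title>Account Login</title></head>
<body><p>Please log in.</p><a href="report.html">Reports</a></body></html>
HTML
close $fh or die "Cannot close $path: $!";

# autocheck => 0: a failed request sets success() to false instead of dying.
my $mech = WWW::Mechanize->new( autocheck => 0 );
$mech->get( URI::file->new($path)->as_string );

my $ok     = $mech->success;
my $status = $mech->status;
my $title  = $mech->title;

if ($ok) {
    print "Fetched OK (status $status), title: $title\n";
}
else {
    print "Request failed with status $status\n";
}

# Print the page as plain text, then the links found on it.
$mech->dump_text;
$mech->dump_links;
```

Against a real site you would replace the file URL with the page's address; the checks themselves stay the same.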
The $mech->dump_* methods are very useful for debugging or for finding out more about the page you last accessed. Use them frequently. There are dump_forms, dump_text, dump_links, and others. The next thing I had to do was enter the username and password, plus the start and end dates for the report I wanted to receive. I did it with the following:
# This block of code is intended to fill in the required forms.
# $start_date and $end_date were computed elsewhere (with DateTime).
$mech->get("https://account.login.page.asp");

my $usr = "username";
my $pw  = "password";

$mech->form_number(1);
$mech->field( "capsn", $usr );
$mech->form_number(2);
$mech->field( "capsp", $pw );
$mech->form_number(3);
$mech->field( "startdate", $start_date );
$mech->form_number(4);
$mech->field( "enddate", $end_date );

$mech->click();
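When all of the fields sit inside a single form, a shorter variation is possible. The sketch below is not what I ran against the real site: the HTML is a made-up stand-in that reuses the field names from my script, the dates are placeholders, and the request is only built and printed rather than sent, so you can see what would be submitted:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical login page with all four fields in one form.
my ( $fh, $path ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} <<'HTML';
<html><body>
<form method="post" action="login.asp">
  <input type="text"     name="capsn">
  <input type="password" name="capsp">
  <input type="text"     name="startdate">
  <input type="text"     name="enddate">
  <input type="submit"   value="Log in">
</form>
</body></html>
HTML
close $fh or die "Cannot close $path: $!";

my $mech = WWW::Mechanize->new();
$mech->get( URI::file->new($path)->as_string );

# Select the form by the fields it contains instead of by its position.
$mech->form_with_fields( 'capsn', 'capsp' );

# Fill in several fields of the selected form in one call.
$mech->set_fields(
    capsn     => 'username',
    capsp     => 'password',
    startdate => '2014-07-01',    # placeholder date
    enddate   => '2014-07-30',    # placeholder date
);

# Build the request the submit button would send, without sending it.
my $request = $mech->current_form->click;
print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

In a real script you would call $mech->click() or $mech->submit() at the end instead of building the request by hand.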
For the form-filling code, I had to inspect the page with Firebug to find the name of each field (in quotes in my script) and then set its value to the variable I had declared. Keep in mind that form_number selects the form that later field calls apply to, and click submits the currently selected form, so if all of your fields live in one form you only need to select it once. The click method did not need the button name specified, though you may have to do that sometimes. Yes, this site used SSL, and no, I did not need to do anything special to log in to it this time. However, I have had to crawl another website using SSL where I did need to do something special. This is what I had to do:
use WWW::Mechanize;
use IO::Socket::SSL qw();

my $mech = WWW::Mechanize->new(
    ssl_opts => {
        SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_NONE,
        verify_hostname => 0,
    },
);
In this constructor I turned off SSL certificate and hostname verification. That leaves the connection open to impersonation, so do it only when you must.

The start and end dates were actually acquired with a little more work using a different module, DateTime. I can get into that later.

Newbies to this module should keep in mind that Mechanize DOES NOT interpret JavaScript. The only way around this that I have found so far is to use Live HTTP Headers to watch the HTTP requests the browser makes as you navigate through the site, and then replay them. Where there is a GET, use $mech->get($url). Where there is a POST, use $mech->post($url, \%fields), with the form data from the captured request in %fields. I have successfully navigated a JavaScript-heavy web page using this method, but it is extremely tedious. If you have a CHOICE, use WWW::Mechanize::Firefox, WWW::Selenium, or some other module that interprets JavaScript.
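To make the replay idea concrete, here is a rough sketch of rebuilding a captured POST. The URL path and the field names are invented for illustration; in practice you would copy them from what Live HTTP Headers shows. The request is built with HTTP::Request::Common and printed rather than sent:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Hypothetical URL and form data, standing in for a captured POST.
my $url    = 'https://account.login.page.com/report';
my @fields = (
    report    => 'account',
    startdate => '2014-07-01',
    enddate   => '2014-07-30',
);

# Build the request object; an array ref keeps the fields in this order.
my $request = POST $url, \@fields;

# Inspect it before sending anything.
print $request->as_string;

# To actually send it, either of these would do:
#   $mech->request($request);
#   $mech->post( $url, \@fields );
```

Comparing the printed request with the one captured in the browser is a quick way to spot a missing field or header before you start debugging the server's response.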

In reply to WWW::Mechanize Basics by PerlSufi
