Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Hello Monks, I wanted to write a basic how-to on using WWW::Mechanize that was suggested in Tutorial Quest. I will provide a basic over-view of how to log in to a website. One DON'T that I will say right off the bat to save future frustration is that WWW::Mechanize DOES NOT SUPPORT JAVASCRIPT. One of my first tasks at my job was to write a crawler that logged into a website and downloaded some account information. I will provide that portion here. Some other tools will make working with Mechanize much easier. These would be Firebug (or some other web page inspector) and HTTP Live Headers. For this project, I really only needed Firebug. You will need this to inspect what the names and values of particular parts of the website you are trying to access. One can also set the agent_alias to several different things. In this example, I did not set it. But you can do so like: $mech->agent_alias($alias);.
use WWW::Mechanize; my $mech = WWW::Mechanize->new(); my $url = ""; $mech->get($url); $mech->follow_link( url => ''); if ($mech->success()){ print "Successful Connection\n"; } else { print "Not a successful connection\n"; }
You will notice here that I just made an if statement to verify if the event was successful. There is a $mech->success function which is very useful for knowing if it went through OK. It is good practice from what I have learned so far to give yourself some kind of verification that what you did worked. This can also be done by putting:
print $mech->content;
The mech->dump_* functions are very useful for debugging or finding out more things about the page you have accessed last. Use them frequently. There is a dump_forms, dump_text, dump_links, etc.. The next part I had to do was enter username/password, start/end date for the report I wanted to receive. I did it with the following:
#This block of code is intended to fill in the required forms $mech->g +et(""); my $usr = "username"; my $pw = "password"; $mech->form_number(1); $mech->field( "capsn", $usr); $mech->form_number(2); $mech->field("capsp", $pw); $mech->form_number(3); $mech->field( "startdate", $start_date); $mech->form_number(4); $mech->field( "enddate", $end_date); $mech->click();
Here I had to inspect the page with Firebug and find the name of each of the fields (in quotes in my script) and set their value to the variable I declared. The 'click' method did not need the button name specified, though you may have to do that some times. Yes, this site used SSL, and no, I did not need to do anything special to login to it this time. However, I have had to crawl another website using SSL, which I did need to do something special with. This is what I had to do:
use WWW::Mechanize; use IO::Socket::SSL qw(); my $mech = WWW::Mechanize->new(ssl_opts => { SSL_verify_mode => IO::So +cket::SSL::SSL_VERIFY_NONE, verify_hostname => 0,});
In this method, I set it to not verify SSL. Actually, the start and end dates were acquired with a little bit more work using a different module, DateTime. I can get into that later. Newbies to this module should keep in mind that Mechanize DOES NOT interpret javascript. The only way around this that I have found so far is to use HTTP Live Headers to inspect what the HTTP is doing as you navigate through the site. Where there is GET, use $mech->get($url) Where there is a POST, use $mech->post('$url') I have successfully navigated a javascript heavy web page using this method, but it is extremely tedious. If you have a CHOICE, use WWW::Mechanize::Firefox, WWW::Selenium, or some other module that interprets javascript.

In reply to WWW::Mechanize Basics by PerlSufi

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others browsing the Monastery: (7)
    As of 2018-07-19 10:25 GMT
    Find Nodes?
      Voting Booth?
      It has been suggested to rename Perl 6 in order to boost its marketing potential. Which name would you prefer?

      Results (406 votes). Check out past polls.