RFC: WWW::Mechanize Basics, Tutorial (In progress)
by PerlSufi (Friar) on May 18, 2013 at 14:51 UTC
Hello Monks, I wanted to write a basic how-to on using WWW::Mechanize, as suggested in Tutorial Quest. I will provide a basic overview of how to log in to a website.
One of my first tasks at my job was to write a crawler that logged into a website and downloaded some account information. I will provide that portion here.
Some other tools will make working with Mechanize much easier: Firebug (or some other web page inspector) and Live HTTP Headers. For this project, I really only needed Firebug. You will need it to inspect the names and values of the particular parts of the website you are trying to access. You can also set the user agent to one of several browser aliases. In this example, I did not set it, but you can do so like: $mech->agent_alias($alias);.
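A minimal sketch of this starting point, assuming a hypothetical login URL (https://www.example.com/login), since the real site is not named in this post. It creates the Mechanize object, fetches the page, and checks the result. The alias 'Linux Mozilla' in the commented-out line is just one of the names Mechanize ships with.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical URL; substitute the login page you are crawling.
my $url = 'https://www.example.com/login';

# autocheck => 0 lets us test for failure ourselves instead of
# having Mechanize die on the first HTTP error.
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Optional, and not set in the original script:
# $mech->agent_alias('Linux Mozilla');

$mech->get($url);

if ( $mech->success ) {
    print "Fetched $url OK\n";
}
else {
    print "Could not fetch $url: ", $mech->response->status_line, "\n";
}
```

Calling $mech->known_agent_aliases lists the alias names your installed version accepts.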
You will notice here that I used an if statement to verify that the request was successful. The $mech->success method is very useful for knowing whether it went through OK. From what I have learned so far, it is good practice to give yourself some kind of verification that what you did worked. This can also be done by putting:
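The snippet that originally followed is not preserved here. One plausible way to get that kind of confirmation, sketched with the same hypothetical URL, is to dump what Mechanize actually received:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical URL, as before.
my $url  = 'https://www.example.com/login';
my $mech = WWW::Mechanize->new( autocheck => 0 );

$mech->get($url);

if ( $mech->success ) {
    # Print the response headers, then every form and its fields,
    # so you can see that you landed on the page you expected.
    $mech->dump_headers;
    $mech->dump_forms;
}
else {
    warn "Request failed: ", $mech->response->status_line, "\n";
}
```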
The $mech->dump_* methods are very useful for debugging or finding out more about the page you accessed last. Use them frequently. There are dump_forms, dump_text, dump_links, and others. The next thing I had to do was enter a username and password, plus start and end dates for the report I wanted to receive. I did it with the following:
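The original script is not preserved here, so this is only a sketch of the step. The field names ('username', 'password', 'start_date', 'end_date'), the assumption that the login form is the first form on the page, and the helper name request_report are all hypothetical. Use the names your page inspector shows you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Fill in the first form on the page and click its submit button.
# All field names below are placeholders for the real ones.
sub request_report {
    my ( $mech, %arg ) = @_;

    $mech->get( $arg{url} );

    # Select the first form; form_number returns undef if there is none.
    $mech->form_number(1)
        or die "No form found on $arg{url}\n";

    $mech->field( 'username',   $arg{username} );
    $mech->field( 'password',   $arg{password} );
    $mech->field( 'start_date', $arg{start_date} );
    $mech->field( 'end_date',   $arg{end_date} );

    # No button name given: clicks the form's first clickable button.
    $mech->click;

    return $mech->success;
}
```

It could be called as request_report( WWW::Mechanize->new, url => 'https://www.example.com/login', username => $username, password => $password, start_date => $start_date, end_date => $end_date ), with the variables declared earlier in your script.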
Here I had to inspect the page with Firebug, find the name of each field (in quotes in my script), and set its value to the variable I declared. The click method did not need the button name specified, though you may have to do that sometimes. Yes, this site used SSL, and no, I did not need to do anything special to log in to it this time. However, I have had to crawl another website using SSL where I did need to do something special. This is what I had to do:
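The exact code is not preserved here. A common way to do this, and an assumption about what was used, is to pass ssl_opts to the constructor. WWW::Mechanize hands that option through to LWP::UserAgent, which it subclasses. The URL is hypothetical, and this only makes sense for a host you already trust.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical URL for a site whose certificate fails verification.
my $url = 'https://www.example.com/secure';

my $mech = WWW::Mechanize->new(
    autocheck => 0,
    ssl_opts  => {
        verify_hostname => 0,    # do not check the certificate's hostname
        SSL_verify_mode => 0,    # 0 is SSL_VERIFY_NONE in IO::Socket::SSL
    },
);

$mech->get($url);

if ( $mech->success ) {
    print "Connected to $url\n";
}
else {
    print "Could not connect: ", $mech->response->status_line, "\n";
}
```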
Here I told it not to verify the SSL certificate. As for the start and end dates, they were actually acquired with a little more work using a different module, DateTime. I can get into that later.
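Until then, here is a rough sketch of how the two dates might be built with DateTime. The 30-day range and the MM/DD/YYYY format are assumptions. The real report form may expect something different.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;

# End of the range: today, with the time set to midnight.
my $end = DateTime->today;

# Start of the range: 30 days earlier (assumed report window).
# clone() keeps $end unchanged, since subtract() modifies in place.
my $start = $end->clone->subtract( days => 30 );

# Format both dates as MM/DD/YYYY (assumed form format).
my $start_date = $start->strftime('%m/%d/%Y');
my $end_date   = $end->strftime('%m/%d/%Y');

print "Report range: $start_date to $end_date\n";
```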