perlmeditation
rkg
<h3> Preface </h3>
Of course, the best way for one machine to talk to another machine over the web is through some machine-sensible protocol: XML, SOAP, whatever. That being said, there are times when this option isn't available, forcing you to use HTTP(S) and write an app that mimics a browser. <p> This meditation describes my recent experiments writing such an app.
<em>Your Mileage May Vary. </em> <p>And of course do make sure such automated apps conform with any Terms of Use of the site you're using.<p>
<h3> LWP and WWW::Mechanize vs. OLE </h3>
There are many posts around the web from folks asking "How do I use Perl to mimic a browser?", and folks always answer, "use LWP" or "use WWW::Mechanize". Those are astoundingly great modules for many circumstances, but they also have limitations.
<p>
<readmore>
I'd suggest the strengths of LWP and WWW::Mechanize are:
<ul>
<li> No Details Are Hidden: you can work with the request in all of its glory at various levels of detail
<li> RTFM: decent (not great) documentation
<li> Folks Know Them: one can obtain reasonably good support and advice from PM and Google searches
<li> Solid Code: the modules are well written
<li> OO: Nice structure allows easy overloading and extension
</ul>
I would suggest the weaknesses of LWP and WWW::Mechanize are:
<ul>
<li> Non-Intuitive Interface: the human wants to use the metaphor of how a human browses the web -- fill out that box, click this button, click that link. LWP and WWW::Mechanize make the coder think in terms of forms
(which fields live in which forms, the true names (vs. the labels) of fields and buttons, etc.). A different metaphor, less WYSIWYG.
<li> Checkboxes and Pulldowns: Setting check boxes and pull-down menus with multiple values is not simple.
<li> No Browser To Watch: While debugging, you have to set up your own mechanism to save pages, to see why your code fails
<li> Hard For Beginners: one must absorb a good deal of documentation (LWP, LWP::UserAgent, HTTP::Request, HTML::Form, etc.) to get a reasonably complex app working
<li> Speed: WWW::Mechanize seems slow to me, compared to IE
<li> HTTPS: Requires futzing with Crypt::SSLeay, and sometimes causes problems
</ul>
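<p>
To make the "think in terms of forms" point concrete, here is a minimal sketch using HTML::Form, the form-handling module that WWW::Mechanize uses under the hood. The inline page, the field names (user, pass, remember, lang, go) and the example.com URL are all made up for illustration, and nothing is sent over the network: the script only builds the request.
<p>
<code>
use strict;
use warnings;
use HTML::Form;

# Hypothetical login page, inlined so the sketch runs offline.
my $html = <<'HTML';
<form action="/login" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
  <input type="checkbox" name="remember" value="yes">
  <select name="lang">
    <option value="en">English</option>
    <option value="fr">French</option>
  </select>
  <input type="submit" name="go" value="Log in">
</form>
HTML

# In scalar context, parse() returns the first form on the page.
my $form = HTML::Form->parse($html, 'http://example.com/');

# Fields are addressed by their true names, not their on-screen labels.
$form->value(user => 'rkg');
$form->value(pass => 'secret');

# A checkbox is its own input object: look it up, then turn it on.
$form->find_input('remember', 'checkbox')->check;

# A pull-down is set by choosing one of its option values.
$form->value(lang => 'fr');

# click() builds the HTTP::Request that pressing the button would send.
my $request = $form->click('go');
print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
</code>
<p>
Even in this small case the script has to know which form it is in and what every field is really called, which is the deep-dive into the page source described above.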
I have been experimenting with an app that interfaces with a website: it needs to log in, redirect to a secure site, examine the status of some pages, post multi-page forms full of hidden cookies and javascript, and repeat a handful of times.
<p>
After some struggles with LWP and WWW::Mechanize, I finally decided to try OLE.<p>
I thought "surely OLE will break, or be slower, or be harder to implement." <p> I was pleasantly surprised: for my needs on this project, OLE was easier. Again, <em> Your Mileage May Vary.</em>
<p>
I used [http://samie.sourceforge.net/] to get me started.
<p>
I'd suggest the strengths of OLE for IE (through SAMIE) are:
<ul>
<li> Intuitive Interface: Fill out a box, click a button, follow a link. Less need to deep-dive into the page source.
<li> A Browser To Watch: Set <code>$IE->{visible} = 1</code>, add some time-delays, and find problems by watching.
<li> Speed: OLE ran quite speedily for me
<li> HTTPS & Cookies: Seamless -- IE handles it
</ul>
And the weaknesses of SAMIE:
<ul>
<li> Redmond: Requires Win* and IE. Enough said.
<li> Overkill: I read a post somewhere noting "instantiating IE to fetch a webpage is like driving your Hummer 30 feet to the end of your driveway to pick up the newspaper."
<li> Solidity. I have no data (yet) to support this concern, but I suspect IE/OLE/SAMIE will crump if banged on too quickly or too hard or too many times.
<li> All Details Are Hidden: you are running a browser -- everything under the hood (cookies, redirects, etc.) is invisible
<li> Docs: weak documentation
<li> Few Folks Know It: less support from the community. Many google searches for "OLE IE object model" or "OLE IE API" lead to posts that just carp, "Jeepers -- isn't it hard to find docs for OLE and IE?". Docs on the MS site are hard to find or outdated.
<li> Code: SAMIE has a few bugs, I think. The code logic is deeply nested and it appears certain branches might not have been thoroughly tested.
<li> Procedural: Subroutines and deeply nested "if"s... I prefer clear OO myself.
</ul>
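<p>
For comparison, here is a rough sketch of the browser-driving style, written against Win32::OLE directly rather than through SAMIE's own subroutines (so it is not SAMIE's API). The URL and the element ids (user, pass, login) are hypothetical, the page is assumed to give its fields id attributes, and the code only does anything on a Win* box with IE installed.
<p>
<code>
use strict;
use warnings;

# Hypothetical URL, for illustration only.
my $LOGIN_URL = 'https://www.example.com/login';

# Block until IE reports the page as loaded.
# READYSTATE_COMPLETE is 4 in the IE object model.
sub wait_for_page {
    my ($ie) = @_;
    sleep 1 while $ie->{Busy} || $ie->{ReadyState} != 4;
    return;
}

# Log in by filling boxes and clicking a button, as a human would.
sub drive_ie {
    my ($user, $pass) = @_;
    require Win32::OLE;    # loaded lazily so the file compiles anywhere

    # The second argument asks Win32::OLE to call Quit on cleanup.
    my $ie = Win32::OLE->new('InternetExplorer.Application', 'Quit')
        or die 'Cannot start IE: ' . Win32::OLE->LastError;

    $ie->{Visible} = 1;    # watch the browser while debugging
    $ie->Navigate($LOGIN_URL);
    wait_for_page($ie);

    # Hypothetical element ids; adjust to the page being automated.
    my $doc = $ie->{Document};
    $doc->getElementById('user')->{value} = $user;
    $doc->getElementById('pass')->{value} = $pass;
    $doc->getElementById('login')->click;

    sleep 1;               # crude: give IE a moment to start navigating
    wait_for_page($ie);

    return $ie->{LocationURL};    # where we ended up after the click
}

# Only run on Windows, and only when given a user name and password.
if ($^O eq 'MSWin32' && @ARGV == 2) {
    print drive_ie(@ARGV), "\n";
}
</code>
<p>
The fixed sleeps are the crude time-delays mentioned above; a sturdier script would poll with a timeout instead.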
<h3> Summary</h3>
Perl is about using the right tool for the job. <p> For quick page fetches, I'd use LWP. For simple web apps, I'd use WWW::Mechanize. For testing redirectors or lower-level code, I'd use LWP (so as to be able to see exactly what is going on). For interfacing with a complex multipage secure form quickly on a Win* platform, I'd now suggest considering OLE.
<p>
[rkg]
<p>
I found the following links of some help:
<ul>
<li>[http://samie.sourceforge.net/]
<li>[http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/dhtml_reference_entry.asp]
<li>[http://xrl.us/uww|groups.google.com]
<li>[http://xrl.us/uwx|groups.google.com]
</ul>
<p/>
<small><b>update</b> (broquaint): shortened width-bursting URLs</small>