Preface
Of course, the best way for one machine to talk to another machine over the web is through some machine-sensible protocol: XML, SOAP, whatever. That being said, there are times when this option isn't available, forcing you to write an app that speaks HTTP(S) and mimics a browser. This meditation describes my recent experiments writing such an app. Your Mileage May Vary.
And of course do make sure such automated apps conform with any Terms of Use of the site you're using.
LWP and WWW::Mechanize vs. OLE
There are many posts around the web from folks asking "How do I use Perl to mimic a browser?", and folks always answer "use LWP" or "use WWW::Mechanize". Those are astoundingly great modules for many circumstances, but they also have limitations.
I'd suggest the strengths of LWP and WWW::Mechanize are:
- No Details Are Hidden: you can work with the request in all of its glory at various levels of detail
- RTFM: decent (not great) documentation
- Folks Know Them: one can obtain reasonably good support and advice from PM and Google searches
- Solid Code: the modules are well written
- OO: Nice structure allows easy overloading and extension
And their weaknesses:
- Non-Intuitive Interface: the human wants to use the metaphor of how a human browses the web -- fill out that box, click this button, click that link. LWP and WWW::Mechanize make the coder think in terms of forms (which fields live in which forms, the true names (vs. the labels) of fields and buttons, etc.). A different metaphor, less WYSIWYG.
- Checkboxes and Pulldowns: Setting check boxes and pull-down menus with multiple values is not simple.
- No Browser To Watch: While debugging, you have to set up your own mechanism to save pages, to see why your code fails
- Hard For Beginners: one must absorb a good deal of documentation (LWP, LWP::UserAgent, HTTP::Request, HTML::Form, etc.) to get a reasonably complex app working
- Speed: WWW::Mechanize seems slow to me, compared to IE
- HTTPS: Requires futzing with Crypt::SSLeay, and sometimes causes problems
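To make the "think in forms" point concrete, here is a minimal WWW::Mechanize sketch. It is not code from my project: the URL, the form names (login, prefs) and the field names are all hypothetical, so substitute whatever the page source of your site actually uses. It also shows the sort of save-the-page helper you end up writing because there is no browser to watch.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical URL; the form and field names below are assumptions too.
my $LOGIN_URL = 'https://www.example.com/login';

# No browser to watch, so dump each page to disk and open the file
# in a real browser when the script misbehaves.
sub save_page {
    my ($mech, $file) = @_;
    $mech->save_content($file);
    return $file;
}

sub login {
    my ($user, $pass) = @_;

    # autocheck makes failed requests die instead of failing silently
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($LOGIN_URL);
    save_page($mech, 'step1-login.html');

    # Fields are addressed by their true names from the page source,
    # not by the labels a human sees next to the boxes.
    $mech->submit_form(
        form_name => 'login',
        fields    => { username => $user, password => $pass },
    );
    save_page($mech, 'step2-after-login.html');

    # Checkboxes and pull-downs get their own calls.
    $mech->form_name('prefs');
    $mech->tick('newsletter', 'weekly');   # checkbox name, then value
    $mech->select('country', 'US');        # pull-down name, then value
    $mech->submit;
    save_page($mech, 'step3-after-prefs.html');

    return $mech;
}

# Only touch the network when a user name and password are supplied.
login(@ARGV) if @ARGV == 2;
```

Run it as "perl login.pl someuser somepass"; the saved step files are the poor man's substitute for watching the browser.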
After some struggles with LWP and WWW::Mechanize, I finally decided to try OLE.
I thought "surely OLE will break, or be slower, or be harder to implement."
I was pleasantly surprised: for my needs on this project, OLE was easier. Again, Your Mileage May Vary.
I used http://samie.sourceforge.net/ to get me started.
I'd suggest the strengths of OLE for IE (through SAMIE) are:
- Intuitive Interface: Fill out a box, click a button, follow a link. Less need to deep-dive into the page source.
- A Browser To Watch: Set $IE->{visible} = 1, add some time-delays, and find problems by watching.
- Speed: OLE ran quite speedily for me
- HTTPS & Cookies: Seamless -- IE handles it
And the weaknesses:
- Redmond: Requires Win* and IE. Enough said.
- Overkill: I read a post somewhere noting "instantiating IE to fetch a webpage is like driving your Hummer 30 feet to the end of your driveway to pick up the newspaper."
- Solidity: I have no data (yet) to support this concern, but I suspect IE/OLE/SAMIE will crump if banged on too quickly or too hard or too many times.
- All Details Are Hidden: you are running a browser -- everything under the hood (cookies, redirects, etc.) is invisible
- Docs: weak documentation
- Few Folks Know It: less support from the community. Many Google searches for "OLE IE object model" or "OLE IE API" lead to posts that just carp, "Jeepers -- isn't it hard to find docs for OLE and IE?". Docs on the MS site are hard to find or outdated.
- Code: SAMIE has a few bugs, I think. The code logic is deeply nested and it appears certain branches might not have been thoroughly tested.
- Procedural: Subroutines and deeply nested "if"s... I prefer clear OO myself.
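For comparison, here is roughly what driving IE looks like with Win32::OLE directly. This is a sketch under assumptions, not SAMIE's code and not my project's: it skips SAMIE's helper subs entirely, the URL and the field name q are hypothetical, and it only runs on Windows with IE installed.

```perl
use strict;
use warnings;
use Win32::OLE;

# Hypothetical URL and field name; substitute your own.
my $URL = 'http://www.example.com/search';

my $IE = Win32::OLE->new('InternetExplorer.Application')
    or die "Cannot start IE: " . Win32::OLE->LastError . "\n";

# A browser to watch: make the window visible while debugging.
$IE->{Visible} = 1;

# Block until IE reports the page has finished loading.
# ReadyState 4 is READYSTATE_COMPLETE.
sub wait_for_page {
    my ($ie) = @_;
    sleep 1 while $ie->{Busy} || $ie->{ReadyState} != 4;
}

$IE->Navigate($URL);
wait_for_page($IE);

# Fill out the box and submit, working on the live DOM.
my $form = $IE->{Document}->{forms}->item(0);
$form->{elements}->item('q')->{value} = 'perl';
$form->submit;
wait_for_page($IE);

# Re-fetch the document after navigation; the old one is stale.
print "Now at: ", $IE->{LocationURL}, "\n";
print $IE->{Document}->{body}->{innerText}, "\n";

$IE->Quit;
```

Cookies, redirects and HTTPS never appear in the script, which is both the convenience and the "all details are hidden" drawback.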
Summary
Perl is about using the right tool for the job. For quick page fetches, I'd use LWP. For simple web apps, I'd use WWW::Mechanize. For testing redirectors or lower-level code, I'd use LWP (so as to be able to see exactly what is going on). For interfacing with a complex multipage secure form quickly on a Win* platform, I'd now suggest considering OLE.
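For the quick-fetch case, a plain LWP::UserAgent sketch is about as small as it gets. The sub name fetch_page and the 15 second timeout are my own choices for illustration, nothing more.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch one page and return its body, or die with the HTTP status line.
sub fetch_page {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 15);
    my $res = $ua->get($url);
    die "GET $url failed: " . $res->status_line . "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Usage: perl fetch.pl http://www.example.com/
print fetch_page($ARGV[0]) if @ARGV;
```

Because you hold the response object, the status line, headers and redirect chain are all there to inspect, which is the reason I'd stay with LWP for lower-level testing.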
I found the following links of some help:
- http://samie.sourceforge.net/
- http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/reference/dhtml_reference_entry.asp
- groups.google.com
- groups.google.com
update (broquaint): shortened width-bursting URLs