Preface

Of course, the best way for one machine to talk to another over the web is through some machine-sensible protocol: XML, SOAP, whatever. That said, there are times when this option isn't available, forcing you to write an app that mimics a browser over HTTP(S).

This meditation describes my recent experiments writing such an app. Your Mileage May Vary.

And of course, do make sure any such automated app conforms to the Terms of Use of the site you're using.

LWP and WWW::Mechanize vs. OLE

There are many posts around the web from folks asking "How do I use Perl to mimic a browser?", and folks always answer "use LWP" or "use WWW::Mechanize". Those are astoundingly great modules for many circumstances, but they also have limitations.

I'd suggest the strengths of LWP and WWW::Mechanize are:
  • No Details Are Hidden: you can work with the request in all of its glory at various levels of detail
  • RTFM: decent (not great) documentation
  • Folks Know Them: one can obtain reasonably good support and advice from PM and Google searches
  • Solid Code: the modules are well written
  • OO: Nice structure allows easy overloading and extension
I would suggest the weaknesses of LWP and WWW::Mechanize are:
  • Non-Intuitive Interface: the human wants to use the metaphor of how a human browses the web -- fill out that box, click this button, click that link. LWP and WWW::Mechanize make the coder think in terms of forms (which fields live in which forms, the true names (vs. the labels) of fields and buttons, etc.). A different metaphor, less WYSIWYG.
  • Checkboxes and Pulldowns: Setting check boxes and pull-down menus with multiple values is not simple.
  • No Browser To Watch: while debugging, you have to set up your own mechanism to save pages to see why your code fails
  • Hard For Beginners: one must absorb a good deal of documentation (LWP, LWP::UserAgent, HTTP::Request, HTML::Form, etc.) to get a reasonably complex app working
  • Speed: WWW::Mechanize seems slow to me, compared to IE
  • HTTPS: Requires futzing with Crypt::SSLeay, and sometimes causes problems
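
To make the "think in terms of forms" point concrete, here is a minimal sketch using HTML::Form (the module LWP and WWW::Mechanize build their form handling on). The page source, the field names (q_user, subscribe, colors, go) and the example.com URL are all made up for illustration; nothing is sent over the network, the code only builds the request a browser would have sent.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical page source; every field name here is invented for the example.
my $html = <<'HTML';
<form action="/search" method="post">
  Your name: <input type="text" name="q_user">
  <input type="checkbox" name="subscribe" value="yes"> Send me updates
  <select name="colors" multiple>
    <option value="r">Red</option>
    <option value="g">Green</option>
    <option value="b">Blue</option>
  </select>
  <input type="submit" name="go" value="Search">
</form>
HTML

# Parse the page into form objects; the base URL resolves the relative action.
my ($form) = HTML::Form->parse($html, 'http://example.com/');

# The human sees the label "Your name", but the code needs the true name.
$form->value(q_user => 'rkg');

# Checkboxes: find the input object, then switch it on.
$form->find_input('subscribe', 'checkbox')->check;

# Multi-valued pulldown: pass every selected value; the rest are cleared.
$form->param(colors => 'r', 'b');

# "Click" the submit button; this returns an HTTP::Request, it does not send it.
my $request = $form->click('go');
print $request->as_string;
```

To actually submit, one would hand $request to an LWP::UserAgent object's request method. Note how every step needs a name dug out of the page source rather than anything visible on the rendered page.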
I have been experimenting with an app that interfaces with a website: it needs to log in, redirect to a secure site, examine the status of some pages, post multi-page forms full of hidden cookies and JavaScript, and repeat a handful of times.

After some struggles with LWP and WWW::Mechanize, I finally decided to try OLE.

I thought "surely OLE will break, or be slower, or be harder to implement."

I was pleasantly surprised: for my needs on this project, OLE was easier. Again, Your Mileage May Vary.

I used SAMIE (http://samie.sourceforge.net/) to get me started.

I'd suggest the strengths of OLE for IE (through SAMIE) are:

  • Intuitive Interface: Fill out a box, click a button, follow a link. Less need to deep-dive into the page source.
  • A Browser To Watch: set $IE->{visible} = 1, add some time delays, and find problems by watching
  • Speed: OLE ran quite speedily for me
  • HTTPS & Cookies: Seamless -- IE handles it
And the weaknesses of SAMIE:
  • Redmond: Requires Win* and IE. Enough said.
  • Overkill: I read a post somewhere noting "instantiating IE to fetch a webpage is like driving your Hummer 30 feet to the end of your driveway to pick up the newspaper."
  • Solidity. I have no data (yet) to support this concern, but I suspect IE/OLE/SAMIE will crump if banged on too quickly or too hard or too many times.
  • All Details Are Hidden: you are running a browser -- everything under the hood (cookies, redirects, etc.) is invisible
  • Docs: weak documentation
  • Few Folks Know It: less support from the community. Many Google searches for "OLE IE object model" or "OLE IE API" lead to posts that just carp, "Jeepers -- isn't it hard to find docs for OLE and IE?" Docs on the MS site are hard to find or outdated.
  • Code: SAMIE has a few bugs, I think. The code logic is deeply nested and it appears certain branches might not have been thoroughly tested.
  • Procedural: Subroutines and deeply nested "if"s... I prefer clear OO myself.
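
For readers who have not seen OLE automation of IE, here is a rough sketch of the idea using Win32::OLE directly rather than SAMIE (so it is not SAMIE's API, just the layer underneath it). It assumes Windows with Internet Explorer and the Win32::OLE module installed; the sub name ie_page_title and the example.com URL are made up for illustration.

```perl
use strict;
use warnings;

# Drive IE through OLE and return the title of the loaded page.
# Assumption: Windows, Internet Explorer and Win32::OLE are available.
sub ie_page_title {
    my ($url, $visible) = @_;

    require Win32::OLE;    # loaded lazily so the script still compiles off Windows
    my $IE = Win32::OLE->new('InternetExplorer.Application')
        or die "Cannot start IE: " . Win32::OLE->LastError;

    $IE->{Visible} = $visible ? 1 : 0;    # 1 = watch the browser while debugging
    $IE->Navigate($url);

    # Poll until IE reports the page as fully loaded (ReadyState 4 = complete).
    sleep 1 while $IE->{Busy} || $IE->{ReadyState} != 4;

    my $title = $IE->{Document}{title};
    $IE->Quit;
    return $title;
}

if ($^O eq 'MSWin32') {
    print ie_page_title('http://example.com/', 1), "\n";
}
else {
    print "Skipped: this sketch needs Windows and IE\n";
}
```

Cookies, redirects and HTTPS never appear in the code because IE deals with them itself, which is both the convenience and the "all details are hidden" drawback listed above.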

Summary

Perl is about using the right tool for the job.

For quick page fetches, I'd use LWP. For simple web apps, I'd use WWW::Mechanize. For testing redirectors or lower-level code, I'd use LWP (so as to be able to see exactly what is going on). For interfacing with a complex multipage secure form quickly on a Win* platform, I'd now suggest considering OLE.

rkg

I found the following links of some help:

update (broquaint): shortened width-bursting URLs


In reply to On being a browser by rkg
