
In-browser mech-like thing?

by dgaramond2 (Monk)
on Oct 29, 2010 at 01:48 UTC (#868201)

dgaramond2 has asked for the wisdom of the Perl Monks concerning the following question:

In our web-based personal finance application, we use WWW::Mechanize to log in to financial institution sites and download + scrape users' account statements from HTML pages.

Due to some of these financial institutions' security policies, we need to do the login + downloading from the end user's browser instead of from our servers.

Does anyone know of something akin to Mechanize in JavaScript? It's kind of funny to think about, though: Mechanize's goal is to emulate a browser, and here we already have the browser. We just need to automate it, i.e. log in to a site, click some menu links, submit some forms, and send the resulting HTML pages to our servers.
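To make the request concrete, here is a minimal sketch of what "Mechanize-like" helpers could look like in plain JavaScript. The function names, the selector, and the field names are hypothetical, and the helpers take a `doc` argument that is assumed to behave like the browser's `document`. They only work where the script is actually allowed to touch the page's DOM.

```javascript
// Sketch only: `doc` is any object that behaves like the browser's `document`.
// All function names, selectors, and field names here are hypothetical.

// Fill a form's fields by name, then submit it
// (roughly what WWW::Mechanize's submit_form does).
function submitForm(doc, formSelector, fields) {
  const form = doc.querySelector(formSelector);
  if (!form) {
    throw new Error("form not found: " + formSelector);
  }
  for (const [name, value] of Object.entries(fields)) {
    const input = form.elements.namedItem(name);
    if (!input) {
      throw new Error("field not found: " + name);
    }
    input.value = value;
  }
  form.submit();
}

// Click the first link whose visible text matches exactly
// (roughly what WWW::Mechanize's follow_link does).
function followLink(doc, text) {
  const links = Array.from(doc.querySelectorAll("a"));
  const link = links.find((a) => a.textContent.trim() === text);
  if (!link) {
    throw new Error("link not found: " + text);
  }
  link.click();
}

// Serialize the current page so it can be sent to a server.
function pageHtml(doc) {
  return doc.documentElement.outerHTML;
}
```

In a real page, a call might look like `submitForm(document, "#login", { user: "...", pass: "..." })`, followed by posting `pageHtml(document)` to the application's own server once the next page has loaded.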

Replies are listed 'Best First'.
Re: In-browser mech-like thing?
by Corion (Pope) on Oct 29, 2010 at 07:20 UTC
Re: In-browser mech-like thing?
by aquarium (Curate) on Oct 29, 2010 at 03:23 UTC
    Well, that's basically a JavaScript and not a Perl question. However, JavaScript already handles DOM manipulation quite well, and you can set timer events to click a button or whatever. In other words, whilst in server-side languages (like Perl) you're dealing with the HTML document, in JavaScript you interact directly with the browser's document object model of the document and everything else that represents the browser/client environment. It's much easier if you use a library such as jQuery. Anyway, you're going to run into the problem of the same-origin policy for your JavaScript code, and you'll need to read up on this problem and potential workarounds. It's not trivial to work around the same-origin restriction, just to let you know.
    the hardest line to type correctly is: stty erase ^H
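As a rough illustration of the same-origin rule mentioned above: two URLs share an origin only when scheme, host, and port all match. The sketch below uses the standard `URL` class; the function name `sameOrigin` and both example hosts are hypothetical.

```javascript
// Two URLs have the same origin only if scheme, host, and port all match.
// `sameOrigin` is a hypothetical helper name, not a browser API.
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Hypothetical addresses for the finance app and a bank.
const appPage = "https://app.example.com/dashboard";
const bankPage = "https://bank.example.net/statements";

console.log(sameOrigin(appPage, bankPage));                           // false: different host
console.log(sameOrigin(appPage, "https://app.example.com/other"));    // true: only the path differs
console.log(sameOrigin(appPage, "http://app.example.com/dashboard")); // false: different scheme
```

A script served from the first origin cannot read the DOM of a page loaded from the second, which is the restriction any in-page automation has to deal with.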

      Thanks for the answers. The reason I'm asking this in Perlmonks is because I'm not quite sure what I actually need/want in JavaScript.

      We don't want users seeing the financial institution pages; we want to download the account statements in the background (but still using the client's browser). This has led me to think about loading the banking pages in a hidden IFRAME, but yes, there's the same-origin problem (I cannot peek into the DOM of the IFRAME's content).

      I just read up on cross-domain AJAX and JSONP, but this requires that the banking site returns JSON, which it does not.

      We could set up a proxy on our servers, but this violates the requirement that the requests come from the end user's browser.
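The JSONP limitation can be sketched as follows. A JSONP response is loaded through a script tag and executed as JavaScript, so it has to be a call to a callback the page defined. This simulation uses `new Function` in place of a real script tag; `simulateJsonp`, the callback name, and both response bodies are hypothetical.

```javascript
// Simulate what the browser does with a JSONP response: run the response
// body as JavaScript, with the page's callback in scope.
// `simulateJsonp` is a hypothetical helper, not a real browser API.
function simulateJsonp(responseBody, callbackName) {
  let received;
  const callback = (data) => {
    received = data;
  };
  try {
    // Parsing happens here; a body that is not valid JavaScript throws.
    const run = new Function(callbackName, responseBody);
    run(callback);
    return { ok: true, data: received };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// A cooperating server wraps its JSON in the callback the page asked for.
const good = simulateJsonp('handleData({"balance": 100})', "handleData");
console.log(good.ok, good.data.balance); // true 100

// A site that returns ordinary HTML cannot be consumed this way.
const bad = simulateJsonp("<html><body>Statement</body></html>", "handleData");
console.log(bad.ok, bad.error); // false SyntaxError
```

The second case is the situation described above: a banking site that serves plain HTML pages gives the script tag nothing it can execute.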

Re: In-browser mech-like thing?
by Anonymous Monk on Oct 29, 2010 at 09:36 UTC
      Thanks. I've also thought about Selenium and will keep it as an option, although we might need to "mask" or "brand" the Selenium Firefox add-on as something else to better convince our end users to install it in their browsers. And of course there's the issue of other browsers.
Re: In-browser mech-like thing?
by Sinistral (Monsignor) on Oct 29, 2010 at 16:03 UTC

    I've had very good luck with iMacros. It's got multiple browser support (IE, Firefox, Chrome) and both free and commercial versions. The free version still has the scripting engine and can do pretty sophisticated interactions with web pages. It also lets you save results data and capture information.

Re: In-browser mech-like thing?
by Anonymous Monk on Oct 30, 2010 at 23:41 UTC

    You're going to run into a wall called cross-site request forgery (XSRF). What you're probably going to want to do is make a type of web proxy (something like this probably already exists?) that takes the login, scrapes the page, takes more info, scrapes the next page, and keeps repeating the process.

    Either way, what you're asking about is a directed man-in-the-middle attack. It might be completely legitimate because of the company policy where your users work or because of some law (I can't think of how or why, and I doubt this is legitimate). But that should start you on enough resources to do what you want to do.

      I can't think of a legitimate use for this. Having a 3rd-party BROWSER add-on do automated logins to banks on behalf of users means that it needs to store the login/pass for users... this is pretty much against most banking regulations, AFAIK.
        And furthermore, since the entire session (not just the login) is likely to be HTTPS, you won't be able to scrape the gibberish. You can automate pressing buttons etc., but the HTTPS info sent from the server will not be intelligible, AFAIK.

        Not necessarily, if the user/pass is stored by the browser add-on in memory for the current browsing session only. The user will then be slightly inconvenienced by having to enter a user/pass in the morning, but during the day, as long as they do not close the browser, the add-on can periodically log in and log out on the user's behalf to check for new transactions.

        Also, how is this different from browsers themselves saving the login username/password for the user (with the user's explicit consent)? Do banks in the US explicitly forbid this browser feature?
