PerlMonks  

Re: Quest: a bulletproof-secure, automated scraper

by cowboy (Friar)
on Mar 19, 2005 at 05:17 UTC (#440872=note)


in reply to Quest: a bulletproof-secure, automated scraper

Using my bank, https://www.scotiaonline.scotiabank.com/, I think I could manage this easily enough. It requires entering my card number and password in oddly named fields whose names change every time you visit, probably to defeat browser caching (they seem security-conscious). Submit the form, and it gives me some sort of session, redirects me once or twice, then shows my account info. A scrape of that screen would tell me all I need to know (unless something looked out of whack, in which case I'd check the transaction list for the odd-looking account). So, in summary, here's what you'd need to do to access my bank:
  • Fetch the login page, find the form fields, and store the cookies. Replace the relevant form values with the card number and password; leave the rest alone, but keep them, since you'll need to send them back.
  • Know that the first field is the card number and the second is the password (the names change, so you have to go by position).
  • Send a POST with the proper info.
  • Read, accept, and resend all cookies through the two or three redirects it does.
  • Scrape the final page for the data you want.
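The steps above can be sketched in code. This is a minimal sketch in Python using only the standard library (a Perl version would more likely lean on WWW::Mechanize). It assumes the login page has a single plain HTML form, that the first two text/password inputs are the card number and password (going by position, since the names change on every visit), that every other named field should be sent back unchanged, and that the form is not built by JavaScript. The names `FormFieldParser`, `build_login_payload` and `login` are hypothetical, as is whatever URL you pass in.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action and named <input> fields of the first <form>, in page order."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []  # (name, type, value) tuples, in document order
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_form = True
            self._seen_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            ftype = (attrs.get("type") or "text").lower()
            self.fields.append((attrs["name"], ftype, attrs.get("value") or ""))

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def build_login_payload(fields, card_number, password):
    """Fill the first two text/password inputs by position; echo every other field."""
    credentials = [card_number, password]  # assumed order: card first, password second
    payload = []
    for name, ftype, value in fields:
        if ftype in ("text", "password") and credentials:
            payload.append((name, credentials.pop(0)))
        else:
            payload.append((name, value))  # hidden tokens etc. go back untouched
    return payload


def login(login_url, card_number, password):
    """Fetch the form, POST the credentials, and return the final page's HTML."""
    jar = http.cookiejar.CookieJar()
    # The cookie processor stores cookies from every response (including each
    # redirect) and sends them back on the next request.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(login_url) as resp:
        parser = FormFieldParser()
        parser.feed(resp.read().decode("utf-8", errors="replace"))
    action = urllib.parse.urljoin(login_url, parser.action or "")
    body = urllib.parse.urlencode(
        build_login_payload(parser.fields, card_number, password)).encode("ascii")
    # urllib's default redirect handler follows 301/302/303 hops after the POST.
    with opener.open(action, data=body) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Going by position rather than by name is what lets this survive the changing field names; it breaks as soon as the bank reorders its inputs or adds another visible text field ahead of them. Scraping the returned page for balances is left out, since that depends entirely on the bank's markup.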

    Then again, I'm kinda glad my bank seems to take security seriously and is difficult to 'scrape' automatically. If it were easy to scrape, it'd be easy to do all sorts of things.

    On the other hand, Bank of America seems to use static field names for its login form, so it should be fairly easy to handle automatically. It should also be fairly easy for less scrupulous people to break in, since all they'd have to do is get into your machine and check your browser's autocomplete data.
    Re^2: Quest: a bulletproof-secure, automated scraper
    by cLive ;-) (Prior) on Mar 19, 2005 at 06:21 UTC

      bank of america, seems to use a static field to login ... should be fairly easy for the less scrupulous people to break in as well, since all they have to do is get into your machine, and check your browsers auto-complete data.

      Actually, the password isn't cached, even though the field is static. I was just examining the JavaScript on the page trying to work this one out. I checked my saved form fields in Firefox: it does save my user ID in full, but that's all. I haven't looked at it in detail, but my guess is that the password field that actually gets submitted is generated with JS, so it doesn't get cached because the browser never "sees" it. That's only a guess, though (I only spent 5 minutes browsing the code and didn't go as far as grabbing the JS source).

      B of A has a very neat online banking UI. Working in this area a bit myself, I'm continually impressed by how well it performs across different browsers and platforms.

      cLive ;-)
