PerlMonks  

Quest: a bulletproof-secure, automated scraper

by tlm (Prior)
on Mar 19, 2005 at 04:39 UTC (#440866=perlquestion)

tlm has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I want to write a Perl script that will periodically log in to my bank account's web page and report back a summary of information. This script should run securely and without supervision. Naturally, it will need to know sensitive information, such as passwords, account numbers, etc., and I wonder if it is possible for a lowly monk wannabe to write a Perl script that keeps this information safe.

The first solution that occurred to me was to rely on a source filter like Filter::Crypto, but the following caveat in that module's man page gives me pause:

This technique can never completely hide the original unencrypted source code from people sufficiently determined to get it. The most it can hope for is to hide it from casual prying eyes, and to outdo everyone who is using a precompiled Perl (at least from "regular" sources) and everyone who isn't knowledgeable enough to suitably modify the Perl source code before compiling their own.

Perl source code decryption filters work by intercepting the source stream (read from the encrypted file) and modifying it (in this case, decrypting it) before it reaches the Perl parser. Clearly, by the time the source reaches the parser it must be decrypted, otherwise the script cannot be run. This means that every part of the script must at some stage be held in memory in an unencrypted state, so anyone with the appropriate debugging skills will be able to get it.

If Perl was built with DEBUGGING then running the script with Perl's -Dp command-line option makes this much easier. Even without a DEBUGGING Perl, the script can still be run under the Perl debugger (Perl's -d command-line option), whose l command will list the (decrypted) source code that was fed to the parser.

In fact, with the introduction of the Perl compiler backend modules it is now easy to get at the decrypted source code without any debugging skills at all.
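The point about the compiler backend is easy to check with B::Deparse, which ships with Perl: once code has been compiled, `coderef2text` turns it back into source, string literals included. A minimal sketch; the sub `login_credentials` and its values are made up, and no actual source filter is used, since whatever a decryption filter feeds the parser ends up compiled just like this.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical sub standing in for code that a decryption filter has
# already decrypted and handed to the parser.
sub login_credentials {
    return ('my_user', 's3cret');
}

# B::Deparse regenerates Perl source from the compiled op tree,
# so the literal "secret" comes straight back out.
my $deparser = B::Deparse->new;
my $source   = $deparser->coderef2text(\&login_credentials);
print "$source\n";
```

Running it prints the body of `login_credentials` with `'s3cret'` in plain view.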

I have thought of other schemes, too outlandish and elaborate to bother you with. I have looked in various sources (e.g. perlsec, book searches on Amazon, Google, etc.), but I have found nothing that addresses my concerns.

Given that I'm no Donald Trump, it is probably safe to assume that no one is going to waste much time trying to break into my bank account, but I really don't want to take any chances (plus I am genuinely curious about how one goes about dealing with this problem without cutting any corners).

In case it makes any difference, the OS for which this script is intended is (Debian) Linux.

I look forward to reading your opinions on this.


Update: In retrospect I should have phrased the ending of this question as follows:

My question to you, fellow monks, is, if you wanted to set up such an automated screen scraper to collect data from your bank account, how would you do it? How would you architect the solution? What CPAN modules would you use? What other non-Perl software would you use? What special hardware does your solution require, if any? What books/articles/tutorials/authors would you consult to craft your solution?

I realize that this problem, like any problem, does not have a perfect solution. I do not want a perfect solution, just the best solution that I can achieve.

I also realize that there is a largely unspecified context to my project, and that there are many aspects of this context (e.g. how secure is my computer to attack) that would render any solution to my question pointless. (But this is true of any question posted in SoPW. For example, any reply to a request for an efficient algorithm would be pointless if this algorithm is used in a program that does something horribly inefficient elsewhere. It's the ol' Langsam's Law: Everything depends.) This is why I ask you how you would solve the problem.


the lowliest monk

Re: Quest: a bulletproof-secure, automated scraper
by jhourcle (Prior) on Mar 19, 2005 at 05:13 UTC

    There is no way to store information so that it can be decrypted without it also being insecure in some fashion. This is one of the big problems with storing SSL private keys: you either have to have someone key in the passphrase when the server starts up, leave the key unlocked, or store the passphrase in plain text (or in some form that can be decrypted, along with the instructions to decrypt it).

    Your best bet is to have the process run as a daemon and enter the information when it starts up. It can still be recovered if someone can force the process to dump core and then go through the dump for the information, but it's about as secure as you're going to get. (Well, you could store it encrypted in memory, along with the information needed to decode it, but again, someone who really wants the information could get to it.)
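A minimal sketch of the "enter it at startup" approach using only core Perl: the secret is read once with terminal echo turned off (via `POSIX::Termios`) and afterwards lives only in the process's memory. The `fetch` callback and the `rounds` limit are hypothetical; detaching from the terminal and the core-dump exposure mentioned above are not addressed here.

```perl
use strict;
use warnings;
use POSIX qw(:termios_h);

# Read one line from $fh (STDIN by default), turning terminal echo off
# while typing if $fh is a terminal. Returns the line without its newline.
sub read_secret {
    my ($prompt, $fh) = @_;
    $fh ||= \*STDIN;
    my ($term, $old_lflag);
    if (-t $fh) {
        $term = POSIX::Termios->new;
        $term->getattr(fileno $fh);
        $old_lflag = $term->getlflag;
        $term->setlflag($old_lflag & ~ECHO);
        $term->setattr(fileno $fh, TCSANOW);
        print STDERR $prompt;
    }
    my $secret = <$fh>;
    if ($term) {
        # Restore the original terminal settings.
        $term->setlflag($old_lflag);
        $term->setattr(fileno $fh, TCSANOW);
        print STDERR "\n";
    }
    die "No secret supplied\n" unless defined $secret;
    chomp $secret;
    return $secret;
}

# Call the (hypothetical) scraping routine $args{fetch} with the secret,
# then sleep $args{interval} seconds; loops forever unless $args{rounds}
# is set. Returns the number of rounds run.
sub run_loop {
    my (%args) = @_;
    for (my $n = 1; ; $n++) {
        $args{fetch}->($args{secret});
        return $n if defined $args{rounds} && $n >= $args{rounds};
        sleep $args{interval};
    }
}

# Usage (scrape_bank is hypothetical):
#   my $pin = read_secret('Bank PIN: ');
#   run_loop(secret => $pin, interval => 3600, fetch => \&scrape_bank);
```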

    As with anything, all you can do is slow down someone who might gain access (unlink the script after you've started the process, etc.), but it's never going to be perfect. You'll have to decide for yourself how the risks associated with the task compare to the benefits that you might get from it.

    Oh ... and it is possible to hide processes from the process list (so it's harder for people to find and kill them, especially if the process deletes its own file while running, so there's no file to associate it with), but then we're getting into how worms and rootkits hide themselves.

Re: Quest: a bulletproof-secure, automated scraper
by Zaxo (Archbishop) on Mar 19, 2005 at 05:05 UTC

    Security of the kind you want depends on trusting the root user. Run your script from a machine that only you control. Don't run it without an encrypted connection.

    Source obfuscation or encryption is not useful. Anybody who can read or run the script can read the key. Unix file system permissions are a better way to prevent unauthorized access to your keys.
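A sketch of leaning on file-system permissions instead of obfuscation: keep the credentials in their own file and have the script refuse to read it unless it belongs to the invoking user and has no group or other permission bits set. The exact policy is an assumption, and, as Zaxo says, none of this helps against root.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);

# Return the contents of $path, but only if it is owned by the user
# running the script and has no group/other permission bits (0600 or
# stricter). Dies otherwise.
sub read_private_file {
    my ($path) = @_;
    my @st = stat $path or die "Cannot stat $path: $!\n";
    my ($mode, $uid) = @st[2, 4];
    die "$path is not owned by uid $<\n" unless $uid == $<;
    die sprintf("%s has mode %04o; expected 0600 or stricter\n",
                $path, S_IMODE($mode))
        if $mode & (S_IRWXG | S_IRWXO);
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $content = do { local $/; <$fh> };
    close $fh;
    return $content;
}
```

A script would then call something like `read_private_file("$ENV{HOME}/.bankrc")`, where `.bankrc` is a hypothetical file created with `chmod 600`.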

    You may not be da Donald, but that's no protection.

    After Compline,
    Zaxo

      Security of the kind you want depends on trusting the root user. Run your script from a machine that only you control.

      Certainly, I should have mentioned this from the start. I'm the one and only user of the machine in question (and naturally have root access).

      the lowliest monk

Re: Quest: a bulletproof-secure, automated scraper
by webengr (Pilgrim) on Mar 19, 2005 at 05:08 UTC

    This problem seems to me the same wherever secure automation is required. The solution that I am employing in an automated SFTP application is as follows (substitute PGP for SFTP to apply to your problem):

    • decrypt an encrypted key and load it into memory using an agent. In my case the agent is ssh-agent, and in yours it might be gpg-agent. This step requires the pass phrase to be typed in at the console. The fact that it does NOT reside on disk anywhere is what strengthens the security of this approach, so even root couldn't easily get your PIN.
    • use the keychain script (http://www.gentoo.org/proj/en/keychain/index.xml) to allow a batch process to access the key in memory. The process needs a PID and socket name to access the key, and keychain makes this available to processes that have no tty.
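For a cron job, what keychain provides is the agent's socket and PID in a file that a tty-less process can read. Shell scripts usually just source that file; below is a Perl sketch of the same thing. It assumes keychain's usual `~/.keychain/HOSTNAME-sh` file with lines of the form `NAME=value; export NAME;`, so check what your keychain version actually writes.

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);

# Load agent variables (SSH_AUTH_SOCK, SSH_AGENT_PID, ...) from the file
# keychain writes, so a cron job with no tty can find the running agent.
# The default path and the line format are assumptions about keychain.
sub load_keychain_env {
    my ($file) = @_;
    $file ||= "$ENV{HOME}/.keychain/" . hostname() . '-sh';
    open my $fh, '<', $file or die "Cannot read $file: $!\n";
    my %found;
    while (my $line = <$fh>) {
        # e.g. SSH_AUTH_SOCK=/tmp/ssh-abc/agent.123; export SSH_AUTH_SOCK;
        next unless $line =~ /^\s*(\w+)=([^;]*);\s*export\s+\1\s*;?/;
        $found{$1} = $2;
    }
    close $fh;
    # Export into this process's environment for child processes.
    @ENV{keys %found} = values %found;
    return \%found;
}
```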

    I have had success with this approach in both cygwin and Solaris environments. I think you might research an approach that uses GPG to encrypt your bank PIN, and Perl modules for GPG integration. The pass phrase that I referred to above is for the GPG key, not your bank PIN, so even if someone watched you type it in, they still wouldn't have your PIN.
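A sketch of the GPG variant suggested above: keep the PIN in a file encrypted to your own key and let a running gpg-agent supply the passphrase, so the script only shells out to `gpg --batch --quiet --decrypt`. CPAN wrappers such as GnuPG::Interface exist, but this avoids depending on one. Keep in mind that the agent will serve any process running as the same user (and root), which is the weakness raised in the follow-up.

```perl
use strict;
use warnings;

# Default command: gpg in batch mode, relying on gpg-agent for the
# passphrase so this script never prompts for or stores it.
my @GPG_DECRYPT = qw(gpg --batch --quiet --decrypt);

# Decrypt $file and return the plaintext (trailing newline removed).
# The command can be overridden, e.g. to exercise the plumbing without gpg.
sub decrypt_secret {
    my ($file, @cmd) = @_;
    @cmd = @GPG_DECRYPT unless @cmd;
    # List-form pipe open: no shell is involved, so $file is not interpolated.
    open my $pipe, '-|', @cmd, $file
        or die "Cannot run $cmd[0]: $!\n";
    my $plaintext = do { local $/; <$pipe> };
    close $pipe or die "$cmd[0] failed (status $?)\n";
    $plaintext = '' unless defined $plaintext;
    chomp $plaintext;
    return $plaintext;
}
```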

    This just seemed related to the problem I'm working on, but I haven't tried automating GPG, so I don't know if it would actually work. A key issue in my application comes from SarbOx and how to protect things from a compromised superuser account.

    PCS

      Okay, after re-reading my post I can see that in a GPG scenario root could quite easily get the PIN simply by running a different script to access the gpg-agent. In my SFTP scenario, the thing I am protecting is the ssh key, and that's a lot easier to do.

      Perhaps you can keep your PIN in a database with access controls in place. But a compromised root account can make protecting something on disk extremely difficult.

      I will be watching this thread with great interest.

      Update: removed extraneous punctuation

      PCS
Re: Quest: a bulletproof-secure, automated scraper
by polettix (Vicar) on Mar 19, 2005 at 08:13 UTC

    Apart from any technical consideration, this seems more of a philosophical issue. All modern crypto systems, however strong they may be, make the basic assumption that only you know the secret that says "open sesame". Even without knowing it, a powerful and lengthy attack could eventually reveal the secret.

    So the issue is that you have to assess the risk and play with it. It's like writing your PIN in your planner so you don't forget it: how secure is your planner? Maybe you write all your passwords in it and keep it in the most secure bank vault. Do you really do this?

    If you really want to keep your secret in the computer rather than in your head, you should ask yourself how secure your computer is and how a potential attacker could gain access to it; so to me it seems more of a "contour problem", that is: how exposed is your computer?

    Moreover, you should really assess whether a potential attacker would be interested in spending time to find the secret: if you keep $1000 in your bank account on average, is a $1000 attack (add zeroes at will) worth the trouble for h(er|im)? The level of security should be such that an attack would be too expensive with respect to the reward; unfortunately, this has little to do with Perl, I fear, even if Perl might help :P

    As a side note, you could reach a compromise by keeping the secret (a GPG secret key, for example) with you at all times on a USB disk, and feeding it to a daemon when it's needed. If you spend some time near your server, you could plug in the disk when you arrive and unplug it when you leave, keeping it with you or at least separate from the server. An attacker would then need a physical attack on your premises to get at the USB disk. Then you could have a script in the crontab that regularly checks for the presence of the key and does its scraping work; just be sure that the secret stays in memory for as little time as possible and does not get swapped to disk!
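The cron side of the USB-key idea might look like this: if the key file is not there, return quietly; if it is, hand its contents to the scraping job and then overwrite the buffer. The mount point `/media/usbkey/scraper.key` is a placeholder, and the scrub is only best effort: Perl may have copied the string elsewhere, and nothing here keeps it out of swap.

```perl
use strict;
use warnings;

# Placeholder path to the key file on the removable USB disk.
my $KEY_FILE = '/media/usbkey/scraper.key';

# If the key file is readable, pass its contents to $job and return 1;
# if it is absent (disk unplugged), return 0 without complaint so cron
# stays quiet. Errors from $job are rethrown after the scrub.
sub with_key_if_present {
    my ($job, $path) = @_;
    $path ||= $KEY_FILE;
    return 0 unless -r $path;
    open my $fh, '<', $path or return 0;
    my $key = do { local $/; <$fh> };
    close $fh;
    $key = '' unless defined $key;
    my $ok  = eval { $job->($key); 1 };
    my $err = $@;
    # Best-effort scrub of this copy only; Perl may have made others.
    substr($key, 0, length $key, "\0" x length $key);
    die $err unless $ok;
    return 1;
}
```

A crontab entry such as `*/30 * * * * /usr/local/bin/scrape.pl` (path hypothetical) would then simply do nothing while the disk is unplugged.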

    Bye, Flavio.

    -- Don't fool yourself.
Re: Quest: a bulletproof-secure, automated scraper
by cowboy (Friar) on Mar 19, 2005 at 05:17 UTC
    Using my bank, https://www.scotiaonline.scotiabank.com/, I think I could manage this easily enough. It requires entering my card number and password (in oddly named fields, which change every time you visit, probably to defeat browser caching; they seem security conscious). Submit the form, it gives me some sort of session, redirects me once or twice, then shows my info. A scrape of that screen would tell me all I needed to know (unless something was out of whack, in which case I'd check the odd-seeming account's transaction list). So, in summary, what you'd need to do to access my bank:
  • Contact the site, find the form fields, store the cookies. Replace certain form values with card/pass, leave the rest alone, but note them since you'll need to send them.
  • Know that the first field is card number, second is password.
  • Send a POST (with the proper info).
  • read/accept/submit all cookies through the 2-3 redirects it does.
  • scrape the page for the data you want.
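The steps above map fairly directly onto WWW::Mechanize, which keeps cookies in memory, submits a form's hidden fields unchanged, and follows redirects. Because the field names change on every visit, the sketch picks the fields by type and position instead; that rule (the first text input is the card number, the first password input is the password) is an assumption about the page, and the login URL is a placeholder.

```perl
use strict;
use warnings;

# Pick the login fields by type and position, since their names change
# on every visit. Arguments are [name, type] pairs in page order.
sub login_field_names {
    my @inputs = grep { defined $_->[0] } @_;
    my ($card) = map { $_->[0] } grep { $_->[1] eq 'text' } @inputs;
    my ($pass) = map { $_->[0] } grep { $_->[1] eq 'password' } @inputs;
    die "Login form lacks a text or password field\n"
        unless defined $card && defined $pass;
    return ($card, $pass);
}

# Fetch the login page, fill in the two fields, submit, and return the
# page reached after the redirects. $login_url is a placeholder.
sub fetch_account_page {
    my ($login_url, $card_number, $password) = @_;
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 1);
    $mech->get($login_url);
    my $form = $mech->form_number(1);
    my ($card_field, $pass_field) =
        login_field_names(map { [ $_->name, $_->type ] } $form->inputs);
    $mech->submit_form(
        form_number => 1,
        fields      => { $card_field => $card_number, $pass_field => $password },
    );
    return $mech->content;   # scrape this for the balances you need
}
```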

    Then again, I am kinda glad my bank seems to take security seriously, and it would be difficult to 'scrape' automatically. If it was easy to scrape, it'd be easy to do all sorts of things.

    Then again, Bank of America seems to use a static field to log in, so it should be fairly easy to deal with something like that automatically. It should be fairly easy for the less scrupulous people to break in as well, since all they have to do is get into your machine and check your browser's auto-complete data.

      bank of america, seems to use a static field to login ... should be fairly easy for the less scrupulous people to break in as well, since all they have to do is get into your machine, and check your browsers auto-complete data.

      Actually, they don't cache the password, even though it's static. I was just examining the JavaScript on the page trying to work this one out. I checked my saved form fields in Firefox, and they do save my user id in full, but that's all. I haven't looked at it in detail, but my guess is that they generate the submitted password field using JS, so it doesn't get cached because the browser doesn't "see" it. This is only a guess, though (I only spent 5 minutes browsing the code and didn't go as far as grabbing the JS source).

      B of A has a very neat online banking UI. Being in this area a bit myself, I'm continually impressed as to how well it performs across several browsers / platforms.

      cLive ;-)

Re: Quest: a bulletproof-secure, automated scraper
by TedPride (Priest) on Mar 20, 2005 at 00:03 UTC
    Why worry about this? Someone with access to your computer's memory can intercept the unencrypted data, sure, but who's going to be able to get at your computer? Assuming your computer isn't virus infected, has proper virus software and firewalls, and isn't networked to any other computer that might itself be vulnerable, your memory should be safe from anyone without hands-on access to your comp - and if they have hands-on access, you're doomed anyway.

    The vast majority of security breaches are from human engineering (obtaining the "I lost my password" info, or finding a password someone wrote down, or so on), not actual hacking. Keep your virus software up to date, don't network your bank computer to any other computer with internet access, and don't write down your password or allow anyone access to your bank computer.

Re: Quest: a bulletproof-secure, automated scraper
by blueberryCoffee (Scribe) on Mar 20, 2005 at 02:24 UTC
    This is easy. Just make this a Perl/Tk app that asks for your username and password when it starts up. Don't actually record them in a file. The bank uses SSL, so your app (using LWP to get the page) won't be any less secure than doing the same thing with a browser.
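The comparison with a browser only holds if the client actually verifies the bank's certificate, as browsers do. A sketch of that, swapping in the core HTTP::Tiny for LWP: it refuses plain `http://` URLs and turns `verify_SSL` on explicitly, since older HTTP::Tiny releases leave it off. HTTPS requests also need IO::Socket::SSL installed; the function names are illustrative.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# HTTP client that checks the server's certificate.
sub make_client {
    return HTTP::Tiny->new(verify_SSL => 1, timeout => 30);
}

# Fetch $url over HTTPS only; dies on plain-http URLs or failed requests.
sub fetch_https {
    my ($client, $url) = @_;
    die "Refusing non-HTTPS URL: $url\n" unless $url =~ m{^https://}i;
    my $res = $client->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}
```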
Re^2: Quest: a bulletproof-secure, automated scraper
by tlm (Prior) on Mar 19, 2005 at 13:14 UTC

    I have made a significant update to my original post.

    the lowliest monk

Node Type: perlquestion [id://440866]
Approved by Zaxo
Front-paged by bart