
Cryptology in the database

by patspam (Sexton)
on Mar 31, 2008 at 05:03 UTC (#677448=perlquestion)

patspam has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

Pity my CPAN/Google-fu, but I'm finding Perl database encryption resources to be very few and far between. Can any monks shine some light here?

By database encryption I'm talking about the use of symmetric keys to store encrypted data in a db (not SSL communication with a db). I've been reading Kevin Kenan's book "Cryptography in the Database" (Symantec Press), which presents a fairly thorough architecture (including Cryptographic Engine, Key Vault/Manifest/Manager, Crypto Provider/Consumer) with a sample implementation in Java.

I understand that a local software-based key store is going to have to essentially rely on obfuscation to protect the master encryption key (Kenan advocates the use of key-encrypting keys split into at least 2 files to make things at least non-trivial for an attacker), so I suppose implementing a cryptosystem architecture in code (as opposed to in an external tamper-proof hardware device) isn't all that appealing. Still, I'm surprised the CPAN doesn't contain any attempts at implementing a working system..? (or maybe I just haven't found it yet..)
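To make the "key split into 2 files" idea concrete, here is a minimal sketch using an XOR split. This is one common way to split a secret and is not necessarily the scheme Kenan describes; the function names `split_key` and `join_key` are made up for illustration, and Python is used only to keep the example self-contained.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split key into two shares; neither share alone reveals the key."""
    # First share: random bytes, same length as the key.
    pad = secrets.token_bytes(len(key))
    # Second share: the key XORed with the random share.
    masked = bytes(k ^ p for k, p in zip(key, pad))
    return pad, masked


def join_key(share_a: bytes, share_b: bytes) -> bytes:
    """Rebuild the key by XORing the two shares together."""
    if len(share_a) != len(share_b):
        raise ValueError("shares must have equal length")
    return bytes(a ^ b for a, b in zip(share_a, share_b))
```

Each share would be written to its own file with its own permissions. As the post says, this only raises the bar: code that can read both files can always rebuild the key.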

I'm sure many of the monks here have worked on projects where the database required encryption (medical info, credit card details, etc..) so I'd really love to hear how you approached the problem.


Patrick Donelan

Replies are listed 'Best First'.
Re: Cryptology in the database
by andreas1234567 (Vicar) on Mar 31, 2008 at 06:31 UTC
    Before diving into algorithms and key management, one should ask: what do you want to protect against? Some challenges are
    • Direct access to file and log file viewing/tampering.
    • System user privilege abuse.
    • Stolen or lost media.
    At work I face some of these requirements and I find it hard to come up with a satisfying solution.

    There are plenty of articles on database encryption, e.g. Encrypting Data Values in DB2 Universal Database, which describes using column-level encryption in the DB2 database. While an interesting read, the article does not touch on key management. The question of where to store the keys remains unanswered.

    I recommend reading the Payment Card Industry Data Security Standard Specification. The PCI DSS Specification outlines a series of principles on how financial institutions are to protect financial data (credit card details etc). Again, there is no definitive implementation, but some of the ideas behind it are interesting (from section 3):

    • Keep cardholder data storage to a minimum.
    • Do not store sensitive authentication data subsequent to authorization (even if encrypted).
    • Render [cardnumber], at minimum, unreadable anywhere it is stored (including data on portable digital media, backup media, in logs, and data received from or stored by wireless networks).
    I find the idea of not storing sensitive data unless it's absolutely necessary particularly interesting.
      Thanks for the reply Andreas.

      You're right that I will probably want to use column-level encryption. Kenan's book covers the different strategies (key families, key scope, striping etc..), and the article you linked to looks like interesting reading for an easy way to do it in DB2.

      The problem I'm struggling with is where to store the keys. It seems to me that if someone is skilled enough to break into my db server to take a copy of the database (this is what I want to protect against) then chances are they're also skilled enough to break into my application server (which is actually currently the same machine) to view my perl source code to un-obfuscate the encryption key. So encryption doesn't seem to give me any extra level of security at all :(

      I suppose the problem is slightly more apparent in perl than in a language like Java because the source code is easily viewable on the server as source, but compiled code can still be reverse engineered..

      Maybe this is why it doesn't exist on the CPAN? Is it a lost cause?


        "break into my db server" is rather vague. How would the attacker do that? You need to look at more specific attacks (for example, "tricking the database into returning data it shouldn't" and "access to arbitrary files"), calculate the chance of the attack happening, the cost of a successful attack (not just financial), the costs of the possible counter-measures (again, not just financial) and the effectiveness of the possible counter-measures.

        The most likely source of leaks is an SQL injection vulnerability, and encrypting the database won't help protect you from that at all since you'll happily decrypt the returned data for the attacker.
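        Since SQL injection is named as the most likely leak, here is a minimal sketch of the usual counter-measure, bound placeholders, using Python's standard `sqlite3` module. The `patients` table and the helper names are hypothetical, chosen only for the demo.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO patients (name) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


def find_patient(conn: sqlite3.Connection, name: str) -> list:
    """Look up rows by name; the value is bound, never spliced into the SQL."""
    cur = conn.execute(
        "SELECT id, name FROM patients WHERE name = ?", (name,)
    )
    return cur.fetchall()
```

        With the placeholder, an input such as `x' OR '1'='1` is compared as a literal name and matches nothing, instead of rewriting the WHERE clause.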

        The problem I'm struggling with is where to store the keys.
        Yes, that's the hard part. One solution could be to not store the keys on disk at all. Rather, supply them as arguments when you start your application. That way the keys are stored in memory only (and possibly also cached to disk (swap), but that's another story). An attacker would then have to gain access to your application's memory in order to access your data. I assume that to access content in memory would be considerably harder than to access content on disk.
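        A minimal sketch of the keep-the-key-in-memory idea follows. It makes one change to what is described above: the key is read from standard input rather than passed as a command-line argument, because arguments are often visible to other users in the process list. The name `read_key`, the hex encoding and the 32-byte key size are assumptions for illustration.

```python
import sys
from typing import TextIO

KEY_BYTES = 32  # assumed key size (256 bits)


def read_key(stream: TextIO) -> bytes:
    """Read one hex-encoded key line from stream and return the raw bytes."""
    line = stream.readline().strip()
    try:
        key = bytes.fromhex(line)
    except ValueError:
        # Do not echo the offending input; it may be most of a real key.
        raise ValueError("key must be hex-encoded") from None
    if len(key) != KEY_BYTES:
        raise ValueError(f"expected {KEY_BYTES} bytes, got {len(key)}")
    return key
```

        At startup the application would call `read_key(sys.stdin)` once and keep the result in a variable for the life of the process. The key still has to be typed or piped in after every restart, and it can still end up in swap, as noted above.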
      I find the idea of not storing sensitive data unless it's absolutely necessary particularly interesting.

      It's a very good one.

      Unfortunately it's often an uphill battle to get acceptance for not storing a lot of 'nice to have' data that's not really necessary to keep and that greatly increases the complexity of the application.

      Being able to conjure some estimates of the cost (not just economic) of adding each table/field sells better with management than just complaining, though. Remember to apply π² to your first idea when you think of a number. Add prime time news headlines to the picture when it's security related.

        Add prime time news headlines to the picture when it's security related.
        Possibly the ugliest example so far is the TJX data breach (45.6M card numbers stolen). Then there's the UK HM Revenue and Customs lost computer disks (25M confidential child benefit details lost). The list grows quickly.
Re: Cryptology in the database
by sundialsvc4 (Abbot) on Mar 31, 2008 at 18:37 UTC

    When you're dealing with crypto, you should be using a public key system, and you should not be implementing any part of that encryption yourself. “It's already been done,” and done well, by systems such as OpenSSL, or by the Crypto-API of Windows. There are copious CPAN interfaces to those systems. You want to be certain that you have left as little as possible to chance.

    You will need to have rigorously-defined access control and change control for your systems and all source-code associated therewith.

    The first thing you should decide is whether or not you actually need to store credit-card information. PayPal™ and other similar vendors now provide schemes that may make it possible for you not to have to handle the confidential information at all.

    Next, you need to use public-key encryption, so that the process that's entering new records or handling them in any “outward-facing” way provably-cannot recover the information. If you need to identify a card to yourself, use a SHA1 hash with salt (also a service provided by OpenSSL). If you need to identify it to the user, provide an acceptably very-short fragment or allow the user to enter a nickname.
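    As a rough sketch of "identify a card to yourself" and "show the user a short fragment": the example below swaps the salted SHA1 mentioned above for HMAC-SHA256 with a secret key, which is a substitution on my part and not what the post specifies. The function names and the choice of the last four digits are likewise assumptions.

```python
import hashlib
import hmac

_DIGITS = "0123456789"


def _digits_only(card_number: str) -> str:
    """Drop spaces, dashes and anything else that is not an ASCII digit."""
    return "".join(ch for ch in card_number if ch in _DIGITS)


def card_fingerprint(card_number: str, secret: bytes) -> str:
    """Return a keyed hash usable for matching cards without storing them."""
    digits = _digits_only(card_number)
    return hmac.new(secret, digits.encode("ascii"), hashlib.sha256).hexdigest()


def display_fragment(card_number: str) -> str:
    """Mask everything except the last four digits for display."""
    digits = _digits_only(card_number)
    return "*" * max(len(digits) - 4, 0) + digits[-4:]
```

    The fingerprint is only as secret as `secret`: card numbers have little entropy, so anyone holding that key can brute-force the hashes. It belongs with the other keys, not in the database.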

    Decryption of the data should be a task performed by the card-processing engine ... which should be entirely separate from anything “outward-facing” and completely beyond its control.

    • Naturally, the private-key can only be reached from the card-processing computer, and naturally, it is stored in a password-protected file with the tightest security that your operating-system can provide.
    • If you have to send information to it through an RPC-call mechanism, design it so that the entire request-envelope must be encrypted using its public-key (OpenSSL again). Any requests not so encrypted will be rejected in the most bland fashion possible (and logged to the heavens above!).
    • The response to indicate an approved request should once-again be uninformative... such as returning a random integer supplied as part of the encrypted request; or maybe, one of two... one for "yes" and the other for "no." Get creative.
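    The "uninformative response" idea in the last bullet can be sketched as follows. The example leaves out the public-key envelope entirely and assumes the request has already been decrypted on the card-processing side; `ChargeRequest` and its fields are hypothetical names.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ChargeRequest:
    card_ref: str
    amount_cents: int
    yes_token: int  # random value that means "approved" for this request only
    no_token: int   # random value that means "declined" for this request only


def new_request(card_ref: str, amount_cents: int) -> ChargeRequest:
    """Build a request carrying two distinct random reply tokens."""
    yes = secrets.randbits(64)
    no = secrets.randbits(64)
    while no == yes:  # vanishingly unlikely, but the tokens must differ
        no = secrets.randbits(64)
    return ChargeRequest(card_ref, amount_cents, yes, no)


def respond(request: ChargeRequest, approved: bool) -> int:
    """Processor side: reply with one of the caller's own tokens."""
    return request.yes_token if approved else request.no_token


def interpret(request: ChargeRequest, reply: int) -> bool:
    """Caller side: map the reply back to approved/declined."""
    if reply == request.yes_token:
        return True
    if reply == request.no_token:
        return False
    raise ValueError("reply matches neither token")
```

    Because the tokens travel inside the encrypted request, an eavesdropper who sees only the reply learns nothing about whether the charge was approved.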

    Some credit-processors are now providing their business customers (that would be “you” ...) with SSL public-keys that they require you to use when sending requests to them, so that every request they accept is both secure and traceable (to you). This is a good feature.

    The weakest link in any crypto system is always located between two ears. Plan accordingly.

      Thanks for the really detailed reply sundialsvc4, much appreciated. I'm actually dealing with medical data which needs to be stored AND retrieved by the webapp, so sadly I can't implement that scheme :(
Re: Cryptology in the database
by moritz (Cardinal) on Mar 31, 2008 at 14:53 UTC

    There's no pre-built solution on CPAN because there are two possible cases:

    1. You have a storage location that is more secure than your database
    2. You don't have such a location

    In case 1) you can just store the keys there, the rest is a SMOP (small matter of programming)

    In case 2) you're lost anyway. Even if you obscure the keys in a very clever way, you'll still have code that reverses that process (otherwise you couldn't access the keys).

    Now if somebody has access to your database, he will probably have access to your code as well, make a copy of it, and dump the keys after the deobfuscation.

    So anything that is in the case 2) just gives a false sense of security, and is IMHO not worth considering.

      there are two possible cases.
      I don't find it all black or white. Consider an application server with a symmetric key stored in plain text on disk, connected to a database server which performs symmetric encryption on the data. Although the protection against an online attacker having filesystem access on the application server is very poor, it will still protect well against offline attacks on lost or stolen database disks.
Re: Cryptology in the database
by derby (Abbot) on Mar 31, 2008 at 13:41 UTC

    Encrypting cc numbers in a database (I know, a really bad idea but one I had to live with -- thankfully we've stopped) was the reason I wrote Getting Started with GnuPG and GPG.

Re: Cryptology in the database
by ack (Deacon) on Mar 31, 2008 at 15:14 UTC

    Just thinking out loud here.

    But could you store the key(s) in a C-coded & compiled routine so that it is, at least, not just 'obfuscated'?

    Or, perhaps even better, use a C-coded and compiled encryptor to encrypt the keys and then store them in a file that only the C-coded & compiled subroutine(s) knows about?

    ack Albuquerque, NM
Re: Cryptology in the database
by jethro (Monsignor) on Apr 01, 2008 at 05:14 UTC
    Well, if all you want is slightly more security, there might be a hardware-based solution that's not that hard to make. The idea is to build a small storage readable from the parallel port, which is only readable after startup. As soon as a special pin on the parallel port is toggled once, that storage isn't accessible anymore (until the next reboot of the machine).

    Anyone versed in electronics could build something like this with a relay and some diodes, and an EPROM for storage. The trick is that the relay is off after a power down or a reset. When a special pin is toggled, the relay gets turned on and then holds itself on. If the relay is also connected to an address pin of the EPROM, the previously accessible address range is changed irrevocably.

    A similar idea without a relay: the EPROM's lower address pins are not directly connected to the parallel port pins, but to a counter circuit that can only count upwards and does not overflow to 0 (or the highest bit just never turns to 0 after it changed to 1).

    Or you get a small microcontroller (for example a PIC) to do that.
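    The logic such a device (relay, counter or microcontroller) would implement can be modelled in a few lines. This is only a software model to show the one-way latch behaviour; run as ordinary code it protects nothing, and the class name `BootLatchedStore` is invented for the illustration.

```python
class BootLatchedStore:
    """Model of a key store that is readable only until it is latched.

    Creating a new instance stands in for a reboot or power cycle.
    """

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._latched = False

    def read(self) -> bytes:
        """Return the secret, but only before the latch pin was toggled."""
        if self._latched:
            raise PermissionError("store is latched until the next reboot")
        return self._secret

    def latch(self) -> None:
        """Toggle the 'special pin': one-way, nothing here clears it again."""
        self._latched = True
```

    The boot sequence would read the key once, call `latch()`, and only then start anything that talks to the network.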

    You also need a compiled (C or C++) wrapper around your database that does the key retrieval and database decoding with lots of obfuscation and checksumming of the wrapper so that the attacker can't easily patch the code after decompiling it.

    This scheme is not much more than an obfuscation. If the attacker knows what's going on, he will just replace the perl code to make a complete dump of the database and reboot the machine. But to find that out he probably needs to do a lot of testing and a few risky revisits to your machine. Not just an easy hit and run.

    The hardware is necessary so that the attacker can't just copy your hard disk and search for the key in his remote copy. The compiled decoding wrapper is necessary so that he can't just read the program, find the $key and add one line system("echo $key > /tmp/foo"); on his next visit.

    The security you get is not much, but it's the most you can hope for against an attacker with root access. But the time you would have to invest to build this system is considerable. So entering the password at every reboot might be inconvenient, but it gives you the same security with a lot less work.

    PS: If you can't deploy a second machine to act as independent database server, why not split your one machine into two virtual ones?

      Hats off, that's a pretty cool approach. My server is actually a virtual machine which is why the hardware security module (HSM) approach (really expensive out-of-the-box version of what you detailed) is out. Sorry to keep shutting down these great ideas with extra info that I should have provided in my initial post.

      I guess the fact that my server is a VM means that my security is already pretty weak, but assuming I'm willing to trust my virtual server provider (or maybe BECAUSE I'm willing to trust them) there's still an appeal in encrypting private data in the db.

      Your PS has gotten me thinking though, there's certainly nothing stopping me from commissioning a second virtual machine to separate webapp code from db. Now that I think about it, maybe the very best I can achieve in this virtual server situation is to commission a second virtual machine to act as a virtual HSM, that is, to act as the encryption provider/key store etc.. whereby the encryption key(s) never actually leave the vm (unless someone hacks it of course). I obviously don't get the primary benefit of an HSM which is physical tamper-proof security of the key(s), but it at least means that someone has to hack both vms, or just hack the webapp server and hijack the app/communication channel to the virtual HSM to decrypt data online (which, according to my reading, is an unavoidable problem even in a physical HSM and can only really be 'protected' against by monitoring for activity spikes etc..).

      I'm cautiously optimistic about the above approach, but please someone pop my bubble if I've overlooked something obvious.


        I don't think I've heard of any break-ins into hypervisors yet. It will eventually happen, but at the moment virtualization seems to be quite safe IMHO.

        Checking for activity spikes is a good idea. Another idea would be poisoned data sets, i.e. you add data that you are careful to never access. If it gets accessed, further access is prevented (or simply the decoding key changed so that the attacker still thinks he gets data, but it is unreadable) and you are alerted. This could be implemented maybe with the help of stored procedures or a separate watcher process.
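        Both ideas can be sketched as small checks that a watcher process or key service might run before each decryption. The class names, the sliding-window rule and the thresholds are assumptions for illustration, not a tested design.

```python
from collections import deque
from typing import Deque, Iterable


class SpikeMonitor:
    """Flag bursts: more than max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._times: Deque[float] = deque()

    def record(self, now: float) -> bool:
        """Record one decrypt request at time `now`; True means a spike."""
        self._times.append(now)
        cutoff = now - self.window_seconds
        # Forget requests that have fallen out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) > self.max_requests


class HoneytokenGuard:
    """Refuse all further access once any decoy record has been touched."""

    def __init__(self, decoy_ids: Iterable[int]) -> None:
        self._decoys = frozenset(decoy_ids)
        self.tripped = False

    def check(self, record_id: int) -> bool:
        """Return True if access may proceed; a decoy trips the guard for good."""
        if record_id in self._decoys:
            self.tripped = True
        return not self.tripped
```

        Passing the timestamp in explicitly keeps the monitor easy to test; a real watcher would use `time.monotonic()` and raise an alert when `record` returns True or `tripped` becomes set.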

Node Type: perlquestion [id://677448]
Approved by McDarren
Front-paged by andreas1234567