PerlMonks  

Reversible per-line "encryption"

by bmcatt (Friar)
on Jan 25, 2002 at 20:08 UTC (id://141520)

bmcatt has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks,

I'm looking for interesting suggestions on how to approach a problem. We have a call center which needs to provide us with a data file containing a list of people's names and addresses (along with other information). However, we're not actually allowed to look at all of the contained data, only those records that actually belong to "us". Each of the lines will be tagged with some identifier indicating "whose" data it is. For all other records, we'll need to strip them and send them out to other recipients.

For example:

    ABC|John Smith|1010 Nowhere Lane
    ABC|George Jones|2038 Someother Road
    DEF|Paul Google|3827 Road to Nowhere

Now, suppose that we're "ABC". We're allowed to see all records that start with "ABC", but can't see any of the other records. The initial idea was to encrypt each record individually, but with a different key, possibly based on the recipient, so ABC records would be encrypted with ABC, DEF records would be encrypted with DEF, etc.

The idea is that we could go through and split the file apart without being able to understand the records. Then we send the DEF records to the DEF organization, etc. Meanwhile, we can decrypt our own ABC records using the ABC key.

Using this approach, we would get a file that contained:

    ABC|<ABC-encrypted John Smith|1010 Nowhere Lane>
    ABC|<ABC-encrypted George Jones|2038 Someother Road>
    DEF|<DEF-encrypted Paul Google|3827 Road to Nowhere>
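The splitting step only needs the plain-text tag in front of the first "|", so it never has to touch the encrypted payload. A minimal sketch, assuming one record per line and a payload that has already been encoded so it contains no newline; the `route_records` name and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Group raw lines by their plain-text tag (everything before the first "|").
# The payload after the tag is treated as opaque and is never decoded here.
sub route_records {
    my @lines = @_;
    my %by_tag;
    for my $line (@lines) {
        chomp $line;
        my ( $tag, $payload ) = split /\|/, $line, 2;
        next unless defined $payload;      # skip lines with no "|" at all
        push @{ $by_tag{$tag} }, $line;    # recipient gets the whole line back
    }
    return \%by_tag;
}

# Hypothetical input: the hex strings stand in for per-group ciphertext.
my @input = (
    "ABC|6a6f686e20736d697468\n",
    "ABC|67656f726765206a6f6e6573\n",
    "DEF|7061756c20676f6f676c65\n",
);

my $routed = route_records(@input);
for my $tag ( sort keys %$routed ) {
    printf "%s: %d record(s)\n", $tag, scalar @{ $routed->{$tag} };
}
# In practice each group's lines would be written to a file for that recipient.
```

Keeping the tag in the clear is what lets the file be split without anyone on the routing side being able to read the records themselves.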

The questions that occur to me, though, are:

  1. Does anyone have a better idea that matches the requirements? Note that the initial data file must come to us and we must not be able to easily (plain-text) read any records which do not belong to us.
  2. I don't think we need any sort of heavy-duty cryptography module for this, so does anyone have any ideas of quick-and-dirty ciphering that might work for this? (Golfing optional :-)

Re: Reversible per-line "encryption"
by Masem (Monsignor) on Jan 25, 2002 at 20:23 UTC
    A better solution, though probably not possible for you given what you've said, is to use an authenticating client/server architecture. Otherwise, your encryption scheme sounds reasonable. There are more than enough encryption modules for Perl; for example, you could use Crypt::OpenPGP, which can easily encrypt and decrypt text strings. The program on the call center end should encrypt each line with the key ID of the intended recipient, and then you can simply grep for the lines that start with your key and decrypt them.

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

      Unfortunately, the call center is a different organization and they have their own systems. We won't be able to provide them a client/server setup (yet - maybe in the future, but that's a ways down the road).

      The tentative plan is to provide them with a script that they can run against the source file to encrypt it and then we process it on our side. That's why I'm hoping for the lightest-weight approach. I'm also puzzling around with some simple reversible ciphers that can be used to do the encoding.

        I sympathize with your problem, to wit, getting a data feed from a third party over which you have little or no control.

        "The tentative plan is to provide them with a script that they can run against the source file to encrypt it and then we process it on our side."

        You'd be asking for trouble if your script presumed any particular version of Perl or the presence of any non-standard modules (i.e., ones that don't ship with Perl). You'll probably have enough trouble just getting them to run a script at all.

        Good Luck!

        Update: Oops! I forgot to mention: if you're planning to use crypt, you should be on the lookout for architecture- or OS-dependencies in its output.

        dmm

Re: Reversible per-line "encryption"
by talexb (Chancellor) on Jan 25, 2002 at 20:26 UTC
    Just stating the obvious, you probably want to make sure that the encrypted data doesn't contain end-of-line characters or your delimiter | anywhere.
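One low-tech way to guarantee that, whatever cipher ends up being used, is to hex-encode the ciphertext: the result contains only the characters 0-9 and a-f, so it can never include a newline or a "|". A minimal sketch using Perl's built-in `pack`/`unpack` with the `H*` template; the `to_hex`/`from_hex` helper names are hypothetical.

```perl
use strict;
use warnings;

# Turn arbitrary bytes (e.g. raw cipher output) into a string of hex digits.
sub to_hex { return unpack 'H*', shift }

# Reverse the encoding: hex digits back to the original bytes.
sub from_hex { return pack 'H*', shift }

my $ciphertext = "a|b\nc";    # stand-in for raw cipher output
my $safe       = to_hex($ciphertext);
print "ABC|$safe\n";          # prints ABC|617c620a63
```

The cost is that each record doubles in size, which is rarely a concern for a name-and-address file.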

    --t. alex

    "Of course, you realize that this means war." -- Bugs Bunny.

      That really shouldn't make any difference. What he'd essentially have is a two-element row, the second element of which requires further processing.

          my( $group, $data ) = split( /\|/, $_, 2 );
          if( user_can_see( $group ) ) {
              my @data = split( /\|/, decrypt( $group, $data ) );
              process_row( @data );
          }
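Here is a self-contained version of that snippet, with hypothetical stand-ins for `user_can_see`, `decrypt` and `process_row`. The "decryption" is just hex decoding, purely for illustration; a real cipher would go in its place. Because the first `split` has a limit of 2, it stops at the first "|", so any delimiters inside the payload stay in one piece until after decryption.

```perl
use strict;
use warnings;

my %visible = ( ABC => 1 );    # hypothetical: groups this user may read

sub user_can_see { return $visible{ $_[0] } }

# Hypothetical decrypt: hex-decodes the payload. A real implementation
# would use $group to pick the right key.
sub decrypt {
    my ( $group, $data ) = @_;
    return pack 'H*', $data;
}

my @seen;
sub process_row { push @seen, [@_] }

for ( "ABC|" . unpack( 'H*', 'John Smith|1010 Nowhere Lane' ),
      "DEF|" . unpack( 'H*', 'Paul Google|3827 Road to Nowhere' ) ) {
    my ( $group, $data ) = split( /\|/, $_, 2 );
    if ( user_can_see($group) ) {
        my @data = split( /\|/, decrypt( $group, $data ) );
        process_row(@data);
    }
}

print join( ', ', @$_ ), "\n" for @seen;    # prints John Smith, 1010 Nowhere Lane
```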
Re: Reversible per-line "encryption"
by xtype (Deacon) on Jan 26, 2002 at 00:46 UTC
    Hrmmmm… Seems to me that it should be the datacenter who is encrypting the file and then providing you with the means to decrypt ONLY what you should see. Right?
    In which case the other answers are good.
    However, since you have to trust them to be able to run and install whatever you hand them, better keep it real simple. The following might be really lame but at least it will run on most versions of perl without any effort. After all, you did say "we must not be able to easily (plain-text) read any records which do not belong to us."
    Depends on what you consider to be "easy" and "plain-text" I guess.

    To encrypt and decrypt strings in a super-low-security fashion:
        #!/usr/bin/perl -w
        use strict;

        ## their side before sending the file to you...
        my $line = "ABC|John Smith 1234";
        my ($groupID, $text) = split(/\|/, $line, 2);
        $text =~ tr/a-z/f-za-e/;
        $text =~ tr/A-Z/I-ZA-H/;
        $text =~ tr/0-9/4-90-3/;
        $line = $groupID . "\|" . $text;
        print "$line\n";

        ## And then on your end...
        my $myGroup = "ABC";
        my ($group, $data) = split(/\|/, $line, 2);
        if($group eq $myGroup) {
            $data =~ tr/f-za-e/a-z/;
            $data =~ tr/I-ZA-H/A-Z/;
            $data =~ tr/4-90-3/0-9/;
        }
        print "$data\n";
    Of course you/they would have to create different translations for each different group (perhaps trade digits with chars and uc with lc, etc.), and then provide the reverse translation to that group.
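Because `tr///` builds its tables at compile time, its ranges can't be filled in from a variable without a string `eval`. A minimal sketch of the same idea with per-group shift amounts applied at run time instead. The `%shift_for` table and the `rotate` helper are hypothetical, and the ABC values are chosen to reproduce the `tr///` tables in the example above. This is still only obfuscation, not encryption.

```perl
use strict;
use warnings;

# Hypothetical per-group shifts: [lowercase, uppercase, digits].
my %shift_for = (
    ABC => [ 5, 8, 4 ],
    DEF => [ 11, 3, 7 ],
);

# Rotate letters and digits by the group's shifts. $dir is +1 to encode and
# -1 to decode; anything else (spaces, "|", punctuation) passes through.
sub rotate {
    my ( $text, $group, $dir ) = @_;
    my $s = $shift_for{$group} or die "no shifts defined for group $group\n";
    my ( $lc, $uc, $dg ) = @$s;
    $text =~ s{([a-z])}{ chr( ( ord($1) - 97 + $dir * $lc ) % 26 + 97 ) }ge;
    $text =~ s{([A-Z])}{ chr( ( ord($1) - 65 + $dir * $uc ) % 26 + 65 ) }ge;
    $text =~ s{([0-9])}{ chr( ( ord($1) - 48 + $dir * $dg ) % 10 + 48 ) }ge;
    return $text;
}

my $enc = rotate( "John Smith|1234", "ABC", 1 );
print "$enc\n";                           # prints Rtms Arnym|5678
print rotate( $enc, "ABC", -1 ), "\n";    # prints John Smith|1234
```

Perl's `%` returns a non-negative result for a positive right operand, which is what makes the same formula work for decoding with `$dir = -1`.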

    No, do not laugh… I know, I know, if someone really wanted to, they would just sit down and stare at the file for a few minutes.
    Therefore, I am making the assumption that if "the datacenter" really cared they would come up with something better.

    -xtype
      Yeah, this is probably something along the lines of what we'll need to do.

      Unfortunately, it's not a question of "the datacenter". Rather, it's a separate company that's providing data to us (and then we will have sub-contracts with other companies to do some of the servicing, but not all). We need to be able to demonstrate (should anyone ever ask) that the data we're providing to our subs is not plain-text readable by anyone here.

      We don't have to say that we couldn't read it if we wanted to. We just say that we've got a Standard Operating Procedure (SOP) that says that we don't decrypt data which isn't for us. Hence, we're not particularly concerned with "stare-ability" - just the ciphering.

      Btw, that was why I had the (subtle) invitation to golf a ciphering/deciphering algorithm which allowed for different "salts".

        Nonetheless, you would think it would be up to the person providing the data.

        Btw, that was why I had the (subtle) invitation to golf a ciphering/deciphering algorithm which allowed for different "salts".

        I've never played golf before… but something else that might work is the XOR operator. If you XOR something twice with the same value, you get the original back.
            my $passphrase = "I love group ABC";
            …
            $text ^= $passphrase;
            $line = $groupID . "\|" . $text;
            print "$line\n";
            …
            if($group eq $myGroup) {
                $data ^= $passphrase;
            }
            print "$data\n";
        Read that in a Steven Holzner book... I think.
        Update: although that may be less obscure than my first idea.
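One caveat with the XOR snippet: when the two strings differ in length, Perl's string `^` acts as though the shorter one were padded with zero bytes, so any part of `$text` beyond the length of `$passphrase` comes out unchanged. XOR output can also contain "|" or newline bytes. A minimal sketch that repeats the passphrase to the full length of the text and hex-encodes the result. The helper names are hypothetical, and a repeating-key XOR is still easy to break, so this remains obfuscation.

```perl
use strict;
use warnings;

# Repeat $key until it is exactly $len characters long.
sub stretch_key {
    my ( $key, $len ) = @_;
    die "key must not be empty\n" unless length $key;
    return substr( $key x ( int( $len / length($key) ) + 1 ), 0, $len );
}

# Encode: XOR with the stretched key, then hex so no "|" or "\n" can appear.
sub xor_encode {
    my ( $text, $key ) = @_;
    return unpack 'H*', $text ^ stretch_key( $key, length $text );
}

# Decode: undo the hex, then XOR again with the same stretched key.
sub xor_decode {
    my ( $hex, $key ) = @_;
    my $bytes = pack 'H*', $hex;
    return $bytes ^ stretch_key( $key, length $bytes );
}

my $key = "I love group ABC";
my $enc = xor_encode( "John Smith|1010 Nowhere Lane", $key );
print "ABC|$enc\n";
print xor_decode( $enc, $key ), "\n";    # prints the original record
```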
Re: Reversible per-line "encryption"
by n3dst4 (Scribe) on Jan 26, 2002 at 17:43 UTC
    It does seem bizarre to me that you're expected to provide their security. From a legal point of view or a security point of view, they will have to validate whatever system you provide. If they don't, then for all they know you're just ROT13ing the data. They really ought to have a disinterested third party to set this up.

    Anyhoo, business and suits being what they are, I imagine that common sense and good practice come second to MUNNY, so you're going to have to provide them with an answer anyway. Platform differences are going to be an issue, but the truth is that any "encryption" you write yourself in the script just isn't going to cut the mustard, especially if some data is leaked and there's any suspicion of your organisation. I regret that you *are* going to have to get them to install at least one module. I recommend Crypt::Blowfish_PP, which is a Pure Perl (i.e. no native libraries required) implementation of the Blowfish algorithm, which provides symmetric encryption. It should run on any OS. I haven't used it myself, so I can't speak for its quality, but if it's a faithful implementation of Blowfish you're off to a good start. With a little hacking you might even be able to copy the code into your script to avoid the installation issues.

    You have one advantage here in that symmetric encryption is adequate. It doesn't matter that the source knows your key, as long as they have a clue about security and don't let it out (actually, judging by the way they're running their security I'm not so sure that's the case).

    And may I please just scare you out of basing the encryption keys on the organisations' names. Please just don't. It makes my ears bleed :-)
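If the keys aren't derived from the organisations' names, they have to come from a proper random source. A minimal sketch, assuming a Unix-like system that provides /dev/urandom (Perl's built-in `rand` is not suitable for generating keys). The `random_key` name and the 16-byte key length are illustrative choices.

```perl
use strict;
use warnings;

# Read $bytes random bytes from the OS and return them as hex text,
# which is easy to store in a config file or hand to a recipient.
sub random_key {
    my ($bytes) = @_;
    open my $fh, '<:raw', '/dev/urandom' or die "cannot open /dev/urandom: $!";
    my $buf;
    my $got = read $fh, $buf, $bytes;
    close $fh;
    die "short read from /dev/urandom\n" unless defined $got && $got == $bytes;
    return unpack 'H*', $buf;
}

# One independent key per recipient group.
my %key_for = map { $_ => random_key(16) } qw(ABC DEF);
print "$_ => $key_for{$_}\n" for sort keys %key_for;
```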

    Update: Thanks to xtype for a much-needed LART after I linked to DBIx-Tree there. Heck knows how I did that. I wasn't even looking at that module! Is the monastery haunted?
      Naturally I agree with n3dst4, and, being something of a security-minded fellow myself, I would never use either of my examples above on anything even remotely important. However, I noted bmcatt’s reply to Masem’s good answer and re-read the last portion of his original post, and took it that no one really cared whether the data was "encrypted" or "secure" in any particular way, as long as it was not outright readable.
      I am sure everyone’s ears were already bleeding at the idea of the wrong people having to come up with a way to veil data that is not directly their responsibility, and at the thought of it not being particularly secure if it is ever leaked. Probably bmcatt’s more than anyone’s.


      By the way, I really like Blowfish, my security ventures having OpenBSD and HP-UX roots.
      Although, I think that the correct links would be Crypt::Blowfish_PP and Crypt::Blowfish.
      I have never used the PP implementation, but I can vouch for Blowfish.pm and Blowfish in general.
