
Bitwise File Shredding

by jdklueber (Beadle)
on Apr 22, 2004 at 16:23 UTC (#347398=CUFP)

An idea has been muddling around in my head for a couple of years now and I have finally gotten around to writing it out. This is a second draft, and there are many improvements that can be made to it, but I've proven that the idea is technically feasible and potentially usable.

The module is called "Shredder". It is a file disintegrator/reconstitutor.

It takes a source file, byte by byte, and spreads the bits in a predictable pattern over eight other files. These files, taken individually, each contain a part of the whole, but in "subatomic" form: the individual shreds give no clues as to the nature of the whole. The program can also take eight shreds and reform them into a copy of the original file.
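The "predictable pattern" can be sketched in a few lines. This is a minimal illustration, not the module itself: `shred_index` and `spread_bits` are hypothetical names, and the sketch assumes the rotation advances by one shred per source byte, so bit j of byte k lands in shred (j + k) mod 8.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the posted module): index of the shred
# that receives bit $bit (0..7) of byte number $byte (0-based), assuming
# the rotation advances by one shred per source byte.
sub shred_index {
    my ($byte, $bit) = @_;
    return ($byte + $bit) % 8;
}

# Spread the bits of a byte string across eight bit-strings, one per shred.
sub spread_bits {
    my ($data) = @_;
    my @shreds = ('') x 8;
    my @bytes  = unpack '(b8)*', $data;    # each byte as 8 chars, low bit first
    for my $k (0 .. $#bytes) {
        my @bits = split //, $bytes[$k];
        for my $j (0 .. 7) {
            $shreds[ shred_index($k, $j) ] .= $bits[$j];
        }
    }
    return @shreds;
}

my @shreds = spread_bits("Hi");
print "shred $_: $shreds[$_]\n" for 0 .. 7;
```

Each shred ends up with exactly one bit per source byte, which is why every shred is roughly one eighth the size of the input.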

When combined with encryption (e.g., encrypt the source file, shred it, and then encrypt the shreds), this should yield a much more secure way of storing data. (One merely needs enough separate repositories to keep the shreds in.)

Anyway, review, use, test, alter, and suggest. I'm listening. :-)



#!/usr/local/bin/perl
######################################################################
#
# A data protection tool
######################################################################
package Shredder;

use strict;
use warnings;

sub new {
    my $class  = shift;
    my $config = shift;
    my $self   = {};
    bless $self, $class;
    return $self;
}

# Split a string into eight packed shreds, one bit per source byte each.
sub Shred {
    my $self  = shift;
    my $inval = shift;
    my $len   = length ($inval);
    my $fmt   = "b*";
    my @bits;
    my @tmpbits;
    my @return;

    @bits = split //, unpack $fmt, $inval;
    my $i      = 0;
    my $offset = 0;
    foreach my $bit (@bits) {
        push @{$tmpbits[$i]}, $bit;
        $i++;
        if ($i == 8) { $i = 0; }
        # A full byte has been dealt out: rotate the starting shred.
        if ($i == $offset) {
            $offset ++;
            if ($offset == 8) { $offset = 0; }
            $i = $offset;
        }
    }

    $i   = 0;
    $len = scalar (@{$tmpbits[0]});
    $fmt = "LLb*";    # header: shred number, bit count; then the bits
    foreach my $arBits (@tmpbits) {
        my $chk = join '', @$arBits;
        # Pad with 0 bits to a whole number of bytes.
        if (length($chk) % 8) {
            my $pad = 8 - (length($chk) % 8);
            $chk .= '0' x $pad;
        }
        $return[$i] = pack $fmt, $i, $len, $chk;
        $i++;
    }
    return @return;
}

# Rebuild the original string from eight packed shreds.
sub Tape {
    my $self      = shift;
    my @invals    = @_;
    my $fmt       = "LLb*";
    my $returnstr = "";
    my @tmpBits;
    my $len = 0;

    foreach my $val (@invals) {
        my ($fileno, $tlen, $bits) = unpack $fmt, $val;
        my @arbits = split //, $bits;
        if (!$len) { $len = $tlen; }
        if ($len != $tlen) {
            die "Length reported in $fileno doesn't match prior reports!\n";
        }
        if ($tmpBits[$fileno]) {
            die "You've already given me a $fileno value!\n";
        }
        @{$tmpBits[$fileno]} = @arbits;
    }

    my $offset = 0;
    for (my $i = 0; $i < $len; $i ++) {
        my $b = $offset;
        for (my $n = 0; $n < 8; $n ++) {
            $returnstr .= $tmpBits[$b][$i];
            $b++;
            if ($b == 8) { $b = 0; }
        }
        $offset ++;
        if ($offset == 8) { $offset = 0; }
    }
    $returnstr = pack "b*", $returnstr;
    return $returnstr;
}

sub ShredFile {
    my ($self, $testfile) = @_;
    my $tststr = "";
    open (my $in, '<', $testfile) or die "OUCH: $testfile: $!\n";
    binmode $in;
    my $mode      = ">";
    my $processed = 0;
    while (read ($in, $tststr, 102400)) {
        my $written = 0;
        my @outs = $self->Shred($tststr);
        for (my $i = 0; $i < 8; $i ++) {
            open (my $out, $mode, "$testfile.$i")
                or die "OUCH: $testfile.$i: $!\n";
            binmode $out;
            $written += syswrite ($out, $outs[$i], length($outs[$i]));
            close $out;
        }
        # Truncate all eight shreds on the first chunk, append afterwards.
        $mode = ">>";
        $processed += $written;
        print "$processed bytes shredded...\n";
    }
    close $in;
    print "Done!\n";
}

sub TapeFile {
    my ($self, $testfile) = @_;
    my @reads;
    my @fh;
    my $processed = 0;
    for (my $i = 0; $i < 8; $i ++) {
        open ($fh[$i], '<', "$testfile.$i") or die "OUCH: $testfile.$i: $!\n";
        binmode $fh[$i];
    }
    open (my $out, '>', "$testfile.out") or die "OUCH: $testfile.out: $!\n";
    binmode $out;
    # One record per shred: 8 header bytes + 102400/8 = 12800 payload bytes.
    while (read ($fh[0], $reads[0], 12808)) {
        my $written = 0;
        for (my $i = 1; $i < 8; $i ++) {
            read ($fh[$i], $reads[$i], 12808);
        }
        my $outval = $self->Tape(@reads);
        $written += syswrite ($out, $outval, length($outval));
        $processed += $written;
        print "$processed bytes taped...\n";
    }
    close $out;
    print "Done!\n";
    for (my $i = 0; $i < 8; $i ++) {
        close $fh[$i];
    }
}

=pod

=head1 NAME

Shredder

=head1 SYNOPSIS

    use Shredder;
    my $s = Shredder->new;
    $s->ShredFile("filename.txt");
    $s->TapeFile("filename.txt");

=head1 DESCRIPTION

Shredder is a novel concept in data protection tools.  While Shredder IS NOT
ENCRYPTION (doesn't claim to be, never has, never will), it is intended to be
used concurrently with encryption tools to build a stronger, more airtight
data protection scheme.

Shredder takes a file and "shreds" it down to 8 files, each roughly one
eighth the size of the original.  These shreds are built by taking each byte
of the source file and splitting it bitwise out to the eight different files.
(Padding out with 0 bits where needed to make a full byte.)  These
shredstrings, along with a small header (the shred number and bit count), are
then written out to eight different files.  Since (particularly with ASCII
text files) some bits are used less frequently than others, the bits are
round robined between the files, for example:

    SOURCE: 0110011011100110

    bit 1, byte 1 -->F0 00
                     F1 11<-- bit 1, byte 2
                     F2 11
                     F3 01
                     F4 00
                     F5 10
                     F6 11
                     F7 01

And so on.

Why is this useful?  If one distributes these shreds out to separate
repositories, no one will be able to reconstitute the original file without
having collected enough of the shred files to "guess" the missing bits.  The
attacker's problem will be compounded if one encrypts the source file, and
then encrypts the shreds.

If the source file and shreds are cleartext, of course, the attacker's
problem is simple: Collect the files, and reconstitute (or Tape) them back
together (possibly inserting "guess" bits based on knowledge of the data in
place of shred files.)  If the source file is encrypted but not shredded, the
attacker "simply" has to break the encryption.  By providing a means of
splitting the file at a "subatomic" level, I intend to increase the
effectiveness of standard data protection techniques.

=head1 USAGE

Simply call ShredFile and TapeFile with a filename to disintegrate or
reconstitute a file.  ShredFile creates shred files named filename.0 -
filename.7; TapeFile assumes the existence of filename.0 - filename.7 and
creates filename.out (for safety.)

One can also pass a string to Shred and get back an array of eight strings,
which when passed back to Tape will reconstitute the original string.  Thus,
you can build your own Shred and Tape tools.

=head1 TODO

Make the process quicker.  It's abysmally slow right now for large files.
Probably, this could be improved by rewriting portions in C, but I haven't
had time yet.

=head1 AUTHOR

Jason Klueber

Copyright 2004

This program is free software.  You can modify or distribute it under the
same conditions as Perl itself.

=cut

1;
Jason Klueber


Re: Bitwise File Shredding
by blokhead (Monsignor) on Apr 23, 2004 at 04:50 UTC
    From an information theory perspective, splitting the file isn't helping as much as you think, since each chunk is only 1/8th the size of the original file. In your scheme, each chunk gives some information about the source file. In fact, it gives exactly 1/8th of the possible information of the source file.
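A rough way to see the "exactly 1/8th" point: every shred pins down one bit of every source byte. The sketch below is only an illustration; `candidates` is a hypothetical helper that lists the byte values still consistent with whatever bits an attacker has collected.

```perl
use strict;
use warnings;

# Hypothetical helper: every byte value consistent with the known bits.
# %known maps a bit position (0 = low bit) to its known value, 0 or 1.
sub candidates {
    my (%known) = @_;
    my @fits;
    VALUE: for my $v (0 .. 255) {
        for my $pos (keys %known) {
            next VALUE if (($v >> $pos) & 1) != $known{$pos};
        }
        push @fits, $v;
    }
    return @fits;
}

# One shred in hand: one known bit per byte, so 2**7 candidates remain.
my @one = candidates(3 => 1);
print scalar(@one), " candidates with one known bit\n";

# Seven shreds in hand: only one bit is unknown, so two candidates remain.
# Example: every bit of 'e' (0x65) is known except bit 5.
my %seven = map { $_ => (ord('e') >> $_) & 1 } grep { $_ != 5 } 0 .. 7;
my @two = candidates(%seven);
print join(', ', map { sprintf '0x%02X', $_ } @two), "\n";
```

With seven shreds the attacker is choosing between two values per byte, and knowledge of the file format will often settle that choice.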

    An area of cryptography called threshold secret sharing gives methods for dividing a "secret" into chunks distributed among n players such that any (t-1) or fewer pieces together give no information about the secret (they constitute entirely random information), but any t pieces together can reassemble the secret. In order for this to happen, the tth chunk had to contain as much information as the entire secret, so it had to be at least as large.

    Secret sharing schemes are most often used for distributed signatures. A committee has a public key and shares the corresponding private key using threshold secret sharing, so that official statements can't be cryptographically signed unless more than half of the committee members agree to sign.

    So in your case, encrypting the source, splitting, and then encrypting the pieces may help, but it's on shaky ground from a theoretical standpoint. You're probably no better off than just encrypting the source file without splitting it. If you really want to gain theoretical ground by splitting the file up, you should look into existing secret sharing schemes (see Wikipedia). With these, there's little need to encrypt the chunks, and (physically) separating the chunks really does help.
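For comparison, here is a minimal sketch of the simplest secret sharing scheme, n-of-n XOR splitting. It is only the special case t = n, not a general t-of-n threshold scheme such as Shamir's. The names `xor_split` and `xor_join` are hypothetical, the secret is assumed to be a byte string, and `rand()` stands in for a proper cryptographic random source.

```perl
use strict;
use warnings;

# Split a byte string into $n shares; ALL $n are needed to rebuild it.
# NOTE: rand() is NOT cryptographically secure. It is a placeholder for
# a real random source in this sketch.
sub xor_split {
    my ($secret, $n) = @_;
    my $len = length $secret;
    my @shares;
    for (1 .. $n - 1) {
        my $share = '';
        $share .= chr(int(rand(256))) for 1 .. $len;
        push @shares, $share;
    }
    my $last = "$secret";        # force string context for string XOR
    $last ^= $_ for @shares;     # bitwise XOR on equal-length strings
    return (@shares, $last);
}

# XOR all shares together to recover the secret.
sub xor_join {
    my @shares = @_;
    my $secret = shift @shares;
    $secret ^= $_ for @shares;
    return $secret;
}

my @shares = xor_split("attack at dawn", 3);
print "share lengths: ", join(', ', map { length } @shares), "\n";
print "rejoined: ", xor_join(@shares), "\n";
```

Note that every share is as long as the secret itself, which matches the size argument above: with a good random source, any n-1 shares are indistinguishable from random bytes.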


      In order for this to happen, the tth chunk had to contain as much information as the entire secret, so it had to be at least as large.
      Excuse me if I'm wrong, but I thought the idea was that any t chunks could be combined to restore the original, and that all chunks were similar, i.e., same size, information content, etc.

      But perhaps we are in violent agreement?

      Quantum Mechanics: The dreams stuff is made of

        Right, that's exactly the point, although I should have made it more explicit. The last chunk must contain as much information as the entire original message. But since any chunk could be the "last chunk," all chunks have to be at least as large as the original message.

        But they needn't be the same size (well, depending on how you look at it). In some RSA threshold signatures, the secret key d is split into random integers within a range of {-A, ..., A} (for some A much bigger than the valid range of d) so that all the shares add up to d. Some shares may certainly be much smaller than others, and you could store them in fewer bits. But the fact that each key could be as large as A means you have no information about the secret key by knowing all but one share -- the last share could be large enough that adding it onto the current sum can yield every valid choice of d with equal probability.
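The additive idea can be sketched as follows. This is a simplified illustration rather than any particular published scheme: `additive_split` is a hypothetical name, the first n-1 shares are drawn from {-A, ..., A}, the final share is simply whatever makes the sum come out to d (so it may fall outside that range), and `rand()` is a stand-in for a secure random source.

```perl
use strict;
use warnings;

# Split integer $d into $n integer shares that add up to $d.
# The first $n - 1 shares are uniform in -$A .. $A (assumption: rand() is
# only a placeholder, not a cryptographically secure generator).
sub additive_split {
    my ($d, $n, $A) = @_;
    my @shares = map { int(rand(2 * $A + 1)) - $A } 1 .. $n - 1;
    my $sum = 0;
    $sum += $_ for @shares;
    push @shares, $d - $sum;    # last share fixes the total
    return @shares;
}

my $d = 1234;
my @shares = additive_split($d, 5, 1_000_000);
my $total = 0;
$total += $_ for @shares;
print "shares: @shares\n";
print "sum:    $total\n";
```

Individual shares vary widely in magnitude, but no proper subset of them narrows down d much as long as A is far larger than the range d is drawn from.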

        However, if a participant publicly announced that his share of the secret could be stored in a very small number of bits, you may be able to get information about the secret if you have all the shares but his -- you may know that the secret d must lie in a smaller range of valid choices.


Re: Bitwise File Shredding
by mutated (Monk) on Apr 23, 2004 at 12:56 UTC
    This potentially has some uses in some very rare circumstances. For the most part, I would think that encrypting a document, then shredding the key and giving each piece to a separate individual, would be more useful. If you trust your crypto method (i.e., you use RSA, 3DES, or something else well published), then as long as your key generation algorithm is good you can publish your encrypted document in a public place. When event X happens (e.g., you die), everyone with a piece of your key gets together and decrypts the doc.

Re: Bitwise File Shredding
by Ryszard (Priest) on Apr 23, 2004 at 13:08 UTC
Re: Bitwise File Shredding (late response)
by Madams (Monk) on Aug 16, 2004 at 23:32 UTC
    Hey jdkleuber:

    Late responses are better than none!

    A VERY good reference for crypto stuff is Bruce Schneier's "Applied Cryptography"; it even includes actual 'vetted' source code.

    Good luck and have fun!

Node Type: CUFP [id://347398]
Approved by ybiC
Front-paged by broquaint