
Compressing and Encrypting files on Windows

by hawtin (Prior)
on Nov 01, 2004 at 09:07 UTC (#404244=perlquestion)
hawtin has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks: Every so often I come across a difficulty that I think is so common someone else must have a solution, but when I look I cannot find it. Here is one that I am struggling with at the moment.

I have a Perl application that stores sensitive data in an XML file. That was perfect during development, because it made it easy to see what was happening and to try out various situations by editing the file. Now that I am moving toward deployment I want to provide "compress" and "encrypt" capabilities to the application user, something like this:

             -->-Compress->--       -->-Encrypt->-| F |
+-----+     /                \     /              | i |
| App |->--?                  ->--?               | l |
+-----+     \                /     \              | e |
             -->---->----->--       -->---->---->-| s |

Which I thought I could implement like this (error handling removed for clarity):

    ...
    use MyData;
    $f = new MyData(file => "foo.compressed", encrypt_key => "");
    ...

    package MyData;

    sub new {
        ...
        my $parser = XML::SAX::ParserFactory->parser(
            Handler => MyData::Handler->new);
        my $fh = _file_open($filename, $key);
        $parser->parse_file($fh);
        ...
    }

    sub _file_open {
        my($file_name, $encrypt_key) = @_;
        $encrypt_key = "" if(!defined $encrypt_key);

        # This first bit works OK
        # (declare once, then assign: a second "my ... if ..." is unreliable)
        my $compress = "";
        $compress = 1 if($file_name =~ /\.compressed$/);

        if($encrypt_key eq "") {
            return new IO::Zlib($file_name, "rb") if($compress);
            return new IO::File($file_name, "r");
        }

        # This is where I get into trouble
        ?????
    }

Now if I were running on a *IX machine I could create an encryption process and pipe my (possibly) compressed data to it, but I'm on Windows and I can't get pipe and fork to do anything useful. The files are small enough that I could read them into memory, decrypt them, inflate them and then create an IO::Scalar for SAX, but that would require me to know more about Compress::Zlib than I currently do.
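The in-memory route mentioned above needs very little of Compress::Zlib: its compress() and uncompress() functions work on whole strings. The following is a minimal sketch, not the poster's code. It assumes Perl 5.8 or later for in-memory filehandles (on 5.6.1 something like IO::Scalar would be needed instead), leaves the decryption step out, and uses a temporary file in place of the real data file.

```perl
use strict;
use warnings;
use Compress::Zlib;            # compress() / uncompress()
use File::Temp qw(tempfile);   # temporary file standing in for the data file

my $xml = "<model>\n  <item id=\"1\"/>\n</model>\n";

# Write the deflated bytes. binmode matters on Windows, where text mode
# would otherwise rewrite newline bytes inside the binary data.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} compress($xml);
close $out or die "close failed: $!";

# Read the whole file back in one gulp and inflate it in memory.
open my $in, '<', $path or die "cannot open $path: $!";
binmode $in;
my $raw = do { local $/; <$in> };
close $in;

my $inflated = uncompress($raw);
die "could not inflate $path\n" unless defined $inflated;

# An in-memory filehandle gives handle-based consumers something to read.
open my $fh, '<', \$inflated or die "cannot open in-memory handle: $!";
my @lines = <$fh>;
close $fh;
```

A decryption step, if one is wanted, would sit between reading $raw and calling uncompress().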

I am running ActiveState Perl 5.6.1 on a Windows 2000 machine.

This must be a common requirement and I am sure that the only reason the answer is not obvious is my lack of encryption experience. How have others done this?

Update: Of course I also want to do something similar when writing a file

Re: Compressing and Encrypting files on Windows
by tachyon (Chancellor) on Nov 01, 2004 at 14:03 UTC

    Doing it in memory is by far the easiest thing. It is also simple, with one-line calls to decrypt/encrypt and compress/decompress. This uses Crypt::Blowfish but you could use your favourite block cipher. The order is VITAL. plaintext->compress->crypt->file is the only way to go (reverse on the way out of course).

    By definition a decent encryption algorithm will turn '0'x1000000 into random (and thus totally incompressible) noise. On the other hand a compression algorithm will compress the hell out of the plaintext due to the repeated pattern. Then you can crypt that. For interest, reverse the order of crypt<->compress and you will see no compression at all. This is a simple measure of the quality of the encryption - the compression algorithm can no longer see any patterns (and we know the plaintext has patterns). Crap encryption WILL compress. Try ROT13, or bongo crypt ;-)

    $|++;
    use Compress::Zlib;
    use Crypt::CBC;

    my $cipher = Crypt::CBC->new( {'key'    => 'secret key',
                                   'cipher' => 'Blowfish',
                                  });
    my $file = 'c:/text.txt';

    write_file( $file, $cipher, "abcbefghijklmnopqrstuvwxyz\n"x1000 );
    print "\n\nDecrypting and decompressing\n\n";
    my $data = read_file( $file, $cipher );
    printf "Got back %d bytes\n%s\n[snip]", length($data), substr($data,0,64);

    sub write_file {
        my ( $file, $cipher, $data ) = @_;
        printf "Before compression %d bytes\n", length($data);
        $data = compress($data);
        printf "After compression %d bytes\n", length($data);
        $data = $cipher->encrypt($data);
        printf "After encryption %d bytes\n", length($data);
        open my $fh, '>', $file or die $!;
        binmode $fh;    # the ciphertext is binary; this matters on Windows
        print $fh $data;
        close $fh;
        # show the printable part of the ciphertext
        print map{ s/[^\040-\177]//g;$_ }$data;
    }

    sub read_file {
        my ( $file, $cipher ) = @_;
        open my $fh, $file or die $!;
        binmode $fh;    # read the bytes back untranslated
        local $/;
        my $data = <$fh>;
        close $fh;
        $data = $cipher->decrypt($data);
        $data = uncompress($data);
        return $data;
    }

    __DATA__
    Before compression 27000 bytes
    After compression 116 bytes
    After encryption 136 bytes
    RandomIV#vTP;\\-Ij{Et@t--sC7SWy{<6P~)}y9p6_r$0bB(d1$k8:6,

    Decrypting and decompressing

    Got back 27000 bytes
    abcbefghijklmnopqrstuvwxyz
    abcbefghijklmnopqrstuvwxyz
    abcbefghij
    [snip]
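The claim about ordering can be checked without installing a cipher module. The sketch below is an illustration only: it compresses a patterned plaintext, the same text after ROT13, and a block of hash output that stands in for good ciphertext. That stand-in is an assumption made for the demo (it is not an encryption of the plaintext); it only shares the property that matters here, which is the absence of visible patterns.

```perl
use strict;
use warnings;
use Compress::Zlib;          # compress()
use Digest::SHA qw(sha256);  # only used to fabricate pattern-free bytes

my $plain = "abcdefghijklmnopqrstuvwxyz\n" x 1000;   # 27000 patterned bytes

# ROT13 is a fixed substitution, so every repeat in the input stays a repeat.
(my $rot13 = $plain) =~ tr/A-Za-z/N-ZA-Mn-za-m/;

# Stand-in for good ciphertext: the same number of bytes of hash output.
my $noise   = '';
my $counter = 0;
$noise .= sha256("demo" . $counter++) while length($noise) < length($plain);
$noise = substr($noise, 0, length($plain));

my %size = (
    plain => length(compress($plain)),
    rot13 => length(compress($rot13)),
    noise => length(compress($noise)),
);

printf "%-5s %d -> %d bytes\n", $_, length($plain), $size{$_}
    for qw(plain rot13 noise);
```

The plain and ROT13 inputs should shrink to a small fraction of their size, while the pattern-free block should come out no smaller than it went in.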



      By definition a decent encryption algorithm will turn '0'x1000000 into random (and thus totally incompressible) noise.
      This is a simple measure of the quality of the encryption - the compression algorithm can no longer see any patterns (and we know the plaintext has patterns).

      I don't think this is true of one-time pads. I can imagine a situation where an OTP could produce output that could then be highly compressed. Of course, this is a special case nitpick :).
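A deliberately contrived case shows what the nitpick means. In this sketch the pad is chosen to equal the plaintext, which a randomly generated pad would essentially never do, so it illustrates the corner case rather than any realistic weakness.

```perl
use strict;
use warnings;
use Compress::Zlib;   # compress()

my $plain = "attack at dawn\n" x 100;

# Contrived pad: identical to the plaintext (an assumption for the demo).
my $pad = $plain;

# String XOR of two equal strings gives nothing but NUL bytes.
my $ciphertext = $plain ^ $pad;

my $packed = compress($ciphertext);
printf "%d bytes of ciphertext compress to %d bytes\n",
    length($ciphertext), length($packed);
```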

Re: Compressing and Encrypting files on Windows
by elwarren (Curate) on Nov 01, 2004 at 17:21 UTC
    I suggest you look at using some of the Pure Perl crypt modules. This way you will not have to fork or pipe to an external program. I have successfully used the pure-Perl Crypt::OpenPGP with POE on Windows. If you use this module, it will handle the compression of your plaintext before encryption as well.

    If you like rolling your own, like we all do, here are some quick links to CPAN. You may have a little work to do finding these when it comes time to install on Windows, due to ActiveState's selective ppm distributions and encryption export regulations, but if you're this far along then you already know that :-)
    • Crypt::CBC: The encryption and decryption process is about a tenth the speed of the equivalent SSLeay programs (compiled C). This could be improved by implementing this module in C. It may also be worthwhile to optimize the DES and IDEA block algorithms further.
    • Crypt::Rijndael_PP (AES)
    • Crypt::Blowfish_PP

      Thanks for the input.

      I am afraid your use of the phrase "Pure Perl" is confusing me. Normally I would take this to mean a module that is implemented purely in Perl (without any calls to C etc), but that is not the way you are using it (since at least the CPAN implementations of Blowfish and Rijndael I am using have binary elements).

      I guess when you say "Pure Perl" you are focusing on the fact that these can be obtained from CPAN (and / or ActiveState) rather than on the other challenge of installing modules on OSs that provide an inadequate set of tools for developers (where, if perl is available but there is no good freely available C compiler, a Pure Perl module will always work).

        Please don't take this as sarcasm, but the very first line in the POD for the modules says:

        Blowfish encryption algorithm implemented purely in Perl.

        There are no calls to C and these can be used in place of the existing modules.

        My suggestion had nothing to do with OS of choice. The PP modules will allow you to run the encryption code inline with your existing Perl and no forking or external process calls will be needed. Depending on how many calls you're making, this may be faster since you can avoid the context switch and overhead of starting an extproc.

        This is why I referenced my POE example. POE doesn't like external programs because of its own internal event loop signal handling. Using the PP modules allows me to avoid this.

        Glad you got your code working. I'd be interested in what kind of performance you get, as I've been thinking about implementing something similar.
Re: Compressing and Encrypting files on Windows
by Anonymous Monk on Nov 01, 2004 at 09:54 UTC
    Upgrade to 5.8.5, and read "perlfork" carefully. It has a section on how to do pipe-forks.
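For readers who have not used the technique, a pipe between a parent and a forked child looks roughly like the sketch below. It is a generic illustration written with explicit pipe() and fork() calls; the child plays the part of an external filter and simply compresses a string. How well this behaves under the fork emulation on Windows is exactly what perlfork discusses, so treat this as a starting point to check against that document rather than as a tested Windows recipe.

```perl
use strict;
use warnings;
use Compress::Zlib;   # compress() / uncompress()

my $xml = "<model><item/></model>\n" x 100;

pipe(my $reader, my $writer) or die "pipe failed: $!";
binmode $_ for $reader, $writer;   # the pipe carries binary data

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: acts as the filter process and writes its result to the pipe.
    close $reader;
    print {$writer} compress($xml);
    close $writer;
    exit 0;
}

# Parent: read everything the child produced, then reap the child.
close $writer;
my $compressed = do { local $/; <$reader> };
close $reader;
waitpid($pid, 0);

my $roundtrip = uncompress($compressed);
```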
My Solution
by hawtin (Prior) on Nov 02, 2004 at 08:36 UTC

    The way I solved it

    Thanks to all those of you who replied to my post. I thought it would be worthwhile to hint at how I solved the issue in case someone has a similar problem later. I mentioned in the post I am using various tools (like XML::SAX) that require file handles of some kind, so any solution has to involve producing IO::Handle like objects.

    When reading data this is no problem: I can just read all of my file into a scalar, use my encryption algorithm to decrypt it, then inflate it, and finally use the resulting string to create an IO::Handle via IO::Scalar.

    When writing data the issue is more complex. I have to pass something that the writing modules can treat as a file handle, but when they are done I have to make sure that the post-processing happens. I could have changed my code to add an extra "now save this file" step after every place where files are written, but I decided to create my own IO::Thing to do that work for me.

    My first thought was to have an object that held a handle and some extra info, pass all calls through to the handle except close() and when the object is closed do the extra work. That doesn't work because handles are not objects (and life is too short to understand as much about them as I want to).
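One way to express a "do something extra on close" handle is Perl's tie mechanism, where a class supplies PRINT, PRINTF, WRITE and CLOSE methods for a handle. The sketch below is not the IO::ActionOnClose class from this post; the class name BufferUntilClose and the helper open_buffered are hypothetical, and whether a given XML writer is happy with a tied handle is an assumption that would need checking.

```perl
use strict;
use warnings;
use Compress::Zlib;        # compress() / uncompress()
use Symbol qw(gensym);     # anonymous glob to tie

package BufferUntilClose;  # hypothetical class name

sub TIEHANDLE {
    my ($class, $on_close) = @_;
    return bless { buffer => '', on_close => $on_close, closed => 0 }, $class;
}

sub PRINT {
    my $self = shift;
    my $sep = defined($,) ? $, : '';   # honour the output field separator
    my $end = defined($\) ? $\ : '';   # and the output record separator
    $self->{buffer} .= join($sep, @_) . $end;
    return 1;
}

sub PRINTF {
    my ($self, $format, @args) = @_;
    $self->{buffer} .= sprintf($format, @args);
    return 1;
}

sub WRITE {                            # called for syswrite()
    my ($self, $data, $length, $offset) = @_;
    $offset = 0 unless defined $offset;
    $length = length($data) - $offset unless defined $length;
    my $chunk = substr($data, $offset, $length);
    $self->{buffer} .= $chunk;
    return length $chunk;
}

sub CLOSE {
    my $self = shift;
    return 1 if $self->{closed}++;     # run the action only once
    $self->{on_close}->($self->{buffer});
    return 1;
}

package main;

sub open_buffered {                    # hypothetical helper
    my ($on_close) = @_;
    my $fh = gensym();
    tie *$fh, 'BufferUntilClose', $on_close;
    return $fh;
}

my $stored;                            # stands in for the file on disk
my $fh = open_buffered(sub { $stored = compress($_[0]) });
print  $fh "<model>\n";
printf $fh "  <item id=\"%d\"/>\n", 42;
print  $fh "</model>\n";
close  $fh;                            # the post-processing happens here
```

The callback receives the complete buffer, so compression and encryption can run in whatever order is needed before anything touches the disk.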

    My second thought was to create an IO::Scalar subclass that can perform a special action when close is called. That doesn't work either, because IO::Scalar implements its objects as a localised GLOB.

    My third thought was to badly mangle the definition of IO::Scalar to force the behaviour I wanted. This I did, creating my own class called IO::ActionOnClose (the implementation is so horrible I won't trouble you with it).

    With this class my open function now looks like:

    sub _open_file {
        my($file_name,$mode) = @_;
        $mode = "r" if(!defined($mode));

        my $compress = 1;
        $compress = "" if($file_name =~ /\.xml$/i);

        if($mode eq "r") {
            my $fh = new IO::File($file_name);
            binmode $fh;

            my $f1 = "";
            while(!$fh->eof()) {
                my $c = $fh->read($f1,1024*16,length($f1));
            }
            $fh = undef;

            my $f2 = "";
            if(defined($model_passphrase) && $model_passphrase ne "") {
                # Pad $f1 to the next 8 byte boundary
                if((length($f1) % 8) != 0) {
                    $f1 .= "\x00" x (8 - (length($f1) % 8));
                }

                my $cipher = new Crypt::Blowfish $model_passphrase;
                for(my $i=0;(8*$i)<length($f1);$i++) {
                    $f2 .= $cipher->decrypt(substr($f1,8*$i,8));
                }
                # Strip the zero padding (note: this also strips any
                # genuine trailing NUL bytes in the data)
                $f2 =~ s/\x00+$//s;
            }
            else {
                $f2 = $f1;
            }

            # Just in case the file is big save some memory
            $f1 = "";

            my $f3 = "";
            if($compress) {
                $f3 = uncompress($f2);
            }
            else {
                $f3 = $f2;
            }

            return new IO::Scalar \$f3;
        }
        else {
            # When writing I want to first get the output, then compress
            # then encrypt, then write to the file.
            # So I have to create a handle that does some magic
            # when it is closed
            my $buffer;
            return new IO::ActionOnClose(\$buffer,
                action => \&_send_file,
                args   => [\$buffer,$file_name,$compress,$model_passphrase]);
        }
    }

    sub _send_file {
        my($f3_ref,$file_name,$compress,$model_passphrase) = @_;

        # This is the reverse of what _open_file does for
        # read (even the variable names are the same)
        my $f2;
        if($compress) {
            $f2 = compress(${$f3_ref});
        }
        else {
            $f2 = ${$f3_ref};
        }

        my $f1 = "";
        if(defined($model_passphrase) && $model_passphrase ne "") {
            my $cipher = new Crypt::Blowfish $model_passphrase;

            # Pad $f2 to the next 8 byte boundary
            if((length($f2) % 8) != 0) {
                $f2 .= "\x00" x (8 - (length($f2) % 8));
            }

            for(my $i=0;8*$i<length($f2);$i++) {
                $f1 .= $cipher->encrypt(substr($f2,8*$i,8));
            }
        }
        else {
            $f1 = $f2;
        }

        my $fh = new IO::File($file_name,"w");
        binmode $fh;
        $fh->syswrite($f1,length($f1));
        $fh->close();
    }
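One detail of the code above is worth a second look: padding with NUL bytes and later removing every trailing NUL cannot tell padding apart from compressed data that genuinely ends in NULs. A padding scheme that records its own length avoids the ambiguity. The helpers below (pad_blocks and unpad_blocks are hypothetical names, not part of the post) sketch the common approach of appending N bytes that each hold the value N.

```perl
use strict;
use warnings;

# Append between 1 and $block bytes, each holding the number of bytes added.
sub pad_blocks {
    my ($data, $block) = @_;
    my $n = $block - (length($data) % $block);
    return $data . (chr($n) x $n);
}

# Read the count from the last byte and remove exactly that many bytes.
sub unpad_blocks {
    my ($data, $block) = @_;
    die "padded data must be a non-empty multiple of $block bytes\n"
        if length($data) == 0 || length($data) % $block;
    my $n = ord(substr($data, -1));
    die "bad padding length $n\n" if $n < 1 || $n > $block;
    die "bad padding bytes\n" unless substr($data, -$n) eq chr($n) x $n;
    return substr($data, 0, length($data) - $n);
}

# A payload that really does end in NUL bytes.
my $payload = "zlib data ending in NULs\x00\x00";

my $padded = pad_blocks($payload, 8);
my $back   = unpad_blocks($padded, 8);

# For comparison: zero padding followed by stripping loses the real NULs.
my $zero_padded = $payload . ("\x00" x (8 - length($payload) % 8));
(my $stripped = $zero_padded) =~ s/\x00+$//s;
```

Because at least one byte is always added, input that is already a multiple of the block size grows by one full block; that is the price of being able to remove the padding unambiguously.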

    Once again, thanks to those who contributed, and I hope this helps someone else.

Re: Compressing and Encrypting files on Windows
by chanio (Priest) on Nov 02, 2004 at 00:32 UTC
    It looks as if you were encrypting (in both directions) while reading every line. If that is the case, you should first read the whole XML file and then pass it to the other transformations.

    That should be the way it works from CLI. Not all at the same time.

    _`(___)' __________________________
    Wherever I lay my KNOPPIX disk, a new FREE LINUX nation could be established.
Re: Compressing and Encrypting files on Windows
by inman (Curate) on Nov 01, 2004 at 14:09 UTC
    It is more conventional to encrypt first and then compress. This is analogous to tar-ing a directory before zipping the resulting tar.

    Update - This is totally wrong. Please ignore.

      Tachyon is precisely correct.

      When you encrypt something, the output is highly entropic and there will be no redundancy in the text. Compression works by removing redundancy from the file, so, given that there is no redundancy in an encrypted file there will be no compression.

      See... this, for example. Pay particular attention to the section which states "Compression after encryption is silly. If an encryption algorithm is good, it will produce output which is statistically indistinguishable from random numbers and no compression algorithm will considerably compress random numbers. On the other hand, if a compression algorithm succeeds in finding a pattern to compress out of an encryption's output, then a flaw in that algorithm has been found. In the majority of encryption utilities (e.g., PGP ) the data is first compressed before it is actually encrypted."

      It is better either to be silent, or to say things of more value than silence. Sooner throw a pearl at hazard than an idle or useless word; and do not say a little in many words, but a great deal in a few.

      Pythagoras (582 BC - 507 BC)

      Sorry, that is absolute rubbish. First, you are completely wrong, in that anyone with two neurons to scratch together or even rudimentary investigative skill can show your first premise is complete rubbish. Your second premise has nothing to do with the price of fish, or the question at hand for that matter. There is no analogy between tar and either encryption or compression. All tar does is glue files together with just enough header data to split them back into a dir structure and check for corruption.

      I have presented some sample code above. Please experiment with it and follow your own suggestion. Please learn from it as you clearly have NFI.

      As noted you will get >>>>>ZERO COMPRESSION<<<<< if you encrypt first with any decent algorithm. By design an encryption algorithm will turn an infinite stream of zeros into an equally infinite stream of (pseudo)random noise. This can not, by definition, be compressed. This is not a bug. This is by design :-)

      If you can compress your encrypted files I suggest you have a problem with your encryption algorithm.

        Geez, tachyon, why don't you tell us what you really feel?

        It's one thing to say, "sorry, I think you have that backwards, read my post to understand why;" and a completely different thing to use a caustic, abusive tone in every sentence. We've been without Abigail-II for months now, and I like the change in average tone in the community. Let's not blow it.

        [ e d @ h a l l e y . c c ]
