PerlMonks  

Fast parsing for cypher block chaining

by fluffyvoidwarrior (Monk)
on Feb 28, 2006 at 17:58 UTC (#533433=perlquestion)
fluffyvoidwarrior has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm having a spot of bother with Crypt::CBC: it's slow, and it bombs when faced with 700Mb of data to chain. I thought I'd have a bash at writing my own CBC thing. Currently I'm benchmarking parsing my data in 8-byte lumps for Blowfish, and I can't improve on this code using substr():
$bufsize = length $buffer;
for ($chunk_counter = 0; $chunk_counter < $bufsize; $chunk_counter = $chunk_counter + 8) {
    $eight_byte_chunk = substr($buffer, $chunk_counter, 8);
    $out_buffer = $out_buffer . $eight_byte_chunk;
}
syswrite (OUTFILE, $out_buffer, $buffersize);
I'm sysreading the file into 64k buffers and then parsing each buffer in 8-byte lumps. I intend then to chain and Blowfish it before syswriting the buffer back out to the encrypted file. I've tried various approaches and hoped unpack() would cut it, but it's actually about 20% slower than the above. In the olden days I would have done this with a sliding pointer, starting at the beginning of the buffer and advancing it 8 bytes per read into an 8-byte data structure, as the fastest method, but I don't think you can do this in Perl? I'm sure there must be a more sophisticated method than substr.
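For reference, here is a minimal sketch of three ways to cut a buffer into 8-byte pieces: the substr loop from the post, unpack with a repeated group, and a global regex match. The sample data and the sub names are made up for illustration; which one is fastest will depend on the Perl version and the machine.

```perl
use strict;
use warnings;

# Hypothetical sample data: 20 bytes, so the last chunk is short (4 bytes).
my $buffer = join '', map { chr(65 + $_ % 26) } 0 .. 19;

# 1. substr with an explicit offset (the approach in the post).
sub chunks_substr {
    my ($buf) = @_;
    my @chunks;
    for (my $pos = 0; $pos < length $buf; $pos += 8) {
        push @chunks, substr($buf, $pos, 8);
    }
    return @chunks;
}

# 2. unpack with a repeated group: '(a8)*' yields as many 8-byte
#    fields as the string holds, plus a shorter final one if any.
sub chunks_unpack {
    my ($buf) = @_;
    return unpack '(a8)*', $buf;
}

# 3. A global regex match; /s lets "." match newlines as well.
sub chunks_regex {
    my ($buf) = @_;
    return $buf =~ /(.{1,8})/gs;
}

print join('|', chunks_substr($buffer)), "\n";
print join('|', chunks_unpack($buffer)), "\n";
print join('|', chunks_regex($buffer)),  "\n";
```

All three return the same list here, including the short final chunk, so they can be swapped in and out of a benchmark without changing the rest of the code.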
Any "why not try this" suggestions are as always greatly appreciated.
Thanks

Re: Fast parsing for cypher block chaining
by Roy Johnson (Monsignor) on Feb 28, 2006 at 18:09 UTC
    I don't understand why you're putting 8 bytes at a time on the buffer. Is it just to ensure that you have an integer multiple of 8 bytes in your out_buffer? If so,
    $bufsize = length $buffer;
    # Make sure it's an integer multiple of 8 by reducing by any remainder
    $bufsize -= $bufsize % 8;
    # you could copy substr($buffer, 0, $bufsize) if you wanted/needed to
    syswrite (OUTFILE, $buffer, $bufsize);

    Caution: Contents may have been coded under pressure.
      "I intend then to chain and blowfish it before syswriting the buffer back out to the encrypted file," the OP said. Blowfish only works on 8 bytes of data at a time, so the OP is looking for an efficient method of dividing a string into 8-byte segments.
      Because Crypt::Blowfish works with 8-byte blocks. I'm doing the disk reads and writes in 64k chunks, which seems about optimum. I'm then splitting the 64k buffer into 8-byte lumps for Blowfish.
        I suspect you will do no worse letting Perl handle the buffering for you:
        $/ = \8; # Read 8-byte records. Behind the scenes, Perl will do buffering of reads.
        $out_buffer .= blowfishify($_) while <INFILE>;
        print OUTFILE $out_buffer;
        Alternatively, you could (with a sufficiently modern Perl) read from your buffer as if it were a filehandle, using the same $/ = \8 trick.
        open BUF, '<', \$buffer or die "$!: Could not open buffer\n";
        $/ = \8;
        # Read from the in-memory handle opened above, not from INFILE
        $out_buffer .= blowfishify($_) while <BUF>;
        print OUTFILE $out_buffer;
        They're tidier ways of doing what you want, but you'd have to try them to see whether they buy or lose you any efficiency. My guess is that the I/O isn't going to be the bottleneck, anyway; the encryption is.
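        A self-contained sketch of the in-memory filehandle idea, for anyone who wants to try it. Here process_block is a made-up stand-in for the real encryption step (it just reverses the block), and process_buffer is likewise a hypothetical name; the fixed-size record behaviour of $/ and opening a handle on a scalar reference are standard Perl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-block encryption step.
sub process_block {
    my ($block) = @_;
    return scalar reverse $block;
}

sub process_buffer {
    my ($buffer) = @_;
    my $out = '';

    # Open the scalar as a read-only in-memory file (Perl 5.8 or later).
    open my $fh, '<', \$buffer
        or die "Could not open in-memory buffer: $!";

    # Setting $/ to a reference to an integer makes readline return
    # fixed-size records; local restores the old value on scope exit.
    local $/ = \8;
    while (my $block = <$fh>) {
        $out .= process_block($block);
    }
    close $fh;
    return $out;
}

print process_buffer('ABCDEFGH12345678xyz'), "\n";   # HGFEDCBA87654321zyx
```

        Note that the last record comes back short when the buffer isn't a multiple of 8 bytes, so any padding still has to be handled by the caller.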

Re: Fast parsing for cypher block chaining
by ikegami (Pope) on Feb 28, 2006 at 18:21 UTC
    Why not let Perl do the buffering for you? read and print are buffered (unlike sysread and syswrite).
    while (my $bufsize = read(INFILE, my $eight_byte_chunk, 8)) {
        ...[ Add padding ]...
        ...[ Merge with IV ]...
        ...[ Encrypt ]...
        ...[ Compute new IV ]...
        # print, not write: Perl's write() outputs format records
        print OUTFILE $eight_byte_chunk;
    }
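    To make the chaining steps concrete, here is a rough sketch of CBC over 8-byte blocks. In CBC each plaintext block is XORed with the previous ciphertext block (or the IV for the first block) before it is encrypted, and the resulting ciphertext becomes the chaining value for the next block. Everything named here is an assumption for illustration: toy_encrypt and toy_decrypt are a deliberately insecure XOR stand-in for a real cipher such as Blowfish, and the padding scheme (N bytes of value N) is just one common choice.

```perl
use strict;
use warnings;

# Toy stand-in for a real 8-byte block cipher such as Blowfish.
# It only XORs with a fixed key and offers NO security.
my $KEY = 'k3y-k3y!';
sub toy_encrypt { return $_[0] ^ $KEY }
sub toy_decrypt { return $_[0] ^ $KEY }

sub cbc_encrypt {
    my ($plaintext, $iv) = @_;

    # Pad to a multiple of 8 bytes: N bytes, each of value N (1..8).
    my $pad = 8 - length($plaintext) % 8;
    $plaintext .= chr($pad) x $pad;

    my ($prev, $out) = ($iv, '');
    for my $block (unpack '(a8)*', $plaintext) {
        $prev = toy_encrypt($block ^ $prev);   # chain first, then encrypt
        $out .= $prev;                         # ciphertext feeds the next block
    }
    return $out;
}

sub cbc_decrypt {
    my ($ciphertext, $iv) = @_;
    my ($prev, $out) = ($iv, '');
    for my $block (unpack '(a8)*', $ciphertext) {
        $out .= toy_decrypt($block) ^ $prev;   # decrypt, then undo the chain
        $prev = $block;
    }
    # Strip the padding: the last byte says how many bytes to drop.
    my $pad = ord substr($out, -1);
    return substr($out, 0, length($out) - $pad);
}

my $iv = 'initvect';   # hypothetical 8-byte IV; use a random one in practice
my $ct = cbc_encrypt('hello world', $iv);
print cbc_decrypt($ct, $iv), "\n";
```

    Swapping the toy subs for calls to a real cipher object would be the obvious next step, though for real data the existing, tested Crypt::CBC is the safer choice.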
      Because I thought read would be slower than sysread and the 8 byte buffer would result in a lot of disk thrashing compared to 64k buffer
        "the 8 byte buffer would result in a lot of disk thrashing compared to 64k buffer"

        There shouldn't be any disk thrashing, because read and print are buffered. The size of the buffer isn't 8 bytes as you claim, but rather a multiple of the size of a disk sector. The disk is not accessed every time read or print is called.

        "Because I thought read would be slower than sysread"

        sysread + syswrite + manual buffering in Perl
        should be slower than
        read + print + well-tested buffering in C

        Note that I didn't say my way is faster, just that it might be. Benchmark to find out which is faster on your system.
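        One possible way to run that comparison with the core Benchmark module is sketched below. The 1 MB temp file and the sub names are assumptions chosen for illustration, and the loop bodies only count blocks rather than encrypt them, so this measures the reading strategy alone.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);

# Hypothetical test file: 1 MB of filler data, removed at exit.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} 'x' x (1024 * 1024);
close $tmp;

# Buffered I/O: let Perl's I/O layer do the buffering, 8 bytes per read.
sub buffered_read {
    open my $in, '<', $path or die "open: $!";
    binmode $in;
    my $blocks = 0;
    while (read($in, my $chunk, 8)) { $blocks++ }
    close $in;
    return $blocks;
}

# Manual buffering: sysread 64k at a time, then split with substr.
sub manual_read {
    open my $in, '<', $path or die "open: $!";
    binmode $in;
    my $blocks = 0;
    while (my $len = sysread($in, my $buffer, 65536)) {
        for (my $pos = 0; $pos < $len; $pos += 8) {
            my $chunk = substr($buffer, $pos, 8);
            $blocks++;
        }
    }
    close $in;
    return $blocks;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    buffered => \&buffered_read,
    manual   => \&manual_read,
});
```

        The numbers will vary with the OS, the filesystem cache and the Perl build, so treat the table as a local measurement rather than a general answer.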

Re: Fast parsing for cypher block chaining
by ikegami (Pope) on Feb 28, 2006 at 18:55 UTC

    I thought there might be a way of reusing the existing code in Crypt::CBC, so I took a peek. Not only can code from Crypt::CBC be reused, Crypt::CBC already does what you want! The following encrypts:

    use Crypt::CBC;

    my $cipher = Crypt::CBC->new(...);

    open(INFILE, '<', 'BIG_FILE.txt')
        or die("Unable to open input file: $!");
    open(OUTFILE, '>', 'BIG_FILE.enc')
        or die("Unable to create output file: $!");

    # Treat both files as raw bytes (matters on platforms that translate newlines)
    binmode(INFILE);
    binmode(OUTFILE);

    $cipher->start('encrypting');
    while (read(INFILE, $buffer, 65536)) {
        print OUTFILE $cipher->crypt($buffer);
    }
    print OUTFILE $cipher->finish;

    The following decrypts:

    use Crypt::CBC;

    my $cipher = Crypt::CBC->new(...);

    open(INFILE, '<', 'BIG_FILE.enc')
        or die("Unable to open input file: $!");
    open(OUTFILE, '>', 'BIG_FILE.txt')
        or die("Unable to create output file: $!");

    # Treat both files as raw bytes (matters on platforms that translate newlines)
    binmode(INFILE);
    binmode(OUTFILE);

    $cipher->start('decrypting');
    while (read(INFILE, $buffer, 65536)) {
        print OUTFILE $cipher->crypt($buffer);
    }
    print OUTFILE $cipher->finish;
