Re^4: Fast parsing for cypher block chaining

by fluffyvoidwarrior (Monk)
on Mar 01, 2006 at 07:30 UTC (#533590)


in reply to Re^3: Fast parsing for cypher block chaining
in thread Fast parsing for cypher block chaining

I just benchmarked sysread/syswrite with substr() against straight Perl buffered IO read/write, if anyone is still interested.
The results are a bit of a shock!
Parsing a 700 MB file in 8-byte chunks took 81 seconds with the sysread method. Using Perl's buffered read/write it took 522 seconds. It seems that sysread with your own buffering can run more than six times faster (522 s vs 81 s), which is what I'm looking for.
Here's my code, just in case I've done anything dumb (I'm assuming using OOP IO is OK):

    $infile = new IO::File;
    $outfile = new IO::File;
    $infile->open($input_filepath);
    $outfile->open(">$output_filepath");
    for ($chunk_counter = 0; $chunk_counter < $infile_num_chunks + 1; $chunk_counter = $chunk_counter + 1) {
        $infile->read($buffer, 8);
        $outfile->write($buffer, 8);
    }

I originally used a while construct for loop control but then thought maybe it was slowing things down. It was. Using "while" took 522 secs. Using the counter as above took 449 secs. Either way sysread and substr() is loads faster.
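The sysread version is not shown in this post, so the following is only a rough sketch of what such an approach might look like, not the code that was actually timed. The sub name `copy_in_blocks` and the 64 KB buffer size are assumptions made for illustration, and the "cipher" step is a plain copy.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, reading 64 KB at a time
# with sysread and walking each buffer in 8-byte blocks with substr.
sub copy_in_blocks {
    my ($in_path, $out_path) = @_;
    my $buf_size   = 64 * 1024;    # assumed buffer size (a multiple of 8)
    my $block_size = 8;

    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!";
    binmode($in);
    binmode($out);

    while (1) {
        # Assumes a regular file, where only the last read comes back short.
        my $got = sysread($in, my $buffer, $buf_size);
        die "sysread failed: $!" unless defined $got;
        last if $got == 0;    # end of file

        my $result = '';
        for (my $pos = 0; $pos < $got; $pos += $block_size) {
            # The final block of the file may be shorter than 8 bytes.
            my $block = substr($buffer, $pos, $block_size);
            # A cipher would transform $block here; this sketch just copies it.
            $result .= $block;
        }

        # syswrite may write fewer bytes than requested, so loop until done.
        my $written = 0;
        while ($written < length($result)) {
            my $n = syswrite($out, $result, length($result) - $written, $written);
            die "syswrite failed: $!" unless defined $n;
            $written += $n;
        }
    }

    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return;
}
```

Calling `copy_in_blocks($input_filepath, $output_filepath)` performs one system call per 64 KB rather than one buffered IO call per 8 bytes, which is presumably where most of the saving comes from.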


Re^5: Fast parsing for cypher block chaining
by Anonymous Monk on Mar 01, 2006 at 09:05 UTC
    That is not benchmark code
      It's a snippet of a subroutine timed using the Benchmark module, i.e.
      use Benchmark;
      my $interval = timeit(1, \&wibble);
      where wibble is my subroutine. I'm also MD5 hashing the input and output files before and after my subroutine is called, and checking the file size, to make sure I'm not outputting garbage.
      Anyway, just by watching a clock I can tell the difference between nearly 10 minutes of execution and 1.5 minutes.
      Can anyone rewrite the above code to increase performance by a factor of 5 or more?
      Obviously, if the two approaches were in the same ballpark I wouldn't bother with the sysread stuff - I'm not trying to complicate things on purpose.
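For readers who want to reproduce this kind of check, here is a minimal sketch of timing one run with Benchmark's `timeit` and then comparing size and MD5 digest of the two files. The helper names `file_md5` and `time_and_verify` are made up for this example, and the routine being timed is passed in as a code reference rather than being the `wibble` sub mentioned above.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use Digest::MD5;

# Hypothetical helper: hex MD5 digest of a file's contents.
sub file_md5 {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $digest;
}

# Hypothetical wrapper: time a single run of $code (a code reference that
# is expected to copy $in_path to $out_path), then confirm that the sizes
# and digests of the two files match.
sub time_and_verify {
    my ($code, $in_path, $out_path) = @_;

    my $interval = timeit(1, $code);    # one iteration, as in the post

    my $in_size  = (stat($in_path))[7];
    my $out_size = (stat($out_path))[7];
    die "size mismatch: $in_size vs $out_size" unless $in_size == $out_size;
    die "digest mismatch" unless file_md5($in_path) eq file_md5($out_path);

    return timestr($interval);          # human-readable timing summary
}
```

A single iteration is reasonable for a run that takes minutes; for short runs a higher count gives steadier numbers.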
Re^5: Fast parsing for cypher block chaining
by ikegami (Pope) on Mar 01, 2006 at 15:18 UTC

    If it's done right, it should be faster than the buggy code you show here. I say buggy since you're assuming the input file is a multiple of 8 bytes.

    Using IO::File is probably much slower than using read and write. Objects are slower than their non-object equivalent.

    If you want to use sysread and syswrite, you can use them with the solution in Re: Fast parsing for cypher block chaining, which is what you should be using.
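To illustrate the multiple-of-8 concern only (this is not the solution from the linked node), here is a small sketch that lets the return value of the built-in `read` end the loop, so a short final chunk is copied as-is. The sub name `copy_with_read` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a file in 8-byte reads using the built-in
# read() and print(), with read()'s return value as the loop condition.
sub copy_with_read {
    my ($in_path, $out_path) = @_;

    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!";
    binmode($in);
    binmode($out);

    my ($buffer, $got);
    while ($got = read($in, $buffer, 8)) {
        # $got is 8 for every chunk except possibly the last one.
        # A block cipher would need to pad a short final chunk here.
        print {$out} $buffer or die "write failed: $!";
    }
    # read() returns 0 at end of file and undef on error.
    die "read failed: $!" unless defined $got;

    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return;
}
```

Because the loop stops when `read` reports end of file, no chunk count has to be computed up front and no extra iteration runs past the end of the input.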

      Yes, Thanks.
      I'll compare this with using Crypt::CBC as you suggested earlier.
      As for the "bug" you spotted - my intention was to parse the last 64k buffer outside this loop. That removes the need for a huge number of loop exit condition tests ("while" actually does slow it down) and replaces them with a simple counter for everything except the last bufferful.
      I suppose I should have posted more code to give a better feel for things, but I didn't want to dump too much on people when I thought I'd isolated the crux of the speed issue, namely how to optimise splitting a 64k string into 8-byte chunks. Using Crypt::CBC as you suggested may mean that I don't need to do this anyway.
      Again, thanks for your help. I'll experiment further and update this post if you're still interested .....
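For the narrower question of splitting a 64k string into 8-byte chunks, two common ways to write it are sketched below: an explicit `substr` loop and a single `unpack` with a repeated group. Both sub names are invented for this example, both assume byte strings, and which one is faster would need to be measured rather than assumed.

```perl
use strict;
use warnings;

# Hypothetical helper: split $string into pieces of $size bytes using substr.
sub chunks_substr {
    my ($string, $size) = @_;
    my @chunks;
    for (my $pos = 0; $pos < length($string); $pos += $size) {
        # The last piece is shorter when the length is not a multiple of $size.
        push @chunks, substr($string, $pos, $size);
    }
    return @chunks;
}

# Hypothetical helper: the same split done with one unpack call.
# "(a8)*" means "repeat an 8-byte field until the input runs out".
sub chunks_unpack {
    my ($string, $size) = @_;
    return unpack("(a$size)*", $string);
}
```

Both return the same list for the same input, so they can be swapped in and out of a benchmark without changing the surrounding code.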

        while is 19% faster! Well, for small files. For big files, it's probably the same speed, but more readable.
