Re: Performance Question

by tachyon (Chancellor)
on May 08, 2002 at 16:06 UTC (#165083)


in reply to Performance Question

You can get data in whatever chunk size you want using read(). The script below is an example that takes 24 seconds to process a 100MB file on my PIII with slow disks. That gives a throughput of about 4MB per second, which will process your 81GB file in under 6 hours. Empirically, most of the gain is there by around 1MB chunks, with modest further benefits from increasing it to 2, 4 and 8MB. With smaller chunks you can hear the heads flipping from one file area to the other; bigger chunks allow the heads to chill. At 64kB the run time was 57 seconds and the disks screamed. At 4MB the run time was 23 seconds.
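If you want to repeat the chunk-size comparison on your own hardware, something like the following rough sketch should do. The names time_read and estimate_hours are made up for illustration, and it only times the raw read() calls, not the substitution or the write. Bear in mind that the OS file cache will flatter every pass after the first, so use a file bigger than your RAM or take the numbers with a grain of salt.

```perl
#!/usr/bin/perl -w
use strict;
use Time::HiRes qw(time);

# Hypothetical helper: read $file start to finish in $chunk-byte pieces
# and return (bytes read, elapsed seconds).
sub time_read {
    my ($file, $chunk) = @_;
    open my $fh, '<', $file or die "Can't open $file $!\n";
    binmode $fh;
    my ($buffer, $total) = ('', 0);
    my $start = time;
    while (my $got = read($fh, $buffer, $chunk)) {
        $total += $got;
    }
    close $fh;
    return ($total, time - $start);
}

# Hypothetical helper: hours needed for $size_mb megabytes at $mb_per_sec.
sub estimate_hours {
    my ($size_mb, $mb_per_sec) = @_;
    return $size_mb / $mb_per_sec / 3600;
}

# Only run the comparison when a file name is given on the command line.
if (@ARGV) {
    for my $chunk (map { 2**$_ } 16, 20, 21, 22, 23) {    # 64kB, 1, 2, 4, 8MB
        my ($bytes, $secs) = time_read($ARGV[0], $chunk);
        printf "%8d byte chunks: %d bytes in %.2f s\n", $chunk, $bytes, $secs;
    }
}

# The estimate from the post: 81GB at roughly 4MB per second.
printf "Estimated time for 81GB: %.1f hours\n", estimate_hours(81 * 1024, 4);
```

Run it with the test file as the only argument; without one it just prints the estimate.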

If possible I would suggest reading from one disk and writing to a completely separate one (I did the testing on a single partition of a single disk). You could also roughly double the speed by forking a kid to do the disk write while the parent reads and processes more info. This will only help if you are reading from one disk and writing to another.

    #!/usr/bin/perl -w
    use strict;

    my $chunk   = 2**20;    # try 1MB to start but it may be faster to go bigger/smaller
    my $infile  = 'c:/test.txt';
    my $outfile = 'c:/out.txt';

    open IN, $infile or die "Can't open $infile $!\n";
    open OUT, ">$outfile" or die "Can't open $outfile $!\n";

    my $buffer;
    my $partial_line = '';
    my $start = time;

    while (read(IN, $buffer, $chunk)) {
        # add last partial line to front of buffer
        $buffer = $partial_line . $buffer;
        # we should only process full lines so we trim off the partial line
        # that we inevitably get at the end of our read and save it for the
        # next loop so we can add it back on
        my $keep = rindex($buffer, "\n") + 1;
        $partial_line = substr($buffer, $keep, length($buffer) - $keep, '');
        # make changes
        $buffer =~ s/this/that/g;
        print OUT $buffer;
    }

    # the file may not end in a newline, so process whatever is left over
    $partial_line =~ s/this/that/g;
    print OUT $partial_line;

    print "Took ", time - $start, " seconds\n";
    close IN;
    close OUT;
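The forking idea mentioned above could look roughly like this. It is an untested-on-your-box sketch, not a benchmark: the sub name filter_with_writer_child and the 64kB pipe read size are my own picks, and it assumes a platform with a real fork (on Win32 Perl fork is emulated, so the gain may not materialise there). The parent reads and edits complete lines and pushes them down a pipe; the child does nothing but write them out.

```perl
#!/usr/bin/perl -w
use strict;
use POSIX ();

# Hypothetical helper: the parent reads and edits, a forked child writes.
# Returns true if the child exited cleanly.
sub filter_with_writer_child {
    my ($infile, $outfile, $chunk) = @_;

    pipe(my $from_parent, my $to_child) or die "Can't create pipe $!\n";
    my $pid = fork;
    die "Can't fork $!\n" unless defined $pid;

    if ($pid == 0) {
        # child: drain the pipe into the output file
        close $to_child;
        open my $out, '>', $outfile or POSIX::_exit(1);
        my $data;
        while (read($from_parent, $data, 65536)) {
            print {$out} $data;
        }
        close $out or POSIX::_exit(1);
        POSIX::_exit(0);    # leave without running the parent's END blocks
    }

    # parent: read, edit, hand complete lines to the child
    close $from_parent;
    local $SIG{PIPE} = 'IGNORE';    # a dead child shows up in $? instead
    open my $in, '<', $infile or die "Can't open $infile $!\n";
    my ($buffer, $partial_line) = ('', '');
    while (read($in, $buffer, $chunk)) {
        $buffer = $partial_line . $buffer;
        # keep everything up to the last newline, carry the rest over
        my $keep = rindex($buffer, "\n") + 1;
        $partial_line = substr($buffer, $keep, length($buffer) - $keep, '');
        $buffer =~ s/this/that/g;
        print {$to_child} $buffer;
    }
    # whatever is left is a final line with no trailing newline
    $partial_line =~ s/this/that/g;
    print {$to_child} $partial_line;
    close $in;
    close $to_child;    # the child sees EOF and finishes
    waitpid($pid, 0);
    return $? == 0;
}

# Example call (paths as in the script above):
# filter_with_writer_child('c:/test.txt', 'd:/out.txt', 2**20) or die "writer failed\n";
```

The pipe only holds a small amount of data, so the parent will block whenever the child falls behind. The overlap you get is between one disk reading and the other writing, which is why this is only worth trying with two separate disks.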

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

