I have done some benchmarking of
"line at a time"
vs.
"chunk at a time with manual split into lines"
vs.
"line at a time w/ lots of buffering".
"chunk at a time with manual split into lines"
is clearly the fastest, about 1.6x faster than the
other 2 methods. I've included my benchmarking
program and results below:
Benchmark: running BufferedFileHandle, chunk, linebyline, each for at least 3 CPU seconds...
BufferedFileHandle: 3 wallclock secs ( 3.22 usr + 0.08 sys = 3.30 CPU) @ <b>2.73/s</b> (n=9)
chunk: 4 wallclock secs ( 2.89 usr + 0.32 sys = 3.21 CPU) @ <b>4.36/s</b> (n=14)
linebyline: 4 wallclock secs ( 3.25 usr + 0.06 sys = 3.31 CPU) @ <b>2.72/s</b> (n=9)
#!/usr/bin/perl
use strict;
use Benchmark;
use FileHandle;

timethese(0, {
    'linebyline'         => \&linebyline,
    'chunk'              => \&chunk,
    'BufferedFileHandle' => \&BufferedFileHandle,
});

# Read the file one line at a time with the <> operator.
sub linebyline {
    open(FILE, "file") or die "can't open file: $!";
    while (<FILE>) { }
    close(FILE);
}

# Read 64KB chunks and split them into lines by hand, carrying any
# partial line at the end of one chunk over into the next.
sub chunk {
    my ($buf, @lines);
    my $leftover = "";
    open(FILE, "file") or die "can't open file: $!";
    while (read FILE, $buf, 64*1024) {
        $buf = $leftover . $buf;
        @lines = split(/\n/, $buf);
        $leftover = ($buf !~ /\n$/) ? pop @lines : "";
        foreach (@lines) { }
    }
    close(FILE);
}

# Read line by line, but ask stdio for a 64KB buffer first.
sub BufferedFileHandle {
    my $fh = new FileHandle;
    my $buffer_var;
    $fh->open("file") or die "can't open file: $!";
    $fh->setvbuf($buffer_var, _IOLBF, 64*1024);
    while (<$fh>) { }
    $fh->close;
}
I'd be very interested to see any results that show differently.
Edit: replaced CODE tags with PRE tags around the long lines