Check out the code below (inspired by a comment from dws in a recent chatterbox discussion):
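use strict;
use warnings;
use Benchmark;

# Read the file in 8 KB blocks and split each block on "\n" by hand.
sub sub1 {
    open my $fh, '<', 'vsfull.csv' or die "Cannot open vsfull.csv: $!";
    binmode $fh;
    my @lines;
    my $block;
    my $left = '';
    while ( read $fh, $block, 8192 ) {
        $block = $left . $block;
        my $i = index $block, "\n";
        while ( $i >= 0 ) {    # index returns -1 when no newline is left; 0 means an empty line
            push @lines, substr( $block, 0, $i );
            substr( $block, 0, $i + 1, '' );
            $i = index $block, "\n";
        }
        $left = $block;    # partial line, completed by the next block
    }
    push @lines, $left if length $left;    # last line when the file lacks a trailing newline
}

# Read the file one line at a time with <>.
sub sub2 {
    my @lines;
    open my $fh, '<', 'vsfull.csv' or die "Cannot open vsfull.csv: $!";
    while (<$fh>) { push @lines, $_ }
}

timethese( 100, { readbig => \&sub1, whilelp => \&sub2 } );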
The results?
Benchmark: timing 100 iterations of readbig, whilelp...
readbig: 25 wallclock secs (25.21 usr + 0.00 sys = 25.21 CPU) @ 3.97/s (n=100)
whilelp: 157 wallclock secs (156.71 usr + 0.00 sys = 156.71 CPU) @ 0.64/s (n=100)
Now admittedly, my code is probably clunky and whatnot, but I would have assumed that reading a block at a time and splitting it is the model one would follow for breaking a file into lines. My question has two parts: why would one use <> when it's so slow relative to read (roughly six times slower in this run), and why hasn't <> been implemented in such a fashion that it takes advantage of read's quickness?
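For what it's worth, before reading too much into the timings it may be worth checking that both approaches return the same lines. The sketch below is only an illustration under some assumptions: the temp file stands in for vsfull.csv, the sub names lines_by_read and lines_by_readline are made up for this example, and the <> version chomps so that both return lines without the trailing newline.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test data: a small temp file standing in for vsfull.csv.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} "a,b,c\n", "\n", "1,2,3\n", "last line without newline";
close $tmp or die "close: $!";

# Block reader: read fixed-size chunks and split them on "\n" by hand.
sub lines_by_read {
    my ( $file, $size ) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    my ( @lines, $block );
    my $left = '';
    while ( read $fh, $block, $size ) {
        $block = $left . $block;
        # index gives -1 when no newline remains, and 0 for an empty line
        while ( ( my $i = index( $block, "\n" ) ) >= 0 ) {
            push @lines, substr( $block, 0, $i );
            substr( $block, 0, $i + 1, '' );
        }
        $left = $block;    # partial line carried into the next block
    }
    push @lines, $left if length $left;    # final line with no trailing newline
    close $fh;
    return \@lines;
}

# Line reader: <> plus chomp, so both subs return lines without "\n".
sub lines_by_readline {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    my @lines;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return \@lines;
}

# A tiny block size forces lines to straddle block boundaries.
my $by_read     = lines_by_read( $path, 8 );
my $by_readline = lines_by_readline($path);

my $same = @$by_read == @$by_readline
    && join( "\0", @$by_read ) eq join( "\0", @$by_readline );
print $same ? "same lines\n" : "results differ\n";
```

The deliberately small block size is the point of the check: it exercises empty lines and lines split across two reads, which are the cases where a hand-rolled splitter is most likely to go wrong.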
Cluka