perlquestion
gam3
I was looking into [id://817632] and it occurred to me that the <tt>foreach</tt> is doing something very close to <tt>@bob = <IN></tt>. This led me to wonder whether the <tt>while</tt> was fast enough to make up for the time I expected it to lose allocating memory for the list while pushing elements onto it.
<p>
First let me show some benchmarks that demonstrate that, at least for the larger files, it really is faster to
use the <tt>while</tt> statement in place of a simple assignment. However, using <tt>[ <IN> ] </tt> seems to be nearly as efficient as the <tt>while</tt> loop.
<pre>
/dev/null
          Rate while array  read  list
while 566972/s    --   -5%  -13%  -16%
array 597140/s    5%    --   -8%  -11%
read  650963/s   15%    9%    --   -3%
list  672405/s   19%   13%    3%    --

/usr/share/dict/words
        Rate  list  read array while
list  6.76/s    --  -32%  -32%  -33%
read  9.88/s   46%    --   -0%   -2%
array 9.92/s   47%    0%    --   -1%
while 10.1/s   49%    2%    1%    --

/etc/passwd
         Rate  list while array  read
list  21806/s    --   -2%  -19%  -31%
while 22357/s    3%    --  -17%  -29%
array 26906/s   23%   20%    --  -15%
read  31658/s   45%   42%   18%    --

/opt/temp
      s/iter  list  read array while
list    1.79    --  -25%  -29%  -38%
read    1.34   34%    --   -4%  -17%
array   1.28   40%    4%    --  -13%
while   1.12   61%   20%   15%    --
</pre>
Now that <tt>read</tt> is not cheating, the results seem more reasonable.
<p>
It just does not make sense to me that:
<tt>push(@bob, $_) while <$IN>;</tt>
would be faster than
<tt>@bob = <$IN></tt>.
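
To make the comparison concrete, here is a minimal sketch of the two forms reading the same input and producing the same array. The sample data and the in-memory filehandle are assumptions made only to keep the example self-contained; they are not part of the benchmark.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; any line-oriented text would do.
my $data = join '', map { "line $_\n" } 1 .. 5;

# An in-memory filehandle keeps the sketch self-contained.
open my $IN, '<', \$data or die "could not open in-memory handle: $!";

# Loop form: one readline call per iteration, one push per line.
my @pushed;
push @pushed, $_ while <$IN>;

seek( $IN, 0, 0 );

# List form: readline in list context returns every remaining line at once.
my @slurped = <$IN>;

close $IN;

# Both arrays should hold the same lines in the same order.
my $same = @pushed == @slurped
    && !grep { $pushed[$_] ne $slurped[$_] } 0 .. $#pushed;
print $same ? "identical\n" : "different\n";
```

Both forms end up with the same list, so any difference in the timings should come from how the list is built rather than from what it contains.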
<code>
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
use Data::Dumper;

# qw() must be wrapped in parentheses here; a bare qw list after
# "foreach my $file" is a syntax error on newer perls.
foreach my $file ( qw( /dev/null /usr/share/dict/words /etc/passwd /opt/temp ) ) {
    open my $IN1, '<', $file or die "could not open $file: $!";
    my @list = <$IN1>;
    seek( $IN1, 0, 0 );
    my $dl = length( Dumper \@list );
    print "$file\n";
    cmpthese(
        -10,
        {
            while => sub {
                seek( $IN1, 0, 0 );
                my @bob = ();
                while (<$IN1>) {
                    push @bob, $_;
                }
                # my $x = Dumper \@bob;
                # die unless length($x) == $dl;
                # die @bob . ' ' . @list unless @bob == @list;
            },
            list => sub {
                seek( $IN1, 0, 0 );
                my @bob = <$IN1>;
                # my $x = Dumper \@bob;
                # die unless length($x) == $dl;
                # die @bob . ' ' . @list unless @bob == @list;
            },
            array => sub {
                seek( $IN1, 0, 0 );
                my $bob = [<$IN1>];
                # my $x = Dumper $bob;
                # die unless length($x) == $dl;
                # die @$bob . ' ' . @list unless @$bob == @list;
            },
            read => sub {
                seek( $IN1, 0, 0 );
                read $IN1, my $bob, -s $IN1;
                my @bob = split( /^/m, $bob );
                # my $x = Dumper \@bob;
                # die @bob . ' ' . @list unless @bob == @list;
                # die length($x), ' ', $dl unless length($x) == $dl;
            },
        }
    );
}
</code>
<b>UPDATE</b>: I left the Dumper line out of the <b>read</b> test -- thanks [chromatic]. This explains why <b>read</b> was so fast. It is slower now.
<p>I have removed <tt>Data::Dumper</tt> from the benchmark as well, as it did not change the results and slowed everything down.
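
With <tt>Data::Dumper</tt> gone, a cheaper sanity check is still possible. The sketch below is an assumption about how one might do it, not what the benchmark above does: it compares the <tt>read</tt>-and-<tt>split</tt> result against the list-context result by element count and concatenated content. The sample data is hypothetical, and <tt>length $data</tt> stands in for <tt>-s</tt>, which is only meaningful on a real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, including a final line without a newline.
my $data = "alpha\nbeta\ngamma";

open my $IN, '<', \$data or die "could not open in-memory handle: $!";

# Reference result: readline in list context.
my @list = <$IN>;

seek( $IN, 0, 0 );

# read-and-split form; length($data) stands in for -s on a real file.
read $IN, my $buf, length $data;
my @bob = split /^/m, $buf;

close $IN;

# Cheap check: same element count and same concatenated content.
my $ok = @bob == @list && join( '', @bob ) eq join( '', @list );
print $ok ? "ok\n" : "mismatch\n";
```

This is weaker than an element-by-element comparison, but it runs outside the timed subs and would catch a test that silently reads nothing.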
<div class="pmsig"><div class="pmsig-424604">
-- gam3<br/>
<small>A picture is worth a thousand words, but takes 200K.</small>
</div></div>