http://www.perlmonks.org?node_id=817632

di has asked for the wisdom of the Perl Monks concerning the following question:

I'm curious why

++$counts{$_} for <IN>;

takes about eight times as long as

++$counts{$_} while <IN>;

Any explanations?

Update: Thanks for the replies.

Replies are listed 'Best First'.
Re: Why is "for" much slower than "while"?
by FunkyMonk (Chancellor) on Jan 15, 2010 at 13:12 UTC
    for reads in the entire file, builds a list of lines, and then iterates over that list. while reads the file one line at a time.

    Update: Links fixed. Thanks toolic.

      I don't think this is the complete answer, as
      read $IN1, my $buffer, -s $IN1;
      ++$counts{$_} for split( /^/m, $buffer );
      is faster than the for.
      /dev/null
                Rate   for  read while
      for   362612/s    --   -7%  -19%
      read  389496/s    7%    --  -13%
      while 449755/s   24%   15%    --
      
      /usr/share/dict/words
              Rate   for  read while
      for   4.73/s    --  -17%  -37%
      read  5.73/s   21%    --  -24%
      while 7.55/s   60%   32%    --
      
      /etc/passwd
               Rate   for while  read
      for   14434/s    --  -17%  -21%
      while 17297/s   20%    --   -6%
      read  18355/s   27%    6%    --
      
      #!/usr/bin/perl
      use strict;
      use Benchmark qw( cmpthese );

      foreach my $file (qw( /dev/null /usr/share/dict/words /etc/passwd )) {
          open my $IN1, '<', $file or die "could not open $file";
          my @list = <$IN1>;
          seek( $IN1, 0, 0 );
          print "$file\n";
          cmpthese( -5, {
              for => sub {
                  seek( $IN1, 0, 0 );
                  my %counts = ();
                  ++$counts{$_} for <$IN1>;
                  die unless keys %counts == @list;
              },
              while => sub {
                  seek( $IN1, 0, 0 );
                  my %counts = ();
                  ++$counts{$_} while <$IN1>;
                  die unless keys %counts == @list;
              },
              read => sub {
                  seek( $IN1, 0, 0 );
                  my %counts = ();
                  read $IN1, my $buffer, -s $IN1;
                  ++$counts{$_} for split( /^/m, $buffer );
                  die unless keys %counts == @list;
              },
          } );
      }
      -- gam3
      A picture is worth a thousand words, but takes 200K.
Re: Why is "for" much slower than "while"?
by Fletch (Bishop) on Jan 15, 2010 at 13:13 UTC

    The first version reads in all of the lines beforehand, building up a temporary list (which, given a substantially large file, may take a good bit of time and/or memory), then iterates over that list; the while reads one line at a time until end of file is reached, so there's nowhere near as much overhead.
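    Spelled out as a runnable sketch (using an in-memory filehandle so it's self-contained; the data and counts are only illustrative):

    ```perl
    use strict;
    use warnings;

    # In-memory filehandle so the sketch is self-contained; a real file
    # on disk behaves the same way.
    my $data = "apple\nbanana\napple\n";
    open my $in, '<', \$data or die "open: $!";

    # "for" form: <$in> is called in LIST context, so every remaining
    # line is read into a temporary list before the loop body runs once.
    # Roughly: my @tmp = <$in>; ++$for_counts{$_} for @tmp;
    my %for_counts;
    ++$for_counts{$_} for <$in>;

    # "while" form: <$in> is called in SCALAR context, one line per
    # iteration, so memory use stays flat no matter how big the file is.
    seek $in, 0, 0;
    my %while_counts;
    ++$while_counts{$_} while defined( $_ = <$in> );

    die "counts differ" unless $for_counts{"apple\n"} == $while_counts{"apple\n"};
    print "both forms count 'apple' ", $for_counts{"apple\n"}, " times\n";
    ```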

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Why is "for" much slower than "while"?
by Anonymous Monk on Jan 15, 2010 at 13:20 UTC
    Context: while is scalar, for is list, and readline reads the whole file in list context.
    $ perl -MO=Deparse -e" ++$counts{$_} for <IN>; "
    ++$counts{$_} foreach (<IN>);
    -e syntax OK
    $ perl -MO=Deparse -e" ++$counts{$_} while <IN>; "
    ++$counts{$_} while defined($_ = <IN>);
    -e syntax OK
Re: Why is "for" much slower than "while"?
by steve (Deacon) on Jan 15, 2010 at 16:24 UTC
    In addition to what others have said, some additional (external) factors may also contribute to the results you see, for example:
    1. Disk I/O: different filesystems read in different ways, and scheduled I/O can be affected by that as well as by block size (see the tuning information for the "elevator" algorithm on ext3).
    2. CPU allocation: other processes can be competing for the same CPU.
    3. Available RAM: loading an entire file into RAM is much quicker than loading a segment of it into virtual memory or a paging file.
Re: Why is "for" much slower than "while"?
by gam3 (Curate) on Jan 17, 2010 at 01:34 UTC
    I ran some benchmarks, and I think the real answer is that while <...> has been finely tuned. On a large file it is almost as fast as foreach over a list.
    /dev/null
                   Rate        for     whyfor      while while_list   for_list
    for        374878/s         --       -13%       -15%       -55%       -62%
    whyfor     433016/s        16%         --        -2%       -48%       -56%
    while      442469/s        18%         2%         --       -47%       -55%
    while_list 833025/s       122%        92%        88%         --       -16%
    for_list   991880/s       165%       129%       124%        19%         --
    
    /usr/share/dict/words
                 Rate        for while_list     whyfor      while   for_list
    for        4.80/s         --       -35%       -37%       -38%       -43%
    while_list 7.40/s        54%         --        -2%        -5%       -13%
    whyfor     7.57/s        58%         2%         --        -2%       -11%
    while      7.75/s        62%         5%         2%         --        -9%
    for_list   8.48/s        77%        15%        12%         9%         --
    
    /etc/passwd
                  Rate        for     whyfor      while while_list   for_list
    for        14440/s         --       -13%       -16%       -24%       -49%
    whyfor     16599/s        15%         --        -4%       -12%       -42%
    while      17224/s        19%         4%         --        -9%       -39%
    while_list 18915/s        31%        14%        10%         --       -33%
    for_list   28442/s        97%        71%        65%        50%         --
    
    #!/usr/bin/perl
    use strict;
    use Benchmark qw( cmpthese );

    foreach my $file (qw( /dev/null /usr/share/dict/words /etc/passwd )) {
        open my $IN1, '<', $file or die "could not open $file";
        my @list = <$IN1>;
        seek( $IN1, 0, 0 );
        print "$file\n";
        cmpthese( -5, {
            for_list => sub {
                my %counts = ();
                ++$counts{$_} for @list;
                die unless keys %counts == @list;
            },
            while_list => sub {
                my $x = 0;
                my %counts = ();
                ++$counts{$_} while defined( $_ = $list[ $x++ ] );
                die unless keys %counts == @list;
            },
            for => sub {
                seek( $IN1, 0, 0 );
                my %counts = ();
                ++$counts{$_} for <$IN1>;
                die unless keys %counts == @list;
            },
            while => sub {
                seek( $IN1, 0, 0 );
                my %counts = ();
                ++$counts{$_} while <$IN1>;
                die unless keys %counts == @list;
            },
            whyfor => sub {
                seek( $IN1, 0, 0 );
                my %counts = ();
                for ( ; defined( $_ = <$IN1> ) ; ) {
                    ++$counts{$_};
                }
                die unless keys %counts == @list;
            },
        } );
    }
    -- gam3
    A picture is worth a thousand words, but takes 200K.
      Try
      whyfor2 => sub {
          seek( $IN1, 0, 0 );
          my %counts = ();
          ++$counts{$_} for ( ; defined( $_ = <$IN1> ) ; );
          die unless keys %counts == @list;
      },

      My results from gam3's script, second version:

      /dev/null
                      Rate        for     whyfor      while while_list   for_list
      for         280029/s         --       -16%       -17%       -73%       -74%
      whyfor      333697/s        19%         --        -1%       -68%       -69%
      while       338448/s        21%         1%         --       -68%       -69%
      while_list 1050567/s       275%       215%       210%         --        -3%
      for_list   1079903/s       286%       224%       219%         3%         --
      /usr/share/dict/words
                   Rate        for     whyfor while_list      while   for_list
      for        3.67/s         --       -39%       -41%       -42%       -51%
      whyfor     5.99/s        63%         --        -4%        -5%       -21%
      while_list 6.26/s        71%         5%         --        -0%       -17%
      while      6.27/s        71%         5%         0%         --       -17%
      for_list   7.56/s       106%        26%        21%        21%         --
      /etc/passwd
                    Rate     whyfor        for while_list      while   for_list
      whyfor     13584/s         --       -13%       -17%       -30%       -56%
      for        15664/s        15%         --        -5%       -19%       -49%
      while_list 16411/s        21%         5%         --       -15%       -47%
      while      19273/s        42%        23%        17%         --       -38%
      for_list   30882/s       127%        97%        88%        60%         --
      
Re: Why is "for" much slower than "while"?
by Xiong (Hermit) on Jan 16, 2010 at 11:23 UTC

    Sorry; I'm not sure I see a clear explanation yet. Yes, I understand that for (in the given example) slurps the whole file and while reads line-by-line; that's clear. But why should while necessarily be faster?

    I smell the possibility of certain files or certain hardware running faster with for. That may not be the actual case; I'm only saying that, a priori, there's nothing in slurp vs readline to convince me while must be faster. I can easily picture a situation in which individual file accesses were slower, due to contention for the disk, a stingy cache, or something else. Looks as though steve is pointing that way.

    Rather than start an argument, though, I'd just like to say that it would be very nice to see some actual benchmarking with various inputs: large files with short lines, large files with long lines, short files, odd-shaped files. I don't have the software test experience to write the script, but if Someone capable were to offer it, I'd run it and post the results. If we had a few different Monks do this on different platforms, particularly while under various other loads, we might have an objective basis for claims.
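      For what it's worth, here's a rough sketch of such a script. The file shapes, line widths, and line counts are my own arbitrary picks (not anything canonical), and it only compares the two forms from the original question:

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Temp qw( tempfile );
      use Benchmark qw( cmpthese );

      # Arbitrary "shapes": tweak these, or add your own.
      my %shapes = (
          short_lines => { lines => 10_000, width => 5   },
          long_lines  => { lines => 1_000,  width => 500 },
          tiny_file   => { lines => 10,     width => 20  },
      );

      for my $name ( sort keys %shapes ) {
          my $shape = $shapes{$name};

          # Build a throwaway file of the requested shape; each line is
          # padding plus a counter so the lines are distinct.
          my ( $fh, $path ) = tempfile( UNLINK => 1 );
          print {$fh} ( 'x' x $shape->{width} ) . "$_\n" for 1 .. $shape->{lines};
          close $fh or die "close: $!";

          open my $in, '<', $path or die "open $path: $!";
          print "$name ($shape->{lines} lines, ~$shape->{width} chars each)\n";
          cmpthese( -1, {
              for   => sub { seek $in, 0, 0; my %c; ++$c{$_} for <$in> },
              while => sub { seek $in, 0, 0; my %c; ++$c{$_} while <$in> },
          } );
          print "\n";
      }
      ```

      Running it on different platforms or under load, as suggested above, just means running the same script there; pointing tempfile at a directory on the filesystem of interest would cover the filesystem variable too.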