http://www.perlmonks.org?node_id=817632

di has asked for the wisdom of the Perl Monks concerning the following question:

I'm curious why

++$counts{$_} for <IN>;

takes about eight times as long as

++$counts{$_} while <IN>;

Any explanations?

Update: Thanks for the replies.

Replies are listed 'Best First'.
Re: Why is "for" much slower than "while"?
by FunkyMonk (Chancellor) on Jan 15, 2010 at 13:12 UTC
    for reads in the entire file, builds a list of lines, and then iterates over that list. while reads the file one line at a time.

    Update: Links fixed. Thanks toolic.

      I don't think this is the complete answer, as
      read $IN1, my $buffer, -s $IN1;
      ++$counts{$_} for split( /^/m, $buffer );
      is faster than the for.
      /dev/null
                Rate   for  read while
      for   362612/s    --   -7%  -19%
      read  389496/s    7%    --  -13%
      while 449755/s   24%   15%    --
      
      /usr/share/dict/words
              Rate   for  read while
      for   4.73/s    --  -17%  -37%
      read  5.73/s   21%    --  -24%
      while 7.55/s   60%   32%    --
      
      /etc/passwd
               Rate   for while  read
      for   14434/s    --  -17%  -21%
      while 17297/s   20%    --   -6%
      read  18355/s   27%    6%    --
      
      #!/usr/bin/perl
      use strict;
      use Benchmark qw( cmpthese );

      foreach my $file (qw( /dev/null /usr/share/dict/words /etc/passwd )) {
          open my $IN1, '<', $file or die "could not open $file";
          my @list = <$IN1>;
          seek( $IN1, 0, 0 );
          print "$file\n";
          cmpthese( -5, {
              for => sub {
                  seek( $IN1, 0, 0 );
                  my %counts = ();
                  ++$counts{$_} for <$IN1>;
                  die unless keys %counts == @list;
              },
              while => sub {
                  seek( $IN1, 0, 0 );
                  my %counts = ();
                  ++$counts{$_} while <$IN1>;
                  die unless keys %counts == @list;
              },
              read => sub {
                  seek( $IN1, 0, 0 );
                  my %counts = ();
                  read $IN1, my $buffer, -s $IN1;
                  ++$counts{$_} for split( /^/m, $buffer );
                  die unless keys %counts == @list;
              },
          } );
      }
      -- gam3
      A picture is worth a thousand words, but takes 200K.
Re: Why is "for" much slower than "while"?
by Fletch (Bishop) on Jan 15, 2010 at 13:13 UTC

    The first version reads in all of the lines beforehand, building up a temporary list (which, given a substantially large file, may take a good bit of time and/or memory), then iterates over that list; the while reads one line at a time until end of file is reached, so there's nowhere near as much overhead.
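    Spelled out as a runnable sketch (using an in-memory filehandle so it's self-contained; the data and counts are only illustrative):

    ```perl
    use strict;
    use warnings;

    # In-memory filehandle so the sketch is self-contained; a real file
    # on disk behaves the same way.
    my $data = "apple\nbanana\napple\n";
    open my $in, '<', \$data or die "open: $!";

    # "for" form: <$in> is called in LIST context, so every remaining
    # line is read into a temporary list before the loop body runs once.
    # Roughly: my @tmp = <$in>; ++$for_counts{$_} for @tmp;
    my %for_counts;
    ++$for_counts{$_} for <$in>;

    # "while" form: <$in> is called in SCALAR context, one line per
    # iteration, so memory use stays flat no matter how big the file is.
    seek $in, 0, 0;
    my %while_counts;
    ++$while_counts{$_} while defined( $_ = <$in> );

    die "counts differ" unless $for_counts{"apple\n"} == $while_counts{"apple\n"};
    print "both forms count 'apple' ", $for_counts{"apple\n"}, " times\n";
    ```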

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: Why is "for" much slower than "while"?
by Anonymous Monk on Jan 15, 2010 at 13:20 UTC
    Context: while is scalar, for is list, and readline reads the whole file in list context.
    $ perl -MO=Deparse -e" ++$counts{$_} for <IN>; "
    ++$counts{$_} foreach (<IN>);
    -e syntax OK
    $ perl -MO=Deparse -e" ++$counts{$_} while <IN>; "
    ++$counts{$_} while defined($_ = <IN>);
    -e syntax OK
Re: Why is "for" much slower than "while"?
by steve (Deacon) on Jan 15, 2010 at 16:24 UTC
    In addition to what others have said, some additional (external) factors may also contribute to the results you see, for example:
    1. Disk I/O: different filesystems read in different ways, and scheduled I/O can be affected by that as well as by block size (see the tuning information for the "elevator" algorithm on ext3).
    2. CPU allocation: other processes can be competing for the same CPU.
    3. Available RAM: loading an entire file into RAM is much quicker than loading a segment of it into virtual memory or a paging file.
Re: Why is "for" much slower than "while"?
by gam3 (Curate) on Jan 17, 2010 at 01:34 UTC
    I ran some benchmarks, and I think the real answer is that while <...> has been finely tuned. On a large file it is almost as fast as foreach over a list.
    /dev/null
                   Rate        for     whyfor      while while_list   for_list
    for        374878/s         --       -13%       -15%       -55%       -62%
    whyfor     433016/s        16%         --        -2%       -48%       -56%
    while      442469/s        18%         2%         --       -47%       -55%
    while_list 833025/s       122%        92%        88%         --       -16%
    for_list   991880/s       165%       129%       124%        19%         --
    
    /usr/share/dict/words
                 Rate        for while_list     whyfor      while   for_list
    for        4.80/s         --       -35%       -37%       -38%       -43%
    while_list 7.40/s        54%         --        -2%        -5%       -13%
    whyfor     7.57/s        58%         2%         --        -2%       -11%
    while      7.75/s        62%         5%         2%         --        -9%
    for_list   8.48/s        77%        15%        12%         9%         --
    
    /etc/passwd
                  Rate        for     whyfor      while while_list   for_list
    for        14440/s         --       -13%       -16%       -24%       -49%
    whyfor     16599/s        15%         --        -4%       -12%       -42%
    while      17224/s        19%         4%         --        -9%       -39%
    while_list 18915/s        31%        14%        10%         --       -33%
    for_list   28442/s        97%        71%        65%        50%         --
    
    #!/usr/bin/perl
    use strict;
    use Benchmark qw( cmpthese );

    foreach my $file (qw( /dev/null /usr/share/dict/words /etc/passwd )) {
        open my $IN1, '<', $file or die "could not open $file";
        my @list = <$IN1>;
        seek( $IN1, 0, 0 );
        print "$file\n";
        cmpthese( -5, {
            for_list => sub {
                my %counts = ();
                ++$counts{$_} for @list;
                die unless keys %counts == @list;
            },
            while_list => sub {
                my $x = 0;
                my %counts = ();
                ++$counts{$_} while defined( $_ = $list[ $x++ ] );
                die unless keys %counts == @list;
            },
            for => sub {
                seek( $IN1, 0, 0 );
                my %counts = ();
                ++$counts{$_} for <$IN1>;
                die unless keys %counts == @list;
            },
            while => sub {
                seek( $IN1, 0, 0 );
                my %counts = ();
                ++$counts{$_} while <$IN1>;
                die unless keys %counts == @list;
            },
            whyfor => sub {
                seek( $IN1, 0, 0 );
                my %counts = ();
                for ( ; defined( $_ = <$IN1> ) ; ) {
                    ++$counts{$_};
                }
                die unless keys %counts == @list;
            },
        } );
    }
    -- gam3
    A picture is worth a thousand words, but takes 200K.
      Try
      whyfor2 => sub {
          seek( $IN1, 0, 0 );
          my %counts = ();
          ++$counts{$_} for ( ; defined( $_ = <$IN1> ) ; );
          die unless keys %counts == @list;
      },

      My results from gam3's script, second version:

      /dev/null
                      Rate        for     whyfor      while while_list   for_list
      for         280029/s         --       -16%       -17%       -73%       -74%
      whyfor      333697/s        19%         --        -1%       -68%       -69%
      while       338448/s        21%         1%         --       -68%       -69%
      while_list 1050567/s       275%       215%       210%         --        -3%
      for_list   1079903/s       286%       224%       219%         3%         --
      /usr/share/dict/words
                   Rate        for     whyfor while_list      while   for_list
      for        3.67/s         --       -39%       -41%       -42%       -51%
      whyfor     5.99/s        63%         --        -4%        -5%       -21%
      while_list 6.26/s        71%         5%         --        -0%       -17%
      while      6.27/s        71%         5%         0%         --       -17%
      for_list   7.56/s       106%        26%        21%        21%         --
      /etc/passwd
                    Rate     whyfor        for while_list      while   for_list
      whyfor     13584/s         --       -13%       -17%       -30%       -56%
      for        15664/s        15%         --        -5%       -19%       -49%
      while_list 16411/s        21%         5%         --       -15%       -47%
      while      19273/s        42%        23%        17%         --       -38%
      for_list   30882/s       127%        97%        88%        60%         --
      
Re: Why is "for" much slower than "while"?
by Xiong (Hermit) on Jan 16, 2010 at 11:23 UTC

    Sorry; I'm not sure I see a clear explanation yet. Yes, I understand that for (in the given example) slurps the whole file and while reads line-by-line; that's clear. But why should while necessarily be faster?

    I smell the possibility of certain files or certain hardware running faster with for. That may not be the actual case; I'm only saying that, a priori, there's nothing in slurp vs readline to convince me while must be faster. I can easily picture a situation in which individual file accesses were slower, due to contention for the disk, a stingy cache, or something else. Looks as though steve is pointing that way.

    Rather than start an argument, though, I'd just like to say that it would be very nice to see some actual benchmarking with various inputs: large files with short lines, large files with long lines, short files, odd-shaped files. I don't have the software test experience to write the script, but if Someone capable were to offer it, I'd run it and post the results. If we had a few different Monks do this on different platforms, particularly while under various other loads, we might have an objective basis for claims.
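      For what it's worth, here's a rough sketch of such a script. The file shapes, line widths, and line counts are my own arbitrary picks (not anything canonical), and it only compares the two forms from the original question:

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Temp qw( tempfile );
      use Benchmark qw( cmpthese );

      # Arbitrary "shapes": tweak these, or add your own.
      my %shapes = (
          short_lines => { lines => 10_000, width => 5   },
          long_lines  => { lines => 1_000,  width => 500 },
          tiny_file   => { lines => 10,     width => 20  },
      );

      for my $name ( sort keys %shapes ) {
          my $shape = $shapes{$name};

          # Build a throwaway file of the requested shape; each line is
          # padding plus a counter so the lines are distinct.
          my ( $fh, $path ) = tempfile( UNLINK => 1 );
          print {$fh} ( 'x' x $shape->{width} ) . "$_\n" for 1 .. $shape->{lines};
          close $fh or die "close: $!";

          open my $in, '<', $path or die "open $path: $!";
          print "$name ($shape->{lines} lines, ~$shape->{width} chars each)\n";
          cmpthese( -1, {
              for   => sub { seek $in, 0, 0; my %c; ++$c{$_} for <$in> },
              while => sub { seek $in, 0, 0; my %c; ++$c{$_} while <$in> },
          } );
          print "\n";
      }
      ```

      Running it on different platforms or under load, as suggested above, just means running the same script there; pointing tempfile at a directory on the filesystem of interest would cover the filesystem variable too.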