http://www.perlmonks.org?node_id=61540


in reply to Assigning data to an array is slow..

Hi,
a few days ago I looked into the time taken by unpack compared to other ways of extracting fields from records.
I think that using substr can speed things up.
Your post gave me the occasion to learn how to use Benchmark, so I tested the code below on a 400,000-record file:
    #!/usr/bin/perl -w
    use Benchmark;

    $filename = $ARGV[0];

    timethese( $count, {
        'Method One'   => '&One',
        'Method Two'   => '&Two',
        'Method Three' => '&Three',
    } );

    sub One {
        open(FILE, $filename);
        while ($row = <FILE>) {
            @data = unpack('a4a2a2a2a2', $row);
        }
        close(FILE);
    }

    sub Two {
        open(FILE, $filename);
        while ($row = <FILE>) {
            ($data[0], $data[1], $data[2], $data[3], $data[4])
                = unpack('a4a2a2a2a2', $row);
        }
        close(FILE);
    }

    sub Three {
        open(FILE, $filename);
        while ($row = <FILE>) {
            $data[0] = substr($row,  0, 4);
            $data[1] = substr($row,  4, 2);
            $data[2] = substr($row,  6, 2);
            $data[3] = substr($row,  8, 2);
            $data[4] = substr($row, 10, 2);
        }
        close(FILE);
    }
and I got this.
    Method One:   43 wallclock secs (40.72 usr + 1.23 sys = 41.95 CPU) @ 0.02/s (n=1)
                  (warning: too few iterations for a reliable count)
    Method Two:   43 wallclock secs (41.50 usr + 1.42 sys = 42.92 CPU) @ 0.02/s (n=1)
                  (warning: too few iterations for a reliable count)
    Method Three: 36 wallclock secs (33.76 usr + 1.42 sys = 35.18 CPU) @ 0.03/s (n=1)
                  (warning: too few iterations for a reliable count)
Again, the substr approach (sub Three) proves faster than unpack, whether unpacking into a whole array (sub One) or into individual array elements (sub Two).
Hope this helps.
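Since the loops above spend most of their time reading the file, the difference between the two extraction techniques is easier to see on an in-memory record. Here is a minimal self-contained sketch of the same comparison; the sample record "200103011650" and the iteration count are my own choices, not from the original test:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(timethese);

    # A sample fixed-width record matching the 'a4a2a2a2a2' template
    # (4 + 2 + 2 + 2 + 2 characters).
    my $row = "200103011650";

    timethese( 500_000, {
        'unpack' => sub {
            my @data = unpack 'a4a2a2a2a2', $row;
        },
        'substr' => sub {
            my @data;
            $data[0] = substr $row,  0, 4;
            $data[1] = substr $row,  4, 2;
            $data[2] = substr $row,  6, 2;
            $data[3] = substr $row,  8, 2;
            $data[4] = substr $row, 10, 2;
        },
    } );

Both subs extract the same five fields, so whatever timing gap remains comes from the extraction itself rather than from I/O.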

ciao,
Roberto

Re: Re: Assigning data to an array is slow..
by davorg (Chancellor) on Mar 01, 2001 at 16:50 UTC

    Doesn't getting the warning "too few iterations for a reliable count" in the output bother you at all?

    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

      hi davorg,
sure, that warning isn't very nice... :) However, I also saw a similar difference (about 15%) by running the routines separately and checking with the ps (process status) command.
As I said, this was my first time using Benchmark (there's always a first time...). I don't think the warning comes from the input file's size, though. For my own learning, I'd appreciate it if someone could change the code into something that can be benchmarked reliably (assuming the input file size isn't the problem).
      ciao
      Roberto

        The problem isn't in the size of the file that you're processing, but rather in the number of times you run the test. You don't show the code that sets your $count variable, but you should look at increasing that value.
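One way to avoid guessing at a value for $count: Benchmark accepts a negative count, which means "run each sub for at least that many CPU seconds", so n is always large enough for a reliable rate. A minimal sketch (the sample record is my own, not from the original post):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $row = "200103011650";

    # -3 tells Benchmark to run each sub for at least 3 CPU seconds,
    # which silences the "too few iterations" warning and prints a
    # comparison table of rates.
    cmpthese( -3, {
        'unpack' => sub { my @data = unpack 'a4a2a2a2a2', $row },
        'substr' => sub {
            my @data = map { substr $row, $_->[0], $_->[1] }
                           ( [0, 4], [4, 2], [6, 2], [8, 2], [10, 2] );
        },
    } );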
