http://www.perlmonks.org?node_id=189664


in reply to How to process each byte in a binary file?

Okay, it looks like my IO::Scalar suggestion is not going to work. I guess that leaves unpack(), substr(), split(), and a regex?

Looks like unpack() is the clear winner on my machine:

#!/usr/bin/perl
use Benchmark;

my $string = "X" x 102400;

timethese(100, {
    'split'  => sub { for (split(//, $string)) {} },
    'unpack' => sub { for (unpack("C*", $string)) {} },
    'regex'  => sub { while ($string =~ /./sg) {} },
    'substr' => sub {
        for (my $i = 0; $i < length($string); $i++) {
            substr($string, $i, 1);
        }
    },
});

Gives me:

$ perl foo
Benchmark: timing 100 iterations of regex, split, substr, unpack...
     regex: 44 wallclock secs (43.13 usr +  0.00 sys = 43.13 CPU)
     split: 49 wallclock secs (47.90 usr +  0.04 sys = 47.94 CPU)
    substr: 58 wallclock secs (55.70 usr +  0.00 sys = 55.70 CPU)
    unpack: 27 wallclock secs (26.48 usr +  0.00 sys = 26.48 CPU)

Update: Reposted results after correcting a typo.
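
For anyone wanting to apply the unpack() approach to an actual file rather than an in-memory test string, here is a minimal sketch. It is not part of the benchmark above. The helper name each_byte, the callback interface, and the 64 KB default chunk size are all assumptions made for illustration; the idea is just to read the file in chunks and hand each chunk to unpack("C*", ...).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): calls $callback once
# for every byte in $path, passing the byte's numeric value (0..255).
sub each_byte {
    my ($path, $callback, $chunk_size) = @_;
    $chunk_size ||= 65536;    # assumed default, not tuned

    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);             # raw bytes: no newline translation or encoding layers

    my $buf;
    while (1) {
        my $got = read($fh, $buf, $chunk_size);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;    # end of file

        # unpack("C*") turns the chunk into a list of unsigned byte values
        $callback->($_) for unpack("C*", $buf);
    }
    close($fh);
    return;
}
```

It could be called as each_byte($file, sub { my $value = shift; ... }). Reading in chunks keeps memory bounded for large files, at the cost of building a temporary list of numbers per chunk, so a smaller chunk size may be worth trying if memory is tight.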

Re: Re: How to process each byte in a binary file?
by John M. Dlugosz (Monsignor) on Aug 13, 2002 at 00:46 UTC
    I get similar results: split falls between regex and substr. It makes me wonder, though: since split // is a "special case" that splits on every character, why isn't it simply as fast as unpack?

    —John

      I added a test case that just uses read() to pull one byte at a time from a real file, and it's about the same speed as unpack() on a string (for largish strings).

      Of course, this leaves the file open the whole time, but it's wonderfully simple :) I also have a very expensive NetApp filer helping the speed with read-ahead and a huge cache, so YMMV.
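
The test case mentioned above was not posted, so the following is only a guess at its general shape: a sketch of reading a file one byte per read() call. The helper name count_bytes and the byte-frequency tally are assumptions, chosen just to give the loop something to do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a reference to
# a 256-element array holding how many times each byte value occurs in $path.
sub count_bytes {
    my ($path) = @_;

    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # raw bytes: no newline translation or encoding layers

    my @count = (0) x 256;
    my $byte;
    while (1) {
        # read() returns the number of bytes read, 0 at EOF, undef on error
        my $got = read($fh, $byte, 1);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;
        $count[ ord($byte) ]++;
    }
    close($fh);
    return \@count;
}
```

Perl's read() goes through buffered I/O (unlike sysread()), so asking for one byte at a time does not normally mean one system call per byte. That may be part of why this approach holds up, though the timing will depend on the storage behind it.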