Re: Perl's poor disk IO performance

by snoopy (Deacon)
on Apr 29, 2010 at 23:09 UTC ( #837663=note )


in reply to Perl's poor disk IO performance

Do you really want to skip over records, as in your example above?

Memory mapping can be a better choice if I/O has been identified as a bottleneck and you want 'semi-random' access to your data. I.e. if you can be a bit selective, using the headers to skip over records and thus avoid touching significant blocks of data.

For example, the following uses Sys::Mmap:

    #!/usr/bin/perl
    use common::sense;
    use Sys::Mmap;

    my $path = '/tmp/stuff';
    my $file_size = -s $path;
    die "empty or missing file: $path"
        unless $file_size;

    open (my $fh, '+<', $path)
        or die "unable to open $path for read: $!";

    mmap(my $data, 0, PROT_READ, MAP_SHARED, $fh)
        or die "mmap: $!";

    my $pos = 0;

    while ($pos < $file_size) {
        my ($size, $code, $ftype) = unpack("nCC", substr($data, $pos, 4));
        $pos += 4;            # advance past header
        $size = $size - 4;
        if ($size > 0) {
            $pos += $size;    # advance past record
        }
    }
If you've identified I/O as a bottleneck, it's worth benchmarking this against your solution above anyway, even if you are reading sequentially. That will help determine whether read really is imposing a performance penalty! A rough sketch of such a comparison follows.
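For instance, something along these lines with the core Benchmark module would give you comparative throughput numbers. The /tmp/stuff path and the read-based scan (a plain read plus seek over the same 4-byte headers) are only my guesses at what your sequential version looks like, so substitute your own file and real reading code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);
    use Sys::Mmap;

    my $path = '/tmp/stuff';    # hypothetical test file; point at your data

    # Scan via mmap, skipping record bodies based on the header size field.
    sub scan_mmap {
        my $file_size = -s $path;
        open my $fh, '+<', $path or die "open $path: $!";
        mmap(my $data, 0, PROT_READ, MAP_SHARED, $fh) or die "mmap: $!";
        my $pos = 0;
        while ($pos < $file_size) {
            my ($size, $code, $ftype) = unpack 'nCC', substr($data, $pos, 4);
            $pos += $size > 4 ? $size : 4;
        }
        munmap($data);
        close $fh;
    }

    # Scan via buffered read, seeking past each record body.
    sub scan_read {
        open my $fh, '<', $path or die "open $path: $!";
        binmode $fh;
        while (read($fh, my $header, 4) == 4) {
            my ($size, $code, $ftype) = unpack 'nCC', $header;
            seek($fh, $size - 4, 1) if $size > 4;    # skip record body
        }
        close $fh;
    }

    # Run each scan repeatedly for ~5 CPU seconds and compare rates.
    cmpthese(-5, {
        mmap => \&scan_mmap,
        read => \&scan_read,
    });

The relative numbers will depend heavily on record sizes and on whether the file is already in the OS page cache, so it's worth running against data of a realistic size.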

