http://www.perlmonks.org?node_id=609904

andye has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks,

I'm using PDL to work on a dataset that's too large to fit in memory, so I'm using the mapfraw function from PDL::IO::FastRaw to memory-map the file.

That all works, but I'd like my processing to go quicker. I'm not altering this dataset at all, which is why I'm confused to see it (apparently) being written back to disk when a page is swapped out.

So my question is:
Given that I'm not altering the data and so it doesn't need to be written back to disk, how can I avoid this happening and so speed things up?

The code in FastRaw that does the memory mapping is:

    sub PDL::mapfraw {
        my $class = shift;
        my($name,$opts) = @_;
        my $hdr;
        if($opts->{Dims}) {
            my $datatype = $opts->{Datatype};
            if(!defined $datatype) {$datatype = $PDL_D;}
            $hdr->{Type} = $datatype;
            $hdr->{Dims} = $opts->{Dims};
            $hdr->{NDims} = scalar(@{$opts->{Dims}});
        } else {
            $hdr = _read_frawhdr($name);
        }
        $s = PDL::Core::howbig($hdr->{Type});
        for(@{$hdr->{Dims}}) {
            $s *= $_;
        }
        my $pdl = $class->zeroes(new PDL::Type($hdr->{Type}));
    #   $pdl->dump();
        $pdl->setdims($hdr->{Dims});
    #   $pdl->dump();
        $pdl->set_data_by_mmap($name,$s,1,($opts->{ReadOnly}?0:1),
            ($opts->{Creat}?1:0),
            (0644),
            ($opts->{Creat} || $opts->{Trunc} ? 1:0));
    #   $pdl->dump();
        if($opts->{Creat}) {
            _writefrawhdr($pdl,$name);
        }
        return $pdl;
    }

(written by Karl Glazebrook, the author of the module, not by me)

This calls on the C routine set_data_by_mmap in PDL/Basic/Core/Core.xs.PL, where what seems to be the relevant part looks like this:

    set_data_by_mmap(it,fname,len,writable,shared,creat,mode,trunc)
        pdl *it
        char *fname
        int len
        int writable
        int shared
        int creat
        int mode
        int trunc
        CODE:
    #ifdef USE_MMAP
        int fd;
        pdl_freedata(it);
        fd = open(fname,(writable && shared ? O_RDWR : O_RDONLY)|
            (creat ? O_CREAT : 0),mode);
        if(fd < 0) {
            croak("Error opening file");
        }
        if(trunc) {
            ftruncate(fd,0);   /* Clear all previous data */
            ftruncate(fd,len); /* And make it long enough */
        }
        if(len) {
            it->data = mmap(0,len,PROT_READ | (writable ? PROT_WRITE : 0),
                (shared ? MAP_SHARED : MAP_PRIVATE),
                fd,0);
            if(!it->data)
                croak("Error mmapping!");
        } else {
            /* Special case: zero-length file */
            it->data = NULL;
        }
        PDLDEBUG_f(printf("PDL::MMap: mapped to %d\n",it->data);)
        it->state |= PDL_DONTTOUCHDATA | PDL_ALLOCATED;
        pdl_add_deletedata_magic(it, pdl_delete_mmapped_data, len);
        close(fd);
    #else

(again not written by me, but by the authors of PDL).
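
For comparison, here is a minimal sketch of what a strictly read-only mapping could look like in plain C. This is not PDL's code: the helper name map_readonly is hypothetical, and it assumes a POSIX system with mmap available. Because the mapping is created with PROT_READ only, nothing can be written through it, so its pages should never become dirty and should not need writing back when they are evicted. The sketch also checks the result against MAP_FAILED, which is the value mmap returns on error (it does not return NULL). Whether any of this changes the behaviour seen on Mac OS X 10.4 is untested.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of PDL: map a whole file read-only.
 * On success returns the mapping and stores its length in *len_out.
 * On any error (or for an empty file) returns MAP_FAILED. */
static void *map_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    /* PROT_READ only: the process cannot modify these pages. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    close(fd); /* the mapping remains valid after the descriptor is closed */

    if (p != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return p;
}
```

A caller would read through the returned pointer and release the mapping with munmap(p, len) when done.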

I'm on Mac OS X 10.4.9, with Perl 5.8.6 built for darwin-thread-multi-2level (apparently). All help with this one gratefully received.

Best wishes, andye

PS: I have tried setting ReadOnly, but it didn't help.
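
Reading the two quoted snippets side by side hints at what ReadOnly actually controls. The sketch below assumes the argument order in the quoted XS signature (it, fname, len, writable, shared, creat, mode, trunc) is the one mapfraw really calls with. In that case mapfraw passes a constant 1 in the writable position and ReadOnly ? 0 : 1 in the shared position. The names map_flags, flags_for and flags_for_mapfraw are hypothetical, made up for this illustration. The expressions inside flags_for are copied from the quoted routine.

```c
#include <fcntl.h>
#include <sys/mman.h>

/* Hypothetical container for the three flag values the quoted
 * set_data_by_mmap computes before calling open() and mmap(). */
struct map_flags {
    int open_flags; /* second argument to open()   */
    int prot;       /* protection argument to mmap() */
    int map_type;   /* MAP_SHARED or MAP_PRIVATE     */
};

/* Same expressions as in the quoted XS routine. */
static struct map_flags flags_for(int writable, int shared, int creat)
{
    struct map_flags f;
    f.open_flags = ((writable && shared) ? O_RDWR : O_RDONLY)
                 | (creat ? O_CREAT : 0);
    f.prot       = PROT_READ | (writable ? PROT_WRITE : 0);
    f.map_type   = shared ? MAP_SHARED : MAP_PRIVATE;
    return f;
}

/* Assumption: mirrors the quoted mapfraw call, which passes
 * writable = 1 and shared = (ReadOnly ? 0 : 1). */
static struct map_flags flags_for_mapfraw(int read_only, int creat)
{
    return flags_for(1, read_only ? 0 : 1, creat);
}
```

Under that assumption, setting ReadOnly gives O_RDONLY, PROT_READ | PROT_WRITE and MAP_PRIVATE: a private copy-on-write mapping rather than one with PROT_READ alone. Without ReadOnly the result is O_RDWR, PROT_READ | PROT_WRITE and MAP_SHARED. Whether either combination explains the disk writes observed here is not established by this sketch.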