Memory mapping: preventing writes (PDL::IO::Fastraw)

by andye (Curate)
on Apr 13, 2007 at 13:15 UTC (#609904)
andye has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow monks,

I'm using PDL to work on a dataset that's too large to fit in memory. So, I'm using the function PDL::IO::Fastraw::mapfraw to memory map the file.

Which is all fine and dandy, but I'd like my processing to go quicker. I'm not altering this dataset at all, which is why I'm confused to see it (apparently) being written back to disk when a page is swapped out.

So my question is:
Given that I'm not altering the data and so it doesn't need to be written back to disk, how can I avoid this happening and so speed things up?

The code in Fastraw that's doing the memory mapping is:

    sub PDL::mapfraw {
        my $class = shift;
        my($name,$opts) = @_;
        my $hdr;
        if($opts->{Dims}) {
            my $datatype = $opts->{Datatype};
            if(!defined $datatype) {$datatype = $PDL_D;}
            $hdr->{Type} = $datatype;
            $hdr->{Dims} = $opts->{Dims};
            $hdr->{NDims} = scalar(@{$opts->{Dims}});
        } else {
            $hdr = _read_frawhdr($name);
        }
        $s = PDL::Core::howbig($hdr->{Type});
        for(@{$hdr->{Dims}}) {
            $s *= $_;
        }
        my $pdl = $class->zeroes(new PDL::Type($hdr->{Type}));
        # $pdl->dump();
        $pdl->setdims($hdr->{Dims});
        # $pdl->dump();
        $pdl->set_data_by_mmap($name,$s,1,($opts->{ReadOnly}?0:1),
            ($opts->{Creat}?1:0),
            (0644),
            ($opts->{Creat} || $opts->{Trunc} ? 1:0));
        # $pdl->dump();
        if($opts->{Creat}) {
            _writefrawhdr($pdl,$name);
        }
        return $pdl;
    }
(written by Karl Glazebrook, the author of the module, not by me)

This calls on the C routine set_data_by_mmap in PDL/Basic/Core/Core.xs.PL, where what seems to be the relevant part looks like this:

    set_data_by_mmap(it,fname,len,writable,shared,creat,mode,trunc)
        pdl *it
        char *fname
        int len
        int writable
        int shared
        int creat
        int mode
        int trunc
        CODE:
    #ifdef USE_MMAP
            int fd;
            pdl_freedata(it);
            fd = open(fname,(writable && shared ? O_RDWR : O_RDONLY)|
                (creat ? O_CREAT : 0),mode);
            if(fd < 0) {
                croak("Error opening file");
            }
            if(trunc) {
                ftruncate(fd,0);   /* Clear all previous data */
                ftruncate(fd,len); /* And make it long enough */
            }
            if(len) {
                it->data = mmap(0,len,PROT_READ | (writable ? PROT_WRITE : 0),
                    (shared ? MAP_SHARED : MAP_PRIVATE),
                    fd,0);
                if(!it->data)
                    croak("Error mmapping!");
            } else {
                /* Special case: zero-length file */
                it->data = NULL;
            }
            PDLDEBUG_f(printf("PDL::MMap: mapped to %d\n",it->data);)
            it->state |= PDL_DONTTOUCHDATA | PDL_ALLOCATED;
            pdl_add_deletedata_magic(it, pdl_delete_mmapped_data, len);
            close(fd);
    #else

(again not written by me, but by the authors of PDL).
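
For comparison, here is a minimal sketch of the kind of mapping I would expect to avoid write-back: the file is opened O_RDONLY and mapped with PROT_READ only, so the pages can never become dirty. The helper name map_file_readonly is hypothetical and is not part of PDL; this is plain POSIX C, not a patch to Core.xs.PL. Note also that it checks for MAP_FAILED, which is how mmap reports an error, whereas the excerpt above tests for a null pointer.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of PDL): map an existing file read-only.
 * Returns the mapping and stores its length in *len_out,
 * or returns NULL on any failure. */
const void *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        /* A zero-length mapping is an error for mmap, so give up here. */
        close(fd);
        return NULL;
    }

    /* PROT_READ only: the process cannot store into these pages, so there
     * is nothing for the kernel to write back when a page is evicted. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    if (p == MAP_FAILED) /* mmap signals failure with MAP_FAILED, not NULL */
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

A caller would read through the returned pointer and release it with munmap(ptr, len) when done. Whether this actually changes the paging behaviour on a given platform is something I would still want to measure.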

I'm on Mac OS X, 10.4.9. Perl 5.8.6 built for darwin-thread-multi-2level (apparently). All help with this one gratefully received.

Best wishes, andye

PS: I have tried setting ReadOnly; it didn't help.
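
Reading the two excerpts above side by side, the positional arguments suggest what ReadOnly actually changes. The Perl call passes a literal 1 in the slot the C routine names writable, and passes (ReadOnly ? 0 : 1) in the slot it names shared. The sketch below just replays the flag expressions from the quoted code; the struct and function names are hypothetical, and it only shows which mapping is requested, not why the pages get written back.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>

/* Flags the quoted XS routine would hand to open() and mmap(). */
struct mmap_plan {
    int open_flags;
    int prot;
    int map_flags;
};

/* Replays the flag expressions from the quoted set_data_by_mmap. */
struct mmap_plan plan_from_xs_args(int writable, int shared, int creat)
{
    struct mmap_plan plan;
    plan.open_flags = (writable && shared ? O_RDWR : O_RDONLY)
                    | (creat ? O_CREAT : 0);
    plan.prot = PROT_READ | (writable ? PROT_WRITE : 0);
    plan.map_flags = shared ? MAP_SHARED : MAP_PRIVATE;
    return plan;
}

/* Replays the positional arguments from the quoted PDL::mapfraw call:
 * writable is the literal 1, shared is (ReadOnly ? 0 : 1). */
struct mmap_plan plan_from_mapfraw_opts(int read_only, int creat)
{
    return plan_from_xs_args(1, read_only ? 0 : 1, creat ? 1 : 0);
}

/* Self-check of the ReadOnly case as derived from the excerpts. */
void check_readonly_case(void)
{
    struct mmap_plan ro = plan_from_mapfraw_opts(1, 0);
    assert(ro.open_flags == O_RDONLY);           /* file opened read-only  */
    assert(ro.prot == (PROT_READ | PROT_WRITE)); /* PROT_WRITE still asked */
    assert(ro.map_flags == MAP_PRIVATE);         /* private, copy-on-write */
}
```

So, on this reading, ReadOnly switches the mapping from MAP_SHARED to MAP_PRIVATE and the open from O_RDWR to O_RDONLY, but PROT_WRITE is still requested either way.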

Replies are listed 'Best First'.
Re: Memory mapping: preventing writes (PDL::IO::Fastraw)
by syphilis (Chancellor) on Apr 14, 2007 at 10:03 UTC
      Thanks Rob - yes, it looks like I'll need to give them a try. My suspicion is that it depends on the C implementation of mmap on my platform, so there's probably not much I can do... but maybe someone knows how to kick it into doing the right thing.

      Best wishes, andye
