PerlMonks  

Re^6: Perl solution for storage of large number of small files

by andye (Curate)
on Apr 30, 2007 at 14:51 UTC (#612787)


in reply to Re^5: Perl solution for storage of large number of small files
in thread Perl solution for storage of large number of small files

Hi jbert,

So can you read() a file that's bigger than memory? I thought you couldn't... hence mmap.

Best wishes, andye

Re^7: Perl solution for storage of large number of small files
by jbert (Priest) on Apr 30, 2007 at 15:28 UTC
    You *can* do a single read that's bigger than your available RAM, but that's not what I meant.

    If you want to access data in a file that's larger than your available RAM, you'll basically only be working on part of the file at a time, however you go about it. You'll need something to move parts of the file into and out of memory as you go.

    One option is to use mmap. Your memory access patterns will then determine which pages the OS faults into your process and which are discarded by the LRU.

    You can also use read(). You'll get very similar benefits of caching from the OS, but you'll have to do the "getting data into memory" bit yourself more explicitly.
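As a rough illustration of the read() approach, here is a minimal sketch that walks a file in fixed-size chunks so that only one chunk is held by the process at a time. The helper name `count_lines_chunked`, the 4096-byte chunk size, and the small temporary file (standing in for a file larger than RAM) are all assumptions made for the example, not anything from the original discussion.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: count newlines by reading fixed-size chunks,
# so the process only ever holds $chunk_size bytes of the file.
sub count_lines_chunked {
    my ($path, $chunk_size) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my ($count, $buf) = (0, '');
    while (1) {
        my $got = read($fh, $buf, $chunk_size);
        die "read $path: $!" unless defined $got;
        last if $got == 0;             # end of file
        $count += ($buf =~ tr/\n//);   # newlines in this chunk
    }
    close $fh;
    return $count;
}

# Demo on a small temporary file (a stand-in for a huge one).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line $_\n" for 1 .. 1000;
close $tmp_fh or die "close: $!";

my $lines = count_lines_chunked($tmp_path, 4096);
print "$lines\n";
```

The explicit loop is the "getting data into memory" work that mmap would otherwise hide; the OS page cache sits underneath both.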

    mmap has its place and is useful, but I've often come across people who do things like "we'll keep an in-memory cache of recently-used files to avoid having to read them from disk each time", or "we'll use a RAM disk for these files". They don't realise that if their guess of recently-used is accurate then they don't need to do that, since the OS will make sure the data in those files stays in memory. And if the guess is inaccurate, they're wasting memory which could be put to better use caching the genuinely frequently used stuff.

    In one particular case I saw, the file cache was per-process, so replicated across 60 or so procs on the box, wasting a significant amount of memory (which was a precious resource on the box in question).

    So sorry for picking up on this, but I think many people don't understand that read() can be entirely satisfied from RAM, and will be for a commonly-accessed file (assuming noatime on the mount point, so that reads don't trigger access-time updates on disk).

    Your use of mmap seems perfectly sensible to me, but for reasons of coding simplicity, not because "So memory mapping meant that the often-access data stayed cached in ram". That benefit also applies to read().
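To make the "coding simplicity" point concrete, here is a hedged sketch of what random access looks like with plain seek() and read(). With a mapped file the same lookup would typically be a single substr-style access on the mapped buffer; with read() you need a small wrapper. The helper name `read_at` and the 26-byte sample file are assumptions for illustration only.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper: fetch $len bytes starting at byte $offset.
sub read_at {
    my ($fh, $offset, $len) = @_;
    seek($fh, $offset, SEEK_SET) or die "seek: $!";
    my $got = read($fh, my $buf, $len);
    die "read: $!" unless defined $got;
    return $buf;    # may be shorter than $len near end of file
}

# Small sample file standing in for a large data file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} 'abcdefghijklmnopqrstuvwxyz';
close $tmp_fh or die "close: $!";

open my $fh, '<:raw', $tmp_path or die "open $tmp_path: $!";
print read_at($fh, 10, 5), "\n";    # bytes 10..14
close $fh;
```

Either way, repeated lookups of the same region should be served from the OS cache once the pages are resident.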
