PerlMonks  

mmaping a large file

by grondilu (Pilgrim)
on Aug 23, 2012 at 21:04 UTC ( #989383=perlquestion )
grondilu has asked for the wisdom of the Perl Monks concerning the following question:

I used to use File::Map to handle a file as if it were a normal scalar variable, and it used to work fine. Recently, though, I encountered a memory allocation error.

perl -wE 'use File::Map q(map_file); map_file my $f, q(bigdatafile);'
Could not map: Cannot allocate memory at -e line 1.

The only difference I can see is that the file is quite large: about 2 GB, while I have about 1 GB of RAM plus 2 GB of swap.

Yet I don't understand why this should be an issue. map_file is not supposed to load the whole file into memory, is it?

I also noticed that there is a more standard perl module called Sys::Mmap that does the same thing. But it also gave me a memory allocation error.

What is the proper way to tie a large file to a scalar variable?

Re: mmaping a large file
by Illuminatus (Curate) on Aug 23, 2012 at 22:24 UTC
    operating system? 32 or 64 bit? output of 'ulimit -m -v' (if *nix)?

    fnord

Re: mmaping a large file
by BrowserUk (Pope) on Aug 23, 2012 at 22:27 UTC
map_file is not supposed to load the whole file in memory, is it?

    Yes it does, if you access the entire file. It just does so lazily, on demand rather than all at once when you first 'read' it.

    That is to say, when you first map a file, none of its contents are actually loaded from disk. A chunk of your process's virtual address space -- the size of the file -- is reserved, and the mapping call returns very quickly. When you then attempt to access bits of the file, the 4096-byte page(s) containing the bytes you touch are loaded from disk on demand (via page faults).
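    To see that in File::Map terms (a sketch, reusing the OP's filename; the offset and length are arbitrary), the map call itself reads nothing, and only the pages covering the slice you touch are faulted in:

    ```perl
    use strict;
    use warnings;
    use File::Map qw(map_file);

    # Map read-only: address space is reserved, but no data is read yet.
    map_file my $map, 'bigdatafile', '<';

    # Touching a small slice faults in only the 4 KB page(s) covering it.
    my $chunk = substr $map, 1_000_000, 100;
    ```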

    If you have a large dataset in a file, and a) you only need access to small bits of it in any given run, and b) you can find those bits without reading through the whole file from the beginning, then mapping can be an effective way of minimising the number of pages read from disk.

    But if all you are going to do with the mapped file is read it serially from beginning to end, you're better off using normal file IO, which doesn't incur page faults and can read the entire file (serially) through a small amount of memory (e.g. line by line through one or two page-sized buffers).
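    For a single front-to-back pass, that is just ordinary buffered IO (a sketch, using the OP's filename): each line passes through perl's small IO buffer and is then discarded, so memory use stays flat regardless of file size.

    ```perl
    use strict;
    use warnings;

    # Plain buffered IO: no mapping, no reserved address space.
    open my $fh, '<', 'bigdatafile' or die "open 'bigdatafile': $!";
    while (my $line = <$fh>) {
        # process $line here ...
    }
    close $fh or die "close: $!";
    ```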

    Memory mapping also requires that you have sufficient virtual address space in your process in order to hold the amount of the file you need concurrent access to. For 32-bit processes, that means files > 2GB require the programmer to re-map them in order to access the whole file.
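    One way to do that re-mapping (a sketch, assuming File::Map; the 64 MB window size is an arbitrary choice, but it must be a multiple of the page size) is to walk the file in windows via map_file's offset and length arguments:

    ```perl
    use strict;
    use warnings;
    use File::Map qw(map_file);

    my $file   = 'bigdatafile';
    my $size   = -s $file;
    my $window = 64 * 1024 * 1024;   # 64 MB; offsets stay page-aligned

    for (my $offset = 0; $offset < $size; $offset += $window) {
        my $length = $size - $offset < $window ? $size - $offset : $window;
        map_file my $map, $file, '<', $offset, $length;
        # ... work on this window of the file via $map ...
    }   # each window is unmapped when $map goes out of scope
    ```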


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      If you have a large dataset in a file: and a) only need access to small bits of it in any given run; b) you can find those bits without reading through the whole file from the beginning; then mapping can be an effective way of minimising the number of pages read from disk.

      yes, that's exactly the use case.

      I've just tried:

      use Sys::Mmap;
      new Sys::Mmap my $f, 8192, q(bigfile);

      but now I get an error during cleaning:

      (in cleanup) munmap failed! errno 22 Invalid argument

      This is not going to be easy, is it?

        Warning: I don't use *nix, so I cannot test anything I'm about to say.

        The first thing I notice is that the POD for Sys::Mmap uses

        new Mmap my $f, 8192, q(bigfile);

        not

        new Sys::Mmap my $f, 8192, q(bigfile);

        I would have expected you to receive a compile-time error message from that, which suggests you aren't using strict/warnings. It might be a good idea to start.

        The other thought is that mapping a 2 GB file through an 8 KB window is going to involve a lot of shuffling.



Node Type: perlquestion [id://989383]
Approved by BrowserUk