
Perl and memory usage. Can it be released?

by sherab (Scribe)
on Feb 07, 2014 at 16:44 UTC ( #1073900=perlquestion )
sherab has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!
I have a Perl script that spends its days reading in files and processing them. We get files that range from tiny to, in some cases, 90 MB. I know that memory allocation is somewhat elastic, but my question is about what happens when my script has been reading 90 KB files all morning, then at around 11am reads in a 90 MB monster, and then goes back to reading 90 KB files for the rest of the day. Is all the memory that it took up for the 90 MB file still being consumed for the rest of the day? I assume it is kept, since Perl only releases that memory after the process has exited and the process runs all day.

I also see that my perl is compiled with "usemymalloc=n". I would welcome any insights someone has on this or any input if you've experienced this before. It would be great somehow if that memory could be re-released back into the wild.
UPDATED: THANK YOU so much monks! A lot of great great answers!

Replies are listed 'Best First'.
Re: Perl and memory usage. Can it be released?
by BrowserUk (Pope) on Feb 07, 2014 at 16:54 UTC

    Two thoughts:

    1. If your program processes the files line-by-line, then the maximum memory it would need at any given time is the length of the longest line.

      Which for most files is a trivial amount.

    2. If you really need to load the files in their entirety each time, then slurping them into a single huge scalar, rather than an array of lines, would ensure that when the file is processed and the scalar is freed, the whole amount of the scalar is returned to the OS, not just to the process pool.

      Note: I know this to be true of Perl running under Windows for single allocations over 1MB.

      The picture of whether other OS mallocs have similar arrangements for large, single allocations isn't so clear.

      Of course, this will only help if you can avoid breaking the single scalar into an array or hash.
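
    The two approaches above might look roughly like this. This is a minimal sketch, not code from the thread: the helper names count_matching_lines and slurp_length are made up for illustration, and whether the slurped buffer actually goes back to the OS depends on the platform's malloc, as noted above.

```perl
use strict;
use warnings;

# Hypothetical helper: process a file line by line, so only the current
# line is held in memory at any time.
sub count_matching_lines {
    my ($path, $pattern) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $pattern;
    }
    close $fh;
    return $count;
}

# Hypothetical helper: slurp the whole file into ONE scalar (a single
# large allocation) instead of an array of lines.
sub slurp_length {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                       # no record separator: read everything
    my $content = <$fh> // '';
    close $fh;
    my $length = length $content;
    undef $content;                 # release the string buffer explicitly
    return $length;
}
```

    Splitting $content into an array or hash inside slurp_length would turn the one big allocation into many small ones, which is exactly what point 2 warns against.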

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Perl and memory usage. Can it be released?
by kennethk (Abbot) on Feb 07, 2014 at 16:58 UTC

    As I understand it, the perl process holds onto any heap memory it gets allocated (I could be wrong), so yes, in your case it's always going to have the memory footprint of the large use case. There are a couple of approaches that might help ameliorate this for you:

    1. Can you modify your file parsing so it's streaming instead of slurping? Just because you need to process 90 MB doesn't necessarily mean you need to hold onto 90 MB of data.

    2. Can you combine the above with a database? For example, by using an SQLite database, you should be able to avoid a large memory footprint for perl while still maintaining access to the data. You could swap that to an in-memory database if file access times become prohibitive, but I'm unclear as to whether that would create a permanent memory footprint.

    3. Finally, you could have a parent process that forks, and the children parse your files. That way, when the child is reaped, the memory is recovered.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Perl and memory usage. Can it be released?
by davido (Archbishop) on Feb 07, 2014 at 17:58 UTC

    If you require that the entire file be held in memory at once, create a supervisor script that assigns a file to a separate worker process and waits for results. When the worker process finishes and sends its results back to the supervisor, the worker terminates, freeing its resources. The supervisor's memory consumption will remain stable at all times. The worker processes may use a little, or a lot... but when they're done, they vanish and release their memory.
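
    A minimal sketch of that supervisor/worker split, assuming a Unix-like system where fork and pipe are available. The names summarize_file and run_in_worker are hypothetical, and the one-line text result stands in for whatever the real worker would report.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical worker body: slurps the file and returns a small summary.
sub summarize_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> } // '';
    close $fh;
    my $lines = () = $content =~ /\n/g;
    return "bytes=" . length($content) . " lines=$lines";
}

# Run $work->($path) in a child process and return its one-line result.
# The supervisor only ever holds the short result string.
sub run_in_worker {
    my ($work, $path) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                    # child: do the heavy lifting
        close $reader;
        my $ok = eval {
            print {$writer} $work->($path), "\n";
            close $writer;              # flush the result to the parent
            1;
        };
        # _exit skips END blocks and buffered parent output; the child's
        # whole address space is released when it terminates.
        POSIX::_exit($ok ? 0 : 1);
    }

    close $writer;                      # parent: collect the result
    my $result = <$reader>;
    close $reader;
    waitpid($pid, 0);
    die "worker for $path failed\n" if $? != 0 or !defined $result;
    chomp $result;
    return $result;
}
```

    The supervisor would call run_in_worker(\&summarize_file, $path) once per incoming file; its own footprint stays flat no matter how large the file is.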


Re: Perl and memory usage. Can it be released?
by LanX (Bishop) on Feb 07, 2014 at 17:01 UTC
    IIRC the "default" answer is that memory is only returned to the OS when the Perl process ends, dunno if there is any reliable documentation for a defined behavior.

    BUT as others have already pointed out, why do you need to load all 90 MB at once?

    Consider using a sliding window technique if you really need to investigate consecutive chunks of data.
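
    One way a sliding window could look, as a sketch only: the file is read in fixed-size chunks, and just enough of the previous chunk is kept to catch a match that straddles a chunk boundary. The function name count_occurrences and the 64 KB default chunk size are assumptions for illustration.

```perl
use strict;
use warnings;

# Count occurrences (including overlapping ones) of a literal $needle in
# the file at $path, holding at most one chunk plus a short tail in memory.
sub count_occurrences {
    my ($path, $needle, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;
    die "needle must not be empty\n" unless length $needle;
    my $overlap = length($needle) - 1;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my ($window, $count) = ('', 0);
    while (read($fh, my $chunk, $chunk_size)) {
        $window .= $chunk;
        my $pos = 0;
        while (($pos = index($window, $needle, $pos)) >= 0) {
            $count++;
            $pos++;
        }
        # Keep only the tail that could start a match crossing into the
        # next chunk. It is shorter than the needle, so nothing already
        # counted can be counted again.
        if ($overlap == 0) {
            $window = '';
        }
        elsif (length($window) > $overlap) {
            $window = substr($window, -$overlap);
        }
    }
    close $fh;
    return $count;
}
```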


    Worst case, consider running a separate process.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: Perl and memory usage. Can it be released?
by ikegami (Pope) on Feb 07, 2014 at 20:37 UTC
    Does it have to be the same process? You could replace the in-process call with something like
    defined(my $pid = fork()) or die "fork failed: $!";
    if ($pid) { waitpid($pid, 0); } else { do_work($qfn); exit(0); }
Re: Perl and memory usage. Can it be released?
by oiskuu (Hermit) on Feb 07, 2014 at 21:51 UTC

    On Linux, the glibc malloc behavior is influenced by environment variables. Large allocations are performed via mmap; smaller chunks usually live in the data segment arena, which grows or shrinks via brk. The default mmap threshold might be 128k. For example:

    $ strace -e brk,mmap perl -e 'pack q(x200000)'
    brk(0x79f000)                           = 0x79f000
    $ export MALLOC_MMAP_THRESHOLD_=300000
    $ strace -e brk,mmap perl -e 'pack q(x200000)'
    brk(0x79f000)                           = 0x79f000
    brk(0x7db000)                           = 0x7db000
    brk(0x7aa000)                           = 0x7aa000
    First time, the memory was obtained via mmap; second time, by growing the arena. Arena may also shrink (here it was possible), but even if it doesn't, the unused pages are typically not much of a concern. (mmap-ed storage is unmapped when freed.)

    If the process is long-lived and does a great many allocations at various stages, then memory fragmentation may become a problem. (Web browsers come to mind.) When processing file after file as you describe, this is unlikely to matter either. Memory gets allocated and released in full every time. Just be sure there are no leaks.
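
    To see what your own perl and libc do, you could watch the resident set size around one large allocation. This is a rough, Linux-only sketch: rss_kb and measure_large_scalar are hypothetical helpers, /proc/self/status is assumed to be readable, and the numbers will vary because perl may keep internal copies of temporaries.

```perl
use strict;
use warnings;

# Resident set size in kB from /proc/self/status, or undef where that
# file (or the VmRSS line) is not available.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

# Allocate one scalar of $bytes bytes, release it, and record the RSS
# before, during and after.
sub measure_large_scalar {
    my ($bytes) = @_;
    my %rss;
    $rss{before} = rss_kb();
    my $big = 'x' x $bytes;
    $rss{during} = rss_kb();
    undef $big;     # hands the string buffer back to the allocator;
                    # whether the OS gets it back is what we are observing
    $rss{after} = rss_kb();
    return \%rss;
}
```

    Running measure_large_scalar(90 * 1024 * 1024) with and without MALLOC_MMAP_THRESHOLD_ set should show whether the "after" figure drops back towards "before" on a given system.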

      POSIX brk memory will never shrink due to fragmentation. mmap memory/Win32 malloc can shrink because it's all managed in a linked list chain, and the mem pages are randomly scattered throughout the process.

        GNU libc allocator is derived from Doug Lea malloc, a proven general-purpose allocator. Go on, unpack and read the source and the comments (I'm looking at glibc-2.17/malloc/malloc.c)

        True, trims do not happen much because small data gets allocated from fastbins. But try to malloc a lot of somewhat larger blocks (couple hundred bytes each), and free them all. You shall see a shrink.

        Update: from said malloc.c:

        And please don't say "never". E.g. freeing a block 64k to 128k in size triggers fastbin consolidation. If your program has performed a work cycle, freeing all temps, then it is quite possible a trim takes place. It depends on usage.

Re: Perl and memory usage. Can it be released?
by sundialsvc4 (Abbot) on Feb 07, 2014 at 17:49 UTC

    Well, this absolutely qualifies as a hack, but I know that it is a hack that is sometimes used ... and useful. After the process has run through some number of requests, let it choose to commit suicide. Then, ensure that some init-like process will recognize its death and immediately re-spawn it. Exactly as is done sometimes with FastCGI, or even with mod_perl, especially when the app in question is oldy-moldy. You make no attempt to re-engineer how the app goes about its business, having established that it still seems to work. You simply modify it to, every now and again, put itself to death. (Which is n-o-t the same as killing it!)

    Of course, it is also possible to run it by means of a do-nothing “babysitter” process that launches the other process as a child, waits for it to die, and then takes care of re-launching it ... forever.
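
    A simplified sketch of the babysitter idea, assuming fork is available. The name babysit and its arguments are made up; a real long-running worker would pull its own work and count requests itself, whereas here the parent hands each child a fixed batch so the example stays short.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical babysitter: runs $handler on every job, but each child
# process retires after at most $max_per_child jobs and is replaced by a
# fresh one. Returns the number of children spawned.
sub babysit {
    my ($jobs, $max_per_child, $handler) = @_;
    die "max_per_child must be positive\n" unless $max_per_child > 0;
    my @pending = @$jobs;
    my $spawned = 0;

    while (@pending) {
        my @batch = splice(@pending, 0, $max_per_child);
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: work through its batch, then put itself to death.
            my $ok = eval { $handler->($_) for @batch; 1 };
            POSIX::_exit($ok ? 0 : 1);
        }

        # Parent: wait for the child to die, then loop to respawn.
        $spawned++;
        waitpid($pid, 0);
        die "child $pid failed\n" if $? != 0;
    }
    return $spawned;
}
```

    With five jobs and a limit of two per child, babysit spawns three children in turn, and whatever memory each one grew to is gone when it exits.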

    Hack.   Wart.   Inelegant.   Smells bad.   Quick.   Works.   Done.

Re: Perl and memory usage. Can it be released?
by bulk88 (Priest) on Feb 19, 2014 at 03:57 UTC
