Memory Leak when slurping files in a loop

by rizzy (Sexton)
on Dec 07, 2010 at 03:56 UTC
rizzy has asked for the wisdom of the Perl Monks concerning the following question:

I've been slurping text files into a string in a loop and parsing the text, but I noticed that the memory usage reported by Windows always stays as large as the largest file slurped so far (i.e., it never drops back down), even if I undefine the string on each pass. Is this a problem with Windows, or is there a way to resolve this in Perl? A (rare) few of the files are 100K+, so this causes problems. I've simplified the code, and even in this simple case the effect is there:

#!C:/Perl/bin -w
use File::Listing qw(parse_dir);

my $dir='c:/mydir/';

#open the directory and get filenames;
opendir(TEMP, $dir) || die("Cannot open directory");
@thefiles= readdir(TEMP);
closedir(TEMP);

$maxsize=0;

#cycle through each of the files;
foreach $file (@thefiles) {
    unless ( ($file eq ".") || ($file eq "..") ) {
        $filesize = -s $dir.$file;
        if ($filesize > $maxsize){$maxsize=$filesize}
        print "$file - $maxsize - $filesize\n";
        my $html='';
        $slurpfile=$dir.$file;
        open( my $fh, $slurpfile ) or die "couldn't open\n";
        my $html = do { local( $/ ) ; <$fh> } ;
        undef $html;
    }
}
Basically, I open up the directory and get a list of every file in the directory. Next, each file is individually opened and passed as a string to $html. I immediately undefine the string and repeat the loop. I can't understand why the memory isn't freed up. It should actually be freed up in 3 places each loop, shouldn't it? (1) when I define $html as '' (2) when I slurp the contents of the next file to $html and (3) when I undefine $html.

As it cycles through the thousands of files, I can watch the running maximum file size and the memory allocated to Perl increase in tandem.

I need to slurp the file for various reasons. I wouldn't mind this leak, but I have to do millions of files and it slows things down considerably. Any suggestions?

Re: Memory Leak when slurping files in a loop
by Anonymous Monk on Dec 07, 2010 at 04:24 UTC
    It would help if it compiled under strictures.

    Maybe something simpler would work.

    #!perl
    use strict;
    use warnings;
    use autodie;              # chdir and open die on error
    use File::DosGlob 'glob'; # DOS-style wildcards
    use File::Slurp 'slurp';

    my $dir     = 'c:/mydir/';
    my $maxsize = 0;

    chdir $dir;
    for my $file (glob('*')) {
        # check size
        my $size = -s $file;
        $maxsize = $size if $size > $maxsize;
        print "$file: $size\n";

        # get contents
        my @lines = slurp $file;  # one string per line
        #my $lines = slurp $file; # only one string

        # manipulate contents
        #...
    }

    # report
    print "\n\nmax file size: $maxsize\n";
Re: Memory Leak when slurping files in a loop
by ww (Bishop) on Dec 07, 2010 at 04:36 UTC
    Perl frees memory for its own reuse... but does NOT return memory to the OS (until execution ends).
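
    If the one-time high-water mark really hurts, the usual workaround is to hand each file to a separate, short-lived perl process, so the memory goes back to the OS when that process exits. A minimal sketch, assuming a hypothetical helper script process_file.pl that slurps and parses a single file:

    #!C:/Perl/bin -w
    # Sketch only: delegate each file to a fresh perl process so its
    # memory is returned to the OS on exit. process_file.pl is a
    # hypothetical helper that slurps and parses one file.
    use strict;
    use warnings;

    my $dir = 'c:/mydir/';
    opendir my $dh, $dir or die "Cannot open directory: $!";
    my @files = grep { -f $dir . $_ } readdir $dh;
    closedir $dh;

    for my $file (@files) {
        # $^X is the perl interpreter running this script
        system( $^X, 'process_file.pl', $dir . $file ) == 0
            or warn "process_file.pl failed for $file\n";
    }

    Spawning a process per file is slow across millions of files, so in practice you would batch the work (say, a thousand filenames per child) to amortize the startup cost.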
Re: Memory Leak when slurping files in a loop
by LanX (Canon) on Dec 07, 2010 at 11:16 UTC
    Why don't you use the sliding window technique already discussed?
                   window
    |------[++++++|++++++]------|------|---|   file
       A      B      C      D      E    F      blocks
              <----->
               match

    If you don't destroy/recreate the variables but just change the content, your memory consumption will¹ be minimal.

    seek and read help reading chunks of data from files.

    substr manipulates the content of strings.

    pos returns the position of your last regex match.

    So a single global variable $window of fixed size, holding the two current blocks, is enough; whenever the pos of a match moves past the first block, you shift a new block into $window (see the sketch below).
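
    A rough sketch of the idea (my illustration, with an assumed 64K block size and a placeholder pattern; it also assumes no single match is longer than one block):

    use strict;
    use warnings;

    my $BLOCK   = 64 * 1024;        # assumed block size
    my $pattern = qr/href="[^"]*"/; # placeholder pattern

    open my $fh, '<', $ARGV[0] or die "couldn't open $ARGV[0]: $!";
    binmode $fh;

    my $offset = 0;                 # bytes already shifted out of $window
    my $window = '';
    read $fh, $window, 2 * $BLOCK;  # prime the window with two blocks

    while (length $window) {
        while ($window =~ /$pattern/g) {
            # matches starting in the second block are left for the next
            # pass, after the window has shifted
            last if $-[0] >= $BLOCK and not eof $fh;
            print "match at byte ", $offset + $-[0], "\n";
        }
        last if eof $fh;
        substr($window, 0, $BLOCK) = '';   # drop the first block...
        $offset += $BLOCK;
        read $fh, my $next, $BLOCK;        # ...and append the next one
        $window .= $next;
    }
    close $fh;

    However big the file is, memory use stays bounded by the two-block window.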

    Cheers Rolf

    ¹) well, as long as Perl doesn't do very (unlikely) weird speed optimizations.

    UPDATE:

    This code is an almost perfect example of what I meant: Matching in huge files

    The differences are the temporary variable $block, which could be optimized away, and the handling of pos. Instead of adjusting the window at "halftime", pos is adjusted to the window. Actually I think this is even smarter than what I planned...

      Rolf, I am planning on doing as you suggested. In the meantime (as I'm running some of the code) I thought there might be a very simple fix. Thanks!
      By the way, kudos on the footnotes/graphics in your posts. Very helpful!
