Re: File::Find memory leak
by samtregar (Abbot) on Jan 27, 2004 at 04:08 UTC
There's no such thing as a "garbage collection" module. Perl does its own garbage collection using reference counting, and if something's getting lost there's not much you can do about it (aside from fixing the leaky code).
If you can't find and fix the leak you'll probably have to fork() a sub-process to do whatever leaks, pass the results up to the parent via a pipe or temp file and then exit() the child. When the child exits any memory it used will be reclaimed by the operating system. I've used this technique before with leaky Perl modules. Give it a try and post again if you have trouble.
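Roughly, the shape of it is something like this (an untested sketch; run_leaky_search() is just a stand-in for whatever File::Find code is leaking):
use strict;
use warnings;

sub search_in_child {
    my ($dir) = @_;
    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    defined( my $pid = fork() ) or die "fork failed: $!";
    if ( $pid == 0 ) {                    # child: do the leaky work here
        close $reader;
        for my $result ( run_leaky_search($dir) ) {   # stand-in for the File::Find code
            print {$writer} "$result\n";
        }
        close $writer;
        exit 0;                           # all the child's memory goes back to the OS
    }
    close $writer;                        # parent: read the results back
    chomp( my @results = <$reader> );
    close $reader;
    waitpid( $pid, 0 );
    return @results;
}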
-sam
PS: The above suggestion assumes you're working on a Unix system. I imagine things are different in Windows-land, where fork() is emulated with threads and exit() probably doesn't free memory.
Thanks Sam, that was exactly my thinking. Great minds! If the fork doesn't work, a simpler possible alternative is to write a main script that does all the logging and a second script, containing the File::Find code, that is called each time a user's directory is traversed. That way the memory the second script uses is freed every time it exits. I'll let you know the results.
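Something along these lines is what I'm picturing (just a sketch; scan_one_user.pl is a hypothetical helper script that holds the File::Find code and prints its results, and the paths are illustrative):
use strict;
use warnings;

my $users_root = 'E:/users';    # illustrative path to the user directories
open my $log, '>>', 'scan.log' or die "can't open scan.log: $!";

opendir my $dh, $users_root or die "can't open $users_root: $!";
for my $user ( grep { !/^\.\.?$/ } readdir $dh ) {
    my $dir = "$users_root/$user";
    next unless -d $dir;
    # the helper script does the File::Find work for this one directory;
    # its memory is released when it exits
    my @results = `perl scan_one_user.pl "$dir"`;
    print {$log} @results;
}
closedir $dh;
close $log;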
The "perltodo" manual page says some garbage collection work is still to be done in future for perl.
Thanks
Dean
Re: File::Find memory leak
by BrowserUk (Patriarch) on Jan 27, 2004 at 06:11 UTC
Using 5.8.2 (AS808) on XP, and processing a little over 200_000 files, I see a growth pattern of around 22k per iteration, or maybe 10 bytes per file.
If I fork each iteration of the search, the growth appears to increase slightly, to around 31k per iteration of 205,428 files.
Doing a crude comparison of heap dumps taken before & after an iteration, it appears as if the leakage isn't due to something not being freed, but rather to fragmentation of the heap, as larger entities are freed and their space half re-used for smaller things, thereby requiring the heap to grow the next time the larger entity needs to be allocated.
Note: the comparison was very crude... with something like 12,000 individual blocks on the heap, it had to be :)
Having the script exec itself after each iteration does stop the growth, but whether that is practical will depend upon the nature and design of your program.
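For the record, the exec trick looks something like this (a rough sketch; process_one_directory() stands in for whatever one pass of the real search does, and the remaining targets are passed on the command line):
use strict;
use warnings;

# Each run handles one target, then replaces itself with a fresh process
# (and a fresh heap) for the next one.
my ( $current, @remaining ) = @ARGV;
exit 0 unless defined $current;

process_one_directory( $current );    # stand-in for one pass of the search

if ( @remaining ) {
    exec( $^X, $0, @remaining ) or die "exec failed: $!";
}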
Re: File::Find memory leak
by graff (Chancellor) on Jan 27, 2004 at 14:17 UTC
I don't mean to spoil the fun of using perl, but in a case like this, I would consider looking at a Windows port of the GNU find utility. It will undoubtedly be faster and have a smaller memory footprint. (Frankly, the File::Find module seems to be a fountain of difficulty... I tend to avoid it.)
Thanks graff, this is a clever solution. I'm all up for looking at alternatives and am looking into this now. Likewise, I'm tending to avoid the File::Find module from now on. I'm having to rewrite my program, as the use of the File::Find module was at the heart of it, and it's rendered my program obsolete as a practical solution due to the sheer size of our file server.
Thanks mate
Dean
Just a thought about something you might try... This works for me under Unix, and I expect it would work in Windows as well. It's very good in terms of using minimal memory, and it has fairly low system overhead overall:
chdir $toppath or die "can't cd to $toppath: $!";
open( FIND, "find . -type d |" ) or die "can't run find: $!";
while ( my $d = <FIND> ) {
    chomp $d;
    unless ( opendir( D, $d )) {
        warn "$toppath/$d: open failed: $!\n";
        next;
    }
    while ( my $f = readdir( D )) {
        next if ( -d "$d/$f" );   # outer while loop will handle all dirs
        # do what needs to be done with data files
    }
    closedir D;
    # anything else we need to do while in this directory
}
close FIND;
This has the nice property that all the tricky recursion stuff is handled by "find", while all the logic-intensive, file-based stuff is handled pretty easily by perl, working with just the data files in a single directory at any one time.
Re: File::Find memory leak
by Anonymous Monk on Jan 27, 2004 at 04:29 UTC
Do new files keep being created in that directory?
Are they symlinks?
What version of File::Find do you have?
What perl version?
It's the latest version of Perl, just downloaded last week. New files are created all the time; it's our main file server, and it's very large. There are many symlinks, but I don't follow them. Running this on Win2000.
Dean
Re: File::Find memory leak
by tachyon (Chancellor) on Mar 10, 2004 at 03:55 UTC
Saw your recent post with this link. You should find that a variation on this will work, and it does not leak. It 'recurses' breadth first using a very useful Perl hack: you can push to an array you are iterating over (but don't shift or splice). All it does is push the dirs onto its dir list as it finds them. Short, simple and fast.
This builds an array of all the files it finds (full path), but you could stick your &wanted code in there instead and have it return void. With UNC paths you will want to swap the / to \\
sub recurse_tree {
    my ($root) = @_;
    $root =~ s!/$!!;
    my @dirs  = ( $root );
    my @files = ();
    for my $dir ( @dirs ) {
        opendir DIR, $dir or do { warn "Can't read $dir\n"; next };
        for my $file ( readdir DIR ) {
            # skip the . and .. dirs
            next if $file eq '.' or $file eq '..';
            # skip symlinks
            next if -l "$dir/$file";
            if ( -d "$dir/$file" ) {
                push @dirs, "$dir/$file";
            }
            else {
                push @files, "$dir/$file";
            }
        }
        closedir DIR;
    }
    return \@dirs, \@files;
}
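A call would look something like this (the path is just an example):
my ( $dirs, $files ) = recurse_tree( 'C:/some/top/dir' );
print "Found ", scalar @$dirs, " dirs and ", scalar @$files, " files\n";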
Thanks, have tested this and it works nicely. The only problem I can see is that with a large directory tree, like our terabyte file server, the returned arrays would get too big. It would have to be broken into bite-sized pieces and returned piecemeal, OR you'd have to process the files and directories as you find them instead of pushing them (which would be my most obvious choice).
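i.e. something like this sketch, where do_something_with() is a placeholder for my real per-file work (the @dirs list still grows, of course):
sub walk_tree {
    my ($root) = @_;
    $root =~ s!/$!!;
    my @dirs = ( $root );
    for my $dir ( @dirs ) {
        opendir my $dh, $dir or do { warn "Can't read $dir\n"; next };
        for my $file ( readdir $dh ) {
            next if $file eq '.' or $file eq '..';
            next if -l "$dir/$file";
            if ( -d "$dir/$file" ) {
                push @dirs, "$dir/$file";
            }
            else {
                do_something_with("$dir/$file");   # handle the file right away
            }
        }
        closedir $dh;
    }
}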
Thanks
Actually you *may* need real recursion to do that. You don't have to return the list of files and can certainly process them on the fly. This will of course reduce the in-memory array size by orders of magnitude, depending on the file:dir ratio.
However, using this approach, which as you note works fine, you are basically stuck with an array listing *all* the dirs. There is a reason for this. Although it is safe to push while you iterate over an array, it is not safe to shift, AFAIK, but I have not extensively tested that. The perl docs *do basically say* don't do *anything* to an array while iterating over it, but it copes fine with push. This makes a certain degree of sense, as all we are doing is adding to the end of a list of pointers and incrementing the last index by 1 with each push. In the loop, perl is obviously not caching the end-of-list pointer, but must be rechecking it each time.
If you shift, then there is an issue: if you are looping from offset N and are at index I, and you move N, then.....
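To illustrate the push case, a trivial demo:
my @queue = ( 1 );
for my $n ( @queue ) {
    push @queue, $n + 1 if $n < 5;   # appending while the loop is running
    print "$n ";
}
# prints: 1 2 3 4 5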
Anyway a gig of RAM will cope with ~5-10M+ dirs so it should not be a major issue unless you have very few files per dir.
As the search is breadth first, you could easily batch it up into a series of sub-searches based on the dirs 1-2 levels deep, if you have serious terabytes.
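i.e. something along these lines (sketch only; scan_subtree() is a stand-in for running the full search plus processing on one subtree):
sub batch_by_top_level {
    my ($root) = @_;
    $root =~ s!/$!!;
    opendir my $dh, $root or die "Can't read $root: $!";
    my @top = grep { !/^\.\.?$/ && -d "$root/$_" && !-l "$root/$_" } readdir $dh;
    closedir $dh;
    for my $sub ( @top ) {
        scan_subtree( "$root/$sub" );   # e.g. recurse_tree() plus whatever processing
    }
}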