Re: File::Find memory leak

by tachyon (Chancellor)
on Mar 10, 2004 at 03:55 UTC ( [id://335337] )


in reply to File::Find memory leak

Saw your recent post with this link. You should find that a variation on this will work, and it does not leak. It 'recurses' breadth first using a very useful Perl hack: you can push to an array you are iterating over (don't shift or splice, though). All it does is push the dirs onto its dir list as it finds them. Short, simple and fast.

This builds an array of all the files it finds (full path), but you could stick your &wanted code in there instead and have it return void. With UNC paths you will want to swap the / to \\

    sub recurse_tree {
        my ($root) = @_;
        $root =~ s!/$!!;
        my @dirs  = ( $root );
        my @files;
        for my $dir ( @dirs ) {
            opendir DIR, $dir or do { warn "Can't read $dir\n"; next };
            for my $file ( readdir DIR ) {
                # skip . dirs
                next if $file eq '.' or $file eq '..';
                # skip symlinks
                next if -l "$dir/$file";
                if ( -d "$dir/$file" ) {
                    push @dirs, "$dir/$file";
                }
                else {
                    push @files, "$dir/$file";
                }
            }
            closedir DIR;
        }
        return \@dirs, \@files;
    }
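
By way of illustration, you'd call it something like this (just a sketch; the path is only an example):

    my ($dirs, $files) = recurse_tree( '/some/start/dir' );
    print "Found ", scalar @$dirs, " dirs and ", scalar @$files, " files\n";
    print "$_\n" for @$files;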

cheers

tachyon

Re: Re: File::Find memory leak
by crabbdean (Pilgrim) on Mar 14, 2004 at 21:13 UTC
    Thanks, have tested this and it works nicely. The only problem I can see is that with a large directory tree, like our terabyte file server, the returned arrays would get too big. It would have to be broken into bite-size pieces and returned piecemeal, OR you'd have to process the files and directories as you find them instead of pushing them (which would be my most obvious choice).

    Thanks

    Dean
    The Funkster of Mirth
    Programming these days takes more than a lone avenger with a compiler. - sam
    RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers

      Actually you *may* need real recursion to do that. You don't have to return the list of files and can certainly process them on the fly. This will of course reduce the in-memory array size by orders of magnitude, depending on the file-to-dir ratio.
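
      Something like this sketch would do it (recurse_tree_cb and the example path are made-up names; the code ref you pass in is whatever you want run per file):

          sub recurse_tree_cb {
              my ($root, $process) = @_;    # $process is a code ref called once per file
              $root =~ s!/$!!;
              my @dirs = ( $root );
              for my $dir ( @dirs ) {
                  opendir DIR, $dir or do { warn "Can't read $dir\n"; next };
                  for my $file ( readdir DIR ) {
                      next if $file eq '.' or $file eq '..';
                      next if -l "$dir/$file";
                      if ( -d "$dir/$file" ) {
                          push @dirs, "$dir/$file";
                      }
                      else {
                          $process->("$dir/$file");    # handle the file now, keep nothing
                      }
                  }
                  closedir DIR;
              }
          }

      e.g. recurse_tree_cb( '/some/start/dir', sub { print "$_[0]\n" } ); only @dirs stays in memory.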

      However, using this approach, which as you note works fine, you are basically stuck with an array listing *all* the dirs. There is a reason for this. Although it is safe to push while you iterate over an array, it is not safe to shift, AFAIK, though I have not extensively tested that. The perl docs *do basically say* don't do *anything* while iterating over an array, but it copes fine with push. This makes a certain degree of sense, as all we are doing is adding to the end of a linked list of pointers and incrementing the last index by 1 with each push. In the loop perl is obviously not caching the end-of-list pointer but must be rechecking it each time.
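
      For instance, this trivial demo shows the behaviour the sub above relies on (even though perlsyn says not to modify the list):

          my @queue = (1);
          for my $n ( @queue ) {
              # growing the array mid-loop: the foreach picks up the new elements
              push @queue, $n + 1 if $n < 5;
          }
          print "@queue\n";    # prints: 1 2 3 4 5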

      If you shift then there is an issue: you are looping from offset N and are at index I, and shifting moves N out from under the iterator, so the indexing no longer lines up.....

      Anyway, a gig of RAM will cope with ~5-10M+ dirs, so it should not be a major issue unless you have very few files per dir.

      As the search is breadth first, you could easily batch it up into a series of sub-searches based on the top 1-2 levels of the tree if you have serious terabytes.
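
      For example, something along these lines (process_batch is just a stand-in for whatever you do with each batch, and the path is made up):

          my $root = '/some/start/dir';    # example path
          opendir DIR, $root or die "Can't read $root\n";
          # top-level subdirs only; files sitting directly in $root are ignored here
          my @top = grep { $_ ne '.' and $_ ne '..' and !-l "$root/$_" and -d "$root/$_" } readdir DIR;
          closedir DIR;

          for my $subtree ( @top ) {
              my ($dirs, $files) = recurse_tree("$root/$subtree");
              process_batch( $dirs, $files );    # stand-in: handle this batch, then let it go
              # $dirs and $files go out of scope each pass, so memory stays bounded per batch
          }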

      cheers

      tachyon
