Question about alternative to File::Find

swampyankee
I'm using File::Find to check a directory tree for duplicated files. To do this, I first build a hash keyed by file size, but since File::Find's "find" function throws away any return value from the sub reference passed to it, I have to use an "our" variable for the hash. After this hash is built, I strip out keys whose entry (a reference to an array) has only one element, then build a second hash keyed by the md5 digest of the first 2**24 bytes (or so; the limit can be set on the command line) to check for potential duplicates. The program doesn't delete files; it just issues a report.

This violates, first, my aesthetic sense and, second, one of the principles of good programming practice that I've worked with for close to four decades: that it is bad practice to have globals.

So, is there an alternative to File::Find that will return a list of files, perhaps (but not necessarily) with some types of files, like directories or character special files, filtered out?
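
For reference, the second pass described above (dropping sizes that occur only once, then digesting the first N bytes of each remaining file) might look roughly like the sketch below. This is not the original program: the name probable_duplicates, the "size:md5" composite key, and the choice to skip unreadable files silently are all assumptions made for illustration. Only the core Digest::MD5 module is used.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: given { size => [paths] }, return
# { "size:md5" => [paths] } for groups of two or more files whose
# first $limit bytes are identical.
sub probable_duplicates {
    my ($by_size, $limit) = @_;
    $limit = 2**24 unless defined $limit;    # default taken from the post
    my %by_digest;

    for my $size (keys %$by_size) {
        my $paths = $by_size->{$size};
        next if @$paths < 2;    # a unique size cannot be a duplicate

        for my $path (@$paths) {
            open my $fh, '<:raw', $path or next;    # assumption: skip unreadable files
            my $buf = '';
            my $got = read $fh, $buf, $limit;
            close $fh;
            next unless defined $got;               # read error

            # Composite key (an assumption) keeps different sizes apart.
            my $key = "$size:" . Digest::MD5::md5_hex($buf);
            push @{ $by_digest{$key} }, $path;
        }
    }

    # Drop digests that only one file produced.
    delete @by_digest{ grep { @{ $by_digest{$_} } < 2 } keys %by_digest };
    return \%by_digest;
}
```

Files that end up in the same group share a size and a leading-bytes digest, so they are candidates for the report rather than proven duplicates.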

Re: Question about alternative to File::Find
davido

    Have you tried File::Find::Rule? It's basically a well designed OO interface that encapsulates (and extends) the File::Find functionality. I don't know exactly what you're doing, but I should think you would be able to do it more cleanly (from aesthetic and "programmer's sensibilities" standpoints) than with File::Find.

    I agree that File::Find's interface is messy. It's a really useful module, but every time I use it I feel like I need to go take a shower afterwards. File::Find::Rule is much less dirty.


      Path::Class::Rule is my favourite module for this sort of thing. It's inspired by File::Find::Rule (but not dependent on it) and it gives you Path::Class::File objects to play with.

      That would be an example of putting blinders on; or, getting somebody else to do the dirty work. (And I personally do not have a problem with either.)

        That's one of the main points to encapsulation: You don't have to get dirty because all the grease, gunk, nuts, bolts, and rust are under the hood where only the mechanic has to look. As long as it works well (as it does), and the grease doesn't leak out (as it doesn't), I also have no problem with it.


Re: Question about alternative to File::Find
BrowserUk

    I don't have your blanket aversion to the use of globals -- they are just another tool in the toolbox, applicable for some purposes and not others -- but if I did, I'd deal with it this way:

    sub buildFileHash {
        my $root = shift;
        my %filesBySize;
        find( sub {
            ...
            push @{ $filesBySize{ -s $_ } }, $File::Find::name;
            ...
        }, $root );
        \%filesBySize;
    }
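
    A runnable version of that idea might look like the following. It is only a sketch: the name build_file_hash and the -f test that limits the hash to plain files are assumptions, not part of the snippet above. The hash is a lexical that the anonymous sub closes over, so no "our" variable is needed.

```perl
use strict;
use warnings;
use File::Find;

# Walk $root and return a reference to a hash mapping
# file size => array of full paths with that size.
sub build_file_hash {
    my ($root) = @_;
    my %files_by_size;    # lexical, visible only to the closure below

    find(
        sub {
            return unless -f $_;    # plain files only (assumed filter)
            # "_" reuses the stat buffer filled by the -f test above
            push @{ $files_by_size{ -s _ } }, $File::Find::name;
        },
        $root
    );

    return \%files_by_size;
}
```

    The caller receives the hash reference as an ordinary return value, for example my $by_size = build_file_hash('.');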

Re: Question about alternative to File::Find
pvaldes

    Interestingly, I was trying to improve this fdupes in Perl just now.

    You can define your hash outside the sub:

    my %hash = ();
    sub wanted { blah blah; %hash = ...; blah blah }

Re: Question about alternative to File::Find
tinita

    You can pass a sub reference. This way you don't need a global variable. Short example:

    { # somewhere in a block or sub
        my @files;
        find(sub { push @files, $_ }, $dir);
    }
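
    Wrapped in a sub, the same closure can return the list of files that the original question asks for. This is a sketch under a few assumptions: the name list_files is made up, and the -f test is just one possible filter. It collects $File::Find::name rather than $_ because, by default, File::Find changes into each directory and sets $_ to the bare file name.

```perl
use strict;
use warnings;
use File::Find;

# Return the full paths of all plain files under the given directories.
sub list_files {
    my @dirs = @_;
    my @files;    # lexical; the anonymous sub below closes over it

    find(
        sub {
            # -f skips directories, character special files, and so on
            push @files, $File::Find::name if -f $_;
        },
        @dirs
    );

    return @files;
}
```

    A call such as my @files = list_files($dir); then gives a plain list with no package variable involved.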

