
Question about alternative to File::Find

by swampyankee (Parson)
on Sep 29, 2012 at 16:08 UTC (#996400)
swampyankee has asked for the wisdom of the Perl Monks concerning the following question:

I'm using File::Find to check a directory tree for duplicated files. To do this, I first build a hash keyed by file size, but since File::Find's "find" function throws away any return value from the sub reference passed to it, I've got to use an "our" variable for the hash. After this hash is built, I strip out the keys whose entry (a reference to an array) has only one element, then build a second hash keyed by the md5 digest of the first 2**24 bytes (or so; it can be set on the command line) to check for potential duplicates. The program doesn't delete files; it just issues a report.

This violates, first, my aesthetic sense and, second, one of the principles of good programming practice that I've worked with for close to four decades: that it is bad practice to have globals.

So, is there an alternative to File::Find that will return a list of files, perhaps (but not necessarily) with some types of files, like directories or character special files, filtered out?
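For reference, the two-pass check described above could be sketched roughly as follows once a list of paths is in hand. This is only an illustration using the core Digest::MD5 module; the helper names (group_by_size, partial_md5, candidate_duplicates) are hypothetical and not taken from the poster's program, and keying the second hash on size plus digest is an assumption made here so that files of different sizes never share a key.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: group paths by size, dropping sizes seen only once.
sub group_by_size {
    my @paths = @_;
    my %by_size;
    for my $path (@paths) {
        my $size = -s $path;
        next unless defined $size;    # skip unreadable or vanished files
        push @{ $by_size{$size} }, $path;
    }
    delete @by_size{ grep { @{ $by_size{$_} } < 2 } keys %by_size };
    return \%by_size;
}

# Hypothetical helper: MD5 hex digest of at most the first $limit bytes.
sub partial_md5 {
    my ( $path, $limit ) = @_;
    open my $fh, '<:raw', $path or return;
    my $got = read $fh, my $buf, $limit;
    close $fh;
    return unless defined $got;       # read error
    return md5_hex($buf);
}

# Returns a hash ref: "size:digest" => [ paths sharing both ].
sub candidate_duplicates {
    my ( $limit, @paths ) = @_;
    my $by_size = group_by_size(@paths);
    my %by_digest;
    for my $size ( keys %$by_size ) {
        for my $path ( @{ $by_size->{$size} } ) {
            my $digest = partial_md5( $path, $limit );
            next unless defined $digest;
            push @{ $by_digest{"$size:$digest"} }, $path;
        }
    }
    delete @by_digest{ grep { @{ $by_digest{$_} } < 2 } keys %by_digest };
    return \%by_digest;
}
```

Every hash here is a lexical that is handed back by reference, so nothing needs to be declared with "our". Files that survive both passes are only candidates; a full comparison would still be needed to confirm they are identical.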


Re: Question about alternative to File::Find
by davido (Archbishop) on Sep 29, 2012 at 17:04 UTC

    Have you tried File::Find::Rule? It's basically a well-designed OO interface that encapsulates (and extends) the File::Find functionality. I don't know exactly what you're doing, but I should think you would be able to do it more cleanly (from aesthetic and "programmer's sensibilities" standpoints) than with File::Find.

    I agree that File::Find's interface is messy. It's a really useful module, but every time I use it I feel like I need to go take a shower afterwards. File::Find::Rule is much less dirty.
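A minimal sketch of what that might look like, assuming File::Find::Rule is installed from CPAN (it is not a core module). The wrapper name plain_files is hypothetical; the file and in methods are part of File::Find::Rule's documented interface.

```perl
use strict;
use warnings;
use File::Find::Rule;

# Hypothetical helper: return every plain file below $root as a list.
sub plain_files {
    my ($root) = @_;

    # ->file keeps only entries that pass the -f test, so directories
    # and character special files are filtered out; ->in runs the
    # search and returns the matching paths.
    return File::Find::Rule->file->in($root);
}
```

Because in returns the list directly, the caller can build the size-keyed hash from the result without any package variable.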


      Path::Class::Rule is my favourite module for this sort of thing. It's inspired by File::Find::Rule (but not dependent on it) and it gives you Path::Class::File objects to play with.

      That would be am example of putting blinders on; or, getting somebody else to do the dirty work. (And I personally do not have a problem with either.)

        That's one of the main points to encapsulation: You don't have to get dirty because all the grease, gunk, nuts, bolts, and rust are under the hood where only the mechanic has to look. As long as it works well (as it does), and the grease doesn't leak out (as it doesn't), I also have no problem with it.


        "That would be am example of putting blinders on; ..." -- self.

        Correction: The "am" should have been an "an".

Re: Question about alternative to File::Find
by BrowserUk (Pope) on Sep 29, 2012 at 17:39 UTC

    I don't have your blanket aversion to the use of globals -- they are just another tool in the toolbox, applicable for some purposes and not others -- but if I did, I'd deal with it this way:

    sub buildFileHash {
        my $root = shift;
        my %filesBySize;    # lexical, visible only to the closure below
        find(
            sub {
                # ...
                push @{ $filesBySize{ -s $_ } }, $File::Find::name;
                # ...
            },
            $root
        );
        return \%filesBySize;
    }


Re: Question about alternative to File::Find
by pvaldes (Chaplain) on Sep 29, 2012 at 16:33 UTC

    Interestingly I was trying to improve this fdupes in perl just now

    You can define your hash outside the sub:

    my %hash = ();
    sub wanted {
        # blah blah
        %hash = ...;
        # blah blah
    }
Re: Question about alternative to File::Find
by tinita (Parson) on Sep 29, 2012 at 17:40 UTC
    You can pass an anonymous sub reference that closes over a lexical; this way you don't need a global variable. Short example:

    {
        # somewhere in a block or sub
        my @files;
        find(sub { push @files, $_ }, $dir);
    }
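Wrapping the same idea in a function gives something close to what the original question asks for: a call that returns a list of files with directories and special files filtered out. This is a sketch using only the core File::Find module; the name list_files is hypothetical.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: full paths of all plain files below $dir.
sub list_files {
    my ($dir) = @_;
    my @files;    # lexical, captured by the closure passed to find
    File::Find::find(
        sub {
            # -f tests $_ and is false for directories and for
            # character special files. $File::Find::name holds the
            # full path, while $_ is only the name within the
            # directory find has changed into.
            push @files, $File::Find::name if -f;
        },
        $dir
    );
    return @files;
}
```

Storing $File::Find::name rather than $_ matters here because, by default, find changes into each directory as it goes, so a bare $_ would not be usable as a path once the traversal has finished.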
