http://www.perlmonks.org?node_id=996400

swampyankee has asked for the wisdom of the Perl Monks concerning the following question:

I'm using File::Find to check a directory tree for duplicated files. To do this, I first build a hash keyed by file size. Since File::Find's "find" function throws away any return value from the sub reference passed to it, I've got to use an "our" variable for the hash. After this hash is built, I strip out the keys whose entry (a reference to an array) has only one element, then build a second hash keyed by the MD5 digest of the first 2**24 bytes (or so; it can be set on the command line) to check for potential duplicates. The program doesn't delete files; it just issues a report.
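For readers who want to see the shape of this, here is a minimal sketch of the two-pass approach described above. It is not the original program. The sub names (group_by_size, prefix_digest, find_duplicates) are made up for illustration. Two things differ from the description and are assumptions of the sketch: the size hash is a lexical that the "wanted" callback closes over, rather than an "our" variable, and the second hash is keyed by size plus digest rather than by digest alone, so that files of different sizes that share a prefix are not grouped together.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Pass 1: group regular files under $root by size.
# %by_size is a lexical captured by the callback (an assumption of this
# sketch; the original post uses an "our" variable instead).
sub group_by_size {
    my ($root) = @_;
    my %by_size;
    find(
        {
            no_chdir => 1,    # $_ then holds the full path of each entry
            wanted   => sub {
                return if -l $_;        # skip symbolic links
                return unless -f $_;    # keep plain files only
                my $size = -s _;        # reuse the stat done by -f
                push @{ $by_size{$size} }, $_;
            },
        },
        $root
    );
    return \%by_size;
}

# MD5 (hex) of at most the first $prefix_bytes bytes of a file;
# returns nothing if the file cannot be opened or read.
sub prefix_digest {
    my ($path, $prefix_bytes) = @_;
    open my $fh, '<:raw', $path or return;
    my $buf = '';
    my $got = read($fh, $buf, $prefix_bytes);
    close $fh;
    return unless defined $got;
    return Digest::MD5::md5_hex($buf);
}

# Pass 2: among sizes shared by two or more files, group by prefix digest.
# Returns a reference to a list of groups; each group is a sorted list of paths.
sub find_duplicates {
    my ($root, $prefix_bytes) = @_;
    $prefix_bytes //= 2**24;
    my $by_size = group_by_size($root);
    my %by_digest;
    for my $size (keys %$by_size) {
        my $paths = $by_size->{$size};
        next if @$paths < 2;            # a unique size cannot be a duplicate
        for my $path (@$paths) {
            my $digest = prefix_digest($path, $prefix_bytes);
            next unless defined $digest;
            # Keyed by size and digest (a choice of this sketch).
            push @{ $by_digest{"$size:$digest"} }, $path;
        }
    }
    my @groups = map { [ sort @$_ ] } grep { @$_ > 1 } values %by_digest;
    return [ sort { $a->[0] cmp $b->[0] } @groups ];
}

# Report only; nothing is deleted.
if (@ARGV) {
    my ($root, $prefix_bytes) = @ARGV;
    for my $group (@{ find_duplicates($root, $prefix_bytes) }) {
        print join("\n  ", "Possible duplicates:", @$group), "\n";
    }
}
```

Run it with a directory and, optionally, a prefix length in bytes. Since only a prefix is hashed, matching groups are candidates for duplication, not proof of it.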

This violates, first, my aesthetic sense and, second, one of the principles of good programming practice that I've worked with for close to four decades: that globals are bad practice.

So, is there an alternative to File::Find that will return a list of files, perhaps (but not necessarily) with some types of files, like directories or character special files, filtered out?

