http://www.perlmonks.org?node_id=85188


in reply to Re: Find duplicate files.
in thread Find duplicate files.

Yes, but there is a fundamental difference...

The first script only computes MD5 hashes when more than one file has the same size, and then compares the MD5s only among the files of that size. Yours MD5s *everything*, then compares *all* the MD5s. If a file has a unique file size, it *can't* have a duplicate, so there is no point in reading it at all.
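For anyone following along, here is a minimal sketch of the size-first idea. It is not the code from either script in this thread, just my own illustration; the name find_duplicates is made up, and I am assuming plain File::Find and Digest::MD5 are acceptable.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper (not from either script in the thread).
# Returns a list of array refs; each one holds the paths of files
# that share both the same size and the same MD5 digest.
sub find_duplicates {
    my @dirs = @_;

    # Pass 1: group paths by size. This only needs a stat, no file reads.
    my %by_size;
    find(
        {
            no_chdir => 1,    # $_ is then the full path
            wanted   => sub {
                return if -l $_;        # skip symlinks
                return unless -f $_;    # plain files only
                push @{ $by_size{ -s _ } }, $_;
            },
        },
        @dirs
    );

    # Pass 2: compute MD5s only for sizes shared by two or more files.
    my @groups;
    for my $size ( sort { $a <=> $b } keys %by_size ) {
        my @same_size = @{ $by_size{$size} };
        next if @same_size < 2;    # unique size, so no duplicate possible

        my %by_md5;
        for my $path (@same_size) {
            my $fh;
            unless ( open $fh, '<', $path ) {
                warn "Skipping $path: $!\n";
                next;
            }
            binmode $fh;
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $by_md5{$digest} }, $path;
        }
        push @groups, grep { @$_ > 1 } values %by_md5;
    }
    return @groups;
}

# Print one line per set of duplicates for the directories given as arguments.
if (@ARGV) {
    print join( ' ', sort @$_ ), "\n" for find_duplicates(@ARGV);
}
```

Run it as something like `perl finddups.pl /some/dir` (the filename is arbitrary). The whole saving comes from the `next if @same_size < 2` line: files with a unique size are never opened.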

Depending on the makeup of the files, this can have a dramatic effect:

Files: 15272 Duplicates: 999 Bytes: 15073525
Results:
First script:  real 0m11.855s  user 0m2.590s   sys 0m1.640s
Second script: real 0m49.589s  user 0m17.110s  sys 0m6.500s
The second script takes roughly four times as long as the first (about 4.2x by wall-clock time)...

Admittedly, if all your files were the same size there would be no difference, since every file would get hashed either way, but in most cases the first script will win. But hey...