http://www.perlmonks.org?node_id=420672


in reply to Removing Duplicate Files

Dupseek is a pretty good Perl implementation of what you're after, and it has been around for a while now. I've never tested it on a data set of this size, though.

Tim