Re: Removing Duplicate Files

by tfrayner (Curate)
on Jan 09, 2005 at 12:19 UTC ( #420672=note )

in reply to Removing Duplicate Files

Dupseek is a pretty good Perl implementation of what you're after, and it has been around for a while now. I've never tested it on a data set of this size, though.