Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re^2: Find duplicate files with exact same files noted

by smahesh (Pilgrim)
on Sep 01, 2010 at 04:04 UTC ( #858273=note: print w/ replies, xml ) Need Help??


in reply to Re: Find duplicate files with exact same files noted
in thread Find duplicate files with exact same files noted

Hi, I wrote a similar script using the MD5 hash for detecting duplicates. It is not cleaned up or optimized - but does the intended job. It is now in my tool collection.

#!/usr/bin/perl # # Find duplicate files in specified directories using md5sum values to # identify duplicates. # # (C) 2009 S.M.Mahesh use strict; use warnings; use File::Find; use Digest::MD5; my $version = 0.1; my %md5sums; my $md5 = Digest::MD5->new(); sub Usage() { print<<USAGEDOC; $0 v$version - FindDuplicate script USAGE: $0 <DIR1> [DIR2...DIRn] where, DIR1..DIRn Specifies the directories to search EXAMPLE: $0 /home/user/downloads /home/user/documents USAGEDOC exit 1; } sub wanted { return unless -f $File::Find::name; # Return if it is not a plain +file return if -l $File::Find::name; # Return in case this is a symlink if (open(FILE, $File::Find::name) ) { binmode(FILE); my $sum = $md5->addfile(*FILE)->hexdigest(); close(FILE); my $aref = $md5sums{$sum}; if ( defined $aref ) { push @$aref, $File::Find::name; } else { my @list = ($File::Find::name); $md5sums{$sum} = \@list; } } else { print "ERROR: Could not open '$File::Find::name' for reading\n +"; } return; } Usage() if( $#ARGV < 0 ); foreach my $dir (@ARGV) { print "$dir \n"; unless ( -d $dir ) { print "ERROR: '$dir' is not a valid directory\n"; next; } find(\&wanted, $dir); } print "\n", '-'x25, "\n"; print "Printing duplicate files (if any)\n"; print '-'x25, "\n\n"; foreach my $sum (sort keys %md5sums) { my $list = $md5sums{$sum}; if ($#$list > 0) { print "$sum :\n"; foreach my $file (@$list) { print "\t $file\n"; } print"\n" } } print '-'x25, "\n";

Mahesh


Comment on Re^2: Find duplicate files with exact same files noted
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://858273]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (14)
As of 2014-12-19 11:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (78 votes), past polls