Beefy Boxes and Bandwidth Generously Provided by pair Networks httptech
We don't bite newbies here... much
 
PerlMonks  

Remove (nearly) duplicate files recursively

by wrinkles (Monk)
on May 19, 2008 at 09:20 UTC ( #687318=perlquestion: print w/ replies, xml ) Need Help??
wrinkles has asked for the wisdom of the Perl Monks concerning the following question:

Dearest Monks,

I have near-duplicate files in deeply nested directories. The duplicates are appended with " 1" as in "Mack the Knife 1.mp3". The duplicates are found together in the same directories. I would like to be able to locate potential duplicates (with regex), then check to see if a counterpart (without the " 1" is located in the directory. If the counterpart exists, rename. If not, then it's not a "duplicate", so do nothing. The best I can do is find the " 1.mp3" files and move them. How can I check for the counterpart? TIA.

#!/usr/bin/perl -w use strict; use File::Find; find(\&process, '/Users/Shared/Music'); sub process { if (/[\d\D]*(?<!No\.)\s+1.mp3$/) { rename "$File::Find::name", "/Users/wrinkles/dupes/$_" or die $!; print "$File::Find::name removed. \n"; } }

The regex checks for a trailing " 1" that is not preceded with "No.". If I can check against the counterpart file, the look-behind check for "No." won't be necessary.

Comment on Remove (nearly) duplicate files recursively
Download Code
Re: Remove (nearly) duplicate files recursively
by moritz (Cardinal) on May 19, 2008 at 09:27 UTC
    my $orig = $File::Find::name; $orig =~ s/ 1\.mp3/.mp3/; if (-e $orig){ print "Counterpart exists\n"; }

    Update: fixed substitution, linuxer++

      small correction in the substitution

      $orig =~ s/ 1\.mp3/.mp3/;
      Thanks moritz, here's the updated code:
      #!/usr/bin/perl -w use strict; use File::Find; find(\&process, '/Users/Shared/Music'); sub process { my $song = $File::Find::name; my $orig = $song; $orig =~ s/ 1\.mp3/.mp3/; if ( (m/[\d\D]*\s+1.mp3$/) && (-e $orig) ){ rename "$song", "/Users/wrinkles/dupes/$_" or die $!; print "$song removed. \n"; } }
Re: Remove (nearly) duplicate files recursively
by pc88mxer (Vicar) on May 19, 2008 at 15:48 UTC
    Just one little efficiency pointer -- you can perform the substitution and check if the file ends in .mp3 with one s/// call:
    #!/usr/bin/perl -w use strict; use File::Find; find(\&process, '/Users/Shared/Music'); sub process { my $song = $File::Find::name; my $orig = $song; if ($orig =~ s/ 1\.mp3$/.mp3/ && -e $orig) { ... } }

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://687318]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (13)
As of 2014-04-17 04:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (439 votes), past polls