http://www.perlmonks.org?node_id=11102680

harangzsolt33 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I began learning Perl a few years ago as a hobby. I would like to write a script that compares two directories and prints the differences. My question is, do you know any existing Perl scripts that can do this?

More Details of how this would work:

$DIR1 will contain the full path to directory 1
$DIR2 will contain the full path to directory 2
$RECURSIVE=1 compares subdirectories too
$MODE=0 Prints a quick summary of both directories. Shows you the total size of both directories and the number of files/directories they contain.
$MODE=1 compares whether the same files exist in both places.
$MODE=2 Also compares file sizes and dates and tells you which file is younger/older or larger/smaller.
$MODE=3 Compares the MD5 hash of files that appear to be same. If the MD5 hash does not match, then those files are listed along with the hash values.
$MODE=4 If the two directories contain two files with the same name, same size, then the program will compare those files byte for byte and display the difference in detail for each file. This option could generate a very long report and may take a long time.

Replies are listed 'Best First'.
Re: perl script to compare two directories
by haukex (Archbishop) on Jul 11, 2019 at 14:36 UTC
Re: perl script to compare two directories
by GrandFather (Saint) on Jul 11, 2019 at 21:11 UTC

    How does "Give me code" eq "Learning Perl"? Write some code, try it out, find issues, clearly describe the issues here with runable code, then we can help you learn effectively.

    This option could generate a very long report and may take a long time.

    Why do you think that? Have you tried it? Was the code too slow? Yesterday I wrote a trivial script to crawl a directory structure and count lines of code in source files. It took a few minutes to write and well less than a second to run on a directory structure containing 12 MB of file in 60 folders and 520 files. When you think about the work going on under the hood to get that work done that is insanely fast.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
      Yeah, probably a dir compare script isn't a very complex program, and I will be able to do it in a few hours ...BUT I remember asking earlier for perl scripts that others have made. I wanted to see what's out there, and what kind of free perl scripts I can use that others have written. And I was told that if I install Linux, it comes with a bunch of perl scripts as part of the installation. I don't use Linux, but maybe some of you do. And maybe if Linux comes with a free Perl script that compares directories, then someone will let me know. I just wanted to see the work others have done before I get to it myself. If a dir compare script is such a novel idea that no one has it, then I guess I will write one and share it here for free. Lol

        I'm sure its been done many times before in many different ways for many different reasons. I've even done something like that myself: Directory tree explorer with stats reporting

        Reading other people's code is often a good way of learning. But without guidance what you learn can often just be bad habits. Trying yourself then coming here for comment is likely to help you learn faster and better.

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

        It was previously pointed out that using an abandoned version of a cut down build of perl, which hasn't had an update since 2003 isn't a great starting point. Strawberry perl was recommended. This comes with scripts as part of the installation, which is besides the point. You don't have to run Linux at all. Secondly a web search will find you many examples of why you're looking for. Besides being completely wrong in you assertions the last time this was discussed, you've needlessly over complicated the proposed specification here.

Re: perl script to compare two directories -- oneliner
by Discipulus (Canon) on Jul 12, 2019 at 10:16 UTC
    hello harangzsolt33,

    Ok.. this is partially a joke, sorry ;) but sometimes other tools are just around the corner:

    perl -e "system join ' ',qw(git diff --no-index Folder1 Folder2)" diff --git a/Folder1/NEW.txt b/Folder2/NEW.txt index 5d308e1..e69de29 100644 --- a/Folder1/NEW.txt +++ b/Folder2/NEW.txt @@ -1 +0,0 @@ -NEWTEXTINSERTED

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: perl script to compare two directories
by BillKSmith (Monsignor) on Jul 12, 2019 at 11:02 UTC
    Windows includes a file type similar to a symbolic link. You should specify the processing you expect for these files. I remember having some difficulty identifying these files with perl. (The status is not exactly what we would expect for a symbolic link.)
    Bill
Re: perl script to compare two directories
by marto (Cardinal) on Jul 11, 2019 at 15:32 UTC

    Finally accepting that using hashes is a sane approach.

        Perhaps, however that was a very specific use case, rather than a generic usage, also OP is using Windows, this is not an out of the box cross platform solution.

Re: perl script to compare two directories
by bliako (Monsignor) on Jul 12, 2019 at 09:47 UTC

    I guess deriving a sync/backup/restore recipe based on the output (the differences) must already exist. IMO visualising the differences and their locations is the big challenge - the more ways the better.

      rsync/robocopy achieve this. Various modules/apps on cpan exist to help.

Re: perl script to compare two directories
by harangzsolt33 (Chaplain) on Jul 13, 2019 at 06:13 UTC
    Okay, here is what I have done so far:

    #!/usr/bin/perl -w use strict; use warnings; # # This perl script compares two directories and prints # the differences. Actually, it compares DIR2 against DIR1 # and displays which files or directories do not exist # in DIR2... # ################################################################# my $DIR1 = 'C:\\BIN'; my $DIR2 = 'D:\\BIN'; my $RECURSIVE = 1; my $SHORTEN_LONG_PATH_NAMES = 1; ################################################################# my $CURRENT_DIRECTORY = GetPath($0); my $NUMBER_OF_MISSING_DIRS = 0; my $NUMBER_OF_MISSING_FILES = 0; my $NUMBER_OF_SIZE_DIFFERENCES = 0; print "\nDirectory Compare Script\nWritten by Zsolt in July 12, 2019. +<zsnp\@juno.com>\n"; $DIR1 = JoinPath( (isAbsPath($DIR1) ? '' : $CURRENT_DIRECTORY), $DIR1) +; $DIR2 = JoinPath( (isAbsPath($DIR2) ? '' : $CURRENT_DIRECTORY), $DIR2) +; $DIR1 = toNicePath($DIR1); $DIR2 = toNicePath($DIR2); print "\nDIR1: $DIR1\nDIR2: $DIR2\n"; CheckDIR($DIR1); print "\nNumber of size differences : $NUMBER_OF_SIZE_DIFFERENCES\nNum +ber of missing files : $NUMBER_OF_MISSING_FILES\nNumber of missing + dirs : $NUMBER_OF_MISSING_DIRS\nRECURSIVE = $RECURSIVE\n"; exit; ################################################################# Catc +hFile # # This function is automatically called by CheckDIR() every time # a file is found. This function gets the full name of the file. # # Usage: CatchFile(FULLNAME) <-- Called by CheckDIR() # sub CatchFile { my $FILE1 = shift; my $FILE2 = GetFile2($FILE1); unless (-e $FILE2) # File exists? { $NUMBER_OF_MISSING_FILES++; print 'XXXXX ' . Shorten($FILE2) . " file does not exist!\n"; return; } return if (-d $FILE1); # Directory? -Ignore it. if (-s $FILE1 != -s $FILE2) # Same size? { $NUMBER_OF_SIZE_DIFFERENCES++; print '<<<<< ' . Shorten((-s $FILE1 < -s $FILE2) ? $FILE1 : $FILE2 +) . " is smaller!\n" } # if (-M $FILE1 != -M $FILE2) # Same date? # { # print '::::: ' . Shorten((-M $FILE1 > -M $FILE2) ? $FILE1 : $FILE +2) . " is older!\n" # } } ################################################################# Chec +kDIR # # This function reads the contents of a folder and calls # CatchFile() for each file that was found. # Returns 0 on success. # Returns a negative value if an error occurred. # # Usage: CheckDIR(PATH) # sub CheckDIR { @_ or return; my $PATH = shift; defined $PATH or return; length($PATH) or return; print ' Reading directory: ' . Shorten($PATH) . "\n"; $PATH = toNicePath(AddSuffix($PATH, '\\')); my $FULLNAME; opendir(my $DIR, $PATH) or return; while (my $NAME = readdir $DIR) { $FULLNAME = "$PATH$NAME"; if (-d($FULLNAME)) { unless (-d(GetFile2($FULLNAME))) { $NUMBER_OF_MISSING_DIRS++; print "!!!!! " . Shorten(GetFile2($FULLNAME)) . " dir doesn't +exist!\n"; } # Check into subdirectory if RECURSIVE == 1 # Skip directory if its name starts with "." if ($RECURSIVE) { CheckDIR($FULLNAME) unless (vec($NAME, 0, 8) == 46); } next; } CatchFile($FULLNAME); } closedir $DIR; } ################################################################# Shor +ten # # This function shortens a string if it's too long by # cutting out the middle part and substituting "**" # in its place. # # Usage: STRING = Shorten(STRING, [MAXLEN]) # sub Shorten { @_ or return ''; my $S = shift; defined $S or return ''; my $L = length($S); $L or return ''; $SHORTEN_LONG_PATH_NAMES or return $S; my $M = @_ ? shift : 50; my $H = $M >> 1; $L > $M or return $S; return substr($S, 0, $H) . '**' . substr($S, $L - $H, $H); } ################################################################# GetF +ile2 # # This function takes File1 and returns File2. # File1 is the full path of the file or directory in $DIR1 # File2 is the full path of the file or directory in $DIR2 # sub GetFile2 { my $FILE1 = shift; my $FILE = substr($FILE1, length($DIR1), length($FILE1)); my $FILE2 = $DIR2 . $FILE; return $FILE2; } ################################################################# isAb +sPath # # v2019.7.12 INTEGER = isAbsPath(PATH) # Returns 1 if PATH starts with a drive letter # or // or \\ otherwise returns zero. sub isAbsPath { @_ or return 0; my $P = shift; defined $P or return 0; length($P) or return 0; $P = uc(substr(Trim($P), 0, 3)); my $c = vec($P, 0, 8); return 1 if ($c == 47 || $c == 92); return 0 if ($c < 65 || $c > 90); return 0 if (vec($P, 1, 8) != 58); $c = vec($P, 2, 8); return 1 if ($c == 47 || $c == 92); return 0; } ################################################################# Trim # # v2019.6.15 STRING = Trim(STRING) # Removes whitespace before and after STRING. # Treats tabs, esc, null, vertical tab, # and new lines as whitespace. # sub Trim { @_ or return ''; my $S = shift; defined $S or return ''; my $L = length($S); $L or return ''; my $START = 0; my $LAST = 0; while ($L--) { if (vec($S, $L, 8) > 32) { $START = $L; $LAST or $LAST = $L + 1; } } return substr($S, $START, $LAST - $START); } ################################################################# GetP +ath # # This function returns the path portion of a full file name # without the trailing / or \ character. # # Usage: PATH = GetPath(FULL_FILE_NAME) # # Example: GetPath($0) --> returns this perl script's path # sub GetPath { my $F = shift; $F =~ tr#\\#/#; my $P = rindex($F, '/'); $P or return substr($F, 0, 1); return ($P > 0) ? substr($F, 0, $P) : '.'; } ################################################################# toNi +cePath # # Returns a PATH string that is separated by either \\ or / # depending on the current OS (either Linux or Windows). # # Usage: STRING = toNicePath(STRING) # sub toNicePath { @_ or return ''; my $P = shift; defined $P or return ''; length($P) or return ''; if (index(uc($^O), 'MSWIN') < 0) { $P =~ tr#\\#/#; } else { $P =~ tr#/#\\#; } return $P; } ################################################################# Join +Path # # v2019.6.16 STRING = JoinPath(STRING, [STRING], [STRING]) # This function joins two names into a single path by # adding / in between the names. It also simplifies the # resulting path by removing repeated \\ // characters, # and tries to resolve the "." and ".." in a path name # to literal names only. # sub JoinPath { @_ or return ''; my $P = join('/', CollapseArray(@_)); defined $P or return ''; length($P) or return ''; $P = Trim($P); $P =~ tr#\\#/#; if (uc(substr($P, 0, 8)) eq 'FILE:///') { $P = substr($P, 8, length($P)); } $P =~ s|///|/|g; $P =~ s|//|/|g; my $DRIVE = (vec($P, 1, 8) == 58) ? vec($P, 0, 8) & 223 : 0; if ($DRIVE) { $P = substr($P, 2, length($P)); } my $SLASH = (vec($P, 0, 8) == 47) ? 47 : 0; if ($SLASH) { $P = substr($P, 1, length($P)); } my @A = split('/', $P); for (my $i = 0; $i < @A; $i++) { if ($A[$i] eq '.') { splice(@A, $i--, 1); } if ($A[$i] eq '..') { if ($i > 0) { splice(@A, --$i, 2); $i--; } else { splice(@A, $i, 1); $i--; } } } return ($DRIVE ? chr($DRIVE) . ':' : '') . ($SLASH ? '/' : '') . joi +n('/', @A); } ################################################################# Coll +apseArray # # v2019.6.15 NEW_ARRAY = CollapseArray(ARRAY) # This function removes blank lines from an array. # sub CollapseArray { my $i = @_; while ($i--) { splice(@_, $i, 1) unless (length($_[$i])); } return @_; } ################################################################# AddS +uffix # # This function makes sure that STRING ends with SUFFIX. # # Usage: STRING = AddSuffix(STRING, SUFFIX) # # Example: AddSuffix('Abcdef', 'def') ---> 'Abcdef' # AddSuffix('Abcdef', 'DEF') ---> 'AbcdefDEF' # sub AddSuffix { @_ or return ''; my $S = shift; defined $S or return ''; my $LS = le +ngth($S); $LS or return ''; @_ or return $S; my $X = shift; defined $X or return $S; my $LX = le +ngth($X); $LX or return $S; if ($LS >= $LX) { if (substr($S, $LS - $LX, $LX) eq $X) { return $S; + } } return $S . $X; } #################################################################

      I don't have enough time to go through all of the code right now, but I do see quite a few reinvented wheels, as has been discussed before, for example, I've pointed you to File::Spec before. Reinventing wheels gives you more code to maintain, more code to test, and more chances to introduce bugs. For example, toNicePath is incorrect: a backslash \ is a valid character in *NIX filenames. Getting portability right is tricky, which is why modules like File::Spec exist in the first place.

      I strongly recommend you take the time to study what modules are available to you, especially what core modules, as those are (almost) always installed.

        I've pointed you to File::Spec before. Reinventing wheels gives you more code to maintain, more code to test, and more chances to introduce bugs.

        I fully agree to use modules instead of re-inventing the wheel, poorly.

        BUT: File::Spec is one of the modules that have a conceptual error, probably since the first version. See:

        Even with that error, it is better than re-inventing the wheel.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thanks for posting. I have two minor suggestions, if you are interested at all.

      sub GetFile2 ... sub isAbsPath

      If you don't want your users/co-developers to pull their hair out and curse your name into the void for eternity, don't arbitrarily capitalise the first letter of your sub names. Pick a format (any format) and stick to it.

      sub isAbsPath { @_ or return 0; my $P = shift; defined $P or return 0; length($P) or return 0; $P = uc(substr(Trim($P), 0, 3)); my $c = vec($P, 0, 8); return 1 if ($c == 47 || $c == 92); return 0 if ($c < 65 || $c > 90); return 0 if (vec($P, 1, 8) != 58); $c = vec($P, 2, 8); return 1 if ($c == 47 || $c == 92); return 0; }

      Did you know that perl has regular expressions? That will save all this hard-coding of ord values, etc. eg:

      sub isAbsPath { my $path = shift; return 0 unless defined $path; return $path =~ /^([A-Z]:|[\/\\])/ ? 1 : 0; }

      You could omit the ternary if you're only interested in a true/false return value. I've only demonstrated this one subroutine but many others could benefit from such simplification.

      The alternative is to use a module. For this case, eg is_absolute from Path::Tiny.

      The more you code in Perl, the more Perlish your code will become. Enjoy the journey.

        Oh, thank you for the suggestions!!

        "sub GetFile2 ... sub isAbsPath
        If you don't want your users/co-developers to pull their hair out..."

        LOL Okay.