Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re: perl script to compare two directories

by harangzsolt33 (Pilgrim)
on Jul 13, 2019 at 06:13 UTC ( #11102774=note: print w/replies, xml ) Need Help??


in reply to perl script to compare two directories

Okay, here is what I have done so far:

#!/usr/bin/perl -w use strict; use warnings; # # This perl script compares two directories and prints # the differences. Actually, it compares DIR2 against DIR1 # and displays which files or directories do not exist # in DIR2... # ################################################################# my $DIR1 = 'C:\\BIN'; my $DIR2 = 'D:\\BIN'; my $RECURSIVE = 1; my $SHORTEN_LONG_PATH_NAMES = 1; ################################################################# my $CURRENT_DIRECTORY = GetPath($0); my $NUMBER_OF_MISSING_DIRS = 0; my $NUMBER_OF_MISSING_FILES = 0; my $NUMBER_OF_SIZE_DIFFERENCES = 0; print "\nDirectory Compare Script\nWritten by Zsolt in July 12, 2019. +<zsnp\@juno.com>\n"; $DIR1 = JoinPath( (isAbsPath($DIR1) ? '' : $CURRENT_DIRECTORY), $DIR1) +; $DIR2 = JoinPath( (isAbsPath($DIR2) ? '' : $CURRENT_DIRECTORY), $DIR2) +; $DIR1 = toNicePath($DIR1); $DIR2 = toNicePath($DIR2); print "\nDIR1: $DIR1\nDIR2: $DIR2\n"; CheckDIR($DIR1); print "\nNumber of size differences : $NUMBER_OF_SIZE_DIFFERENCES\nNum +ber of missing files : $NUMBER_OF_MISSING_FILES\nNumber of missing + dirs : $NUMBER_OF_MISSING_DIRS\nRECURSIVE = $RECURSIVE\n"; exit; ################################################################# Catc +hFile # # This function is automatically called by CheckDIR() every time # a file is found. This function gets the full name of the file. # # Usage: CatchFile(FULLNAME) <-- Called by CheckDIR() # sub CatchFile { my $FILE1 = shift; my $FILE2 = GetFile2($FILE1); unless (-e $FILE2) # File exists? { $NUMBER_OF_MISSING_FILES++; print 'XXXXX ' . Shorten($FILE2) . " file does not exist!\n"; return; } return if (-d $FILE1); # Directory? -Ignore it. if (-s $FILE1 != -s $FILE2) # Same size? { $NUMBER_OF_SIZE_DIFFERENCES++; print '<<<<< ' . Shorten((-s $FILE1 < -s $FILE2) ? $FILE1 : $FILE2 +) . " is smaller!\n" } # if (-M $FILE1 != -M $FILE2) # Same date? # { # print '::::: ' . Shorten((-M $FILE1 > -M $FILE2) ? $FILE1 : $FILE +2) . " is older!\n" # } } ################################################################# Chec +kDIR # # This function reads the contents of a folder and calls # CatchFile() for each file that was found. # Returns 0 on success. # Returns a negative value if an error occurred. # # Usage: CheckDIR(PATH) # sub CheckDIR { @_ or return; my $PATH = shift; defined $PATH or return; length($PATH) or return; print ' Reading directory: ' . Shorten($PATH) . "\n"; $PATH = toNicePath(AddSuffix($PATH, '\\')); my $FULLNAME; opendir(my $DIR, $PATH) or return; while (my $NAME = readdir $DIR) { $FULLNAME = "$PATH$NAME"; if (-d($FULLNAME)) { unless (-d(GetFile2($FULLNAME))) { $NUMBER_OF_MISSING_DIRS++; print "!!!!! " . Shorten(GetFile2($FULLNAME)) . " dir doesn't +exist!\n"; } # Check into subdirectory if RECURSIVE == 1 # Skip directory if its name starts with "." if ($RECURSIVE) { CheckDIR($FULLNAME) unless (vec($NAME, 0, 8) == 46); } next; } CatchFile($FULLNAME); } closedir $DIR; } ################################################################# Shor +ten # # This function shortens a string if it's too long by # cutting out the middle part and substituting "**" # in its place. # # Usage: STRING = Shorten(STRING, [MAXLEN]) # sub Shorten { @_ or return ''; my $S = shift; defined $S or return ''; my $L = length($S); $L or return ''; $SHORTEN_LONG_PATH_NAMES or return $S; my $M = @_ ? shift : 50; my $H = $M >> 1; $L > $M or return $S; return substr($S, 0, $H) . '**' . substr($S, $L - $H, $H); } ################################################################# GetF +ile2 # # This function takes File1 and returns File2. # File1 is the full path of the file or directory in $DIR1 # File2 is the full path of the file or directory in $DIR2 # sub GetFile2 { my $FILE1 = shift; my $FILE = substr($FILE1, length($DIR1), length($FILE1)); my $FILE2 = $DIR2 . $FILE; return $FILE2; } ################################################################# isAb +sPath # # v2019.7.12 INTEGER = isAbsPath(PATH) # Returns 1 if PATH starts with a drive letter # or // or \\ otherwise returns zero. sub isAbsPath { @_ or return 0; my $P = shift; defined $P or return 0; length($P) or return 0; $P = uc(substr(Trim($P), 0, 3)); my $c = vec($P, 0, 8); return 1 if ($c == 47 || $c == 92); return 0 if ($c < 65 || $c > 90); return 0 if (vec($P, 1, 8) != 58); $c = vec($P, 2, 8); return 1 if ($c == 47 || $c == 92); return 0; } ################################################################# Trim # # v2019.6.15 STRING = Trim(STRING) # Removes whitespace before and after STRING. # Treats tabs, esc, null, vertical tab, # and new lines as whitespace. # sub Trim { @_ or return ''; my $S = shift; defined $S or return ''; my $L = length($S); $L or return ''; my $START = 0; my $LAST = 0; while ($L--) { if (vec($S, $L, 8) > 32) { $START = $L; $LAST or $LAST = $L + 1; } } return substr($S, $START, $LAST - $START); } ################################################################# GetP +ath # # This function returns the path portion of a full file name # without the trailing / or \ character. # # Usage: PATH = GetPath(FULL_FILE_NAME) # # Example: GetPath($0) --> returns this perl script's path # sub GetPath { my $F = shift; $F =~ tr#\\#/#; my $P = rindex($F, '/'); $P or return substr($F, 0, 1); return ($P > 0) ? substr($F, 0, $P) : '.'; } ################################################################# toNi +cePath # # Returns a PATH string that is separated by either \\ or / # depending on the current OS (either Linux or Windows). # # Usage: STRING = toNicePath(STRING) # sub toNicePath { @_ or return ''; my $P = shift; defined $P or return ''; length($P) or return ''; if (index(uc($^O), 'MSWIN') < 0) { $P =~ tr#\\#/#; } else { $P =~ tr#/#\\#; } return $P; } ################################################################# Join +Path # # v2019.6.16 STRING = JoinPath(STRING, [STRING], [STRING]) # This function joins two names into a single path by # adding / in between the names. It also simplifies the # resulting path by removing repeated \\ // characters, # and tries to resolve the "." and ".." in a path name # to literal names only. # sub JoinPath { @_ or return ''; my $P = join('/', CollapseArray(@_)); defined $P or return ''; length($P) or return ''; $P = Trim($P); $P =~ tr#\\#/#; if (uc(substr($P, 0, 8)) eq 'FILE:///') { $P = substr($P, 8, length($P)); } $P =~ s|///|/|g; $P =~ s|//|/|g; my $DRIVE = (vec($P, 1, 8) == 58) ? vec($P, 0, 8) & 223 : 0; if ($DRIVE) { $P = substr($P, 2, length($P)); } my $SLASH = (vec($P, 0, 8) == 47) ? 47 : 0; if ($SLASH) { $P = substr($P, 1, length($P)); } my @A = split('/', $P); for (my $i = 0; $i < @A; $i++) { if ($A[$i] eq '.') { splice(@A, $i--, 1); } if ($A[$i] eq '..') { if ($i > 0) { splice(@A, --$i, 2); $i--; } else { splice(@A, $i, 1); $i--; } } } return ($DRIVE ? chr($DRIVE) . ':' : '') . ($SLASH ? '/' : '') . joi +n('/', @A); } ################################################################# Coll +apseArray # # v2019.6.15 NEW_ARRAY = CollapseArray(ARRAY) # This function removes blank lines from an array. # sub CollapseArray { my $i = @_; while ($i--) { splice(@_, $i, 1) unless (length($_[$i])); } return @_; } ################################################################# AddS +uffix # # This function makes sure that STRING ends with SUFFIX. # # Usage: STRING = AddSuffix(STRING, SUFFIX) # # Example: AddSuffix('Abcdef', 'def') ---> 'Abcdef' # AddSuffix('Abcdef', 'DEF') ---> 'AbcdefDEF' # sub AddSuffix { @_ or return ''; my $S = shift; defined $S or return ''; my $LS = le +ngth($S); $LS or return ''; @_ or return $S; my $X = shift; defined $X or return $S; my $LX = le +ngth($X); $LX or return $S; if ($LS >= $LX) { if (substr($S, $LS - $LX, $LX) eq $X) { return $S; + } } return $S . $X; } #################################################################

Replies are listed 'Best First'.
Re^2: perl script to compare two directories
by haukex (Chancellor) on Jul 13, 2019 at 09:38 UTC

    I don't have enough time to go through all of the code right now, but I do see quite a few reinvented wheels, as has been discussed before, for example, I've pointed you to File::Spec before. Reinventing wheels gives you more code to maintain, more code to test, and more chances to introduce bugs. For example, toNicePath is incorrect: a backslash \ is a valid character in *NIX filenames. Getting portability right is tricky, which is why modules like File::Spec exist in the first place.

    I strongly recommend you take the time to study what modules are available to you, especially what core modules, as those are (almost) always installed.

      I've pointed you to File::Spec before. Reinventing wheels gives you more code to maintain, more code to test, and more chances to introduce bugs.

      I fully agree to use modules instead of re-inventing the wheel, poorly.

      BUT: File::Spec is one of the modules that have a conceptual error, probably since the first version. See:

      Even with that error, it is better than re-inventing the wheel.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^2: perl script to compare two directories
by hippo (Chancellor) on Jul 13, 2019 at 09:07 UTC

    Thanks for posting. I have two minor suggestions, if you are interested at all.

    sub GetFile2 ... sub isAbsPath

    If you don't want your users/co-developers to pull their hair out and curse your name into the void for eternity, don't arbitrarily capitalise the first letter of your sub names. Pick a format (any format) and stick to it.

    sub isAbsPath { @_ or return 0; my $P = shift; defined $P or return 0; length($P) or return 0; $P = uc(substr(Trim($P), 0, 3)); my $c = vec($P, 0, 8); return 1 if ($c == 47 || $c == 92); return 0 if ($c < 65 || $c > 90); return 0 if (vec($P, 1, 8) != 58); $c = vec($P, 2, 8); return 1 if ($c == 47 || $c == 92); return 0; }

    Did you know that perl has regular expressions? That will save all this hard-coding of ord values, etc. eg:

    sub isAbsPath { my $path = shift; return 0 unless defined $path; return $path =~ /^([A-Z]:|[\/\\])/ ? 1 : 0; }

    You could omit the ternary if you're only interested in a true/false return value. I've only demonstrated this one subroutine but many others could benefit from such simplification.

    The alternative is to use a module. For this case, eg is_absolute from Path::Tiny.

    The more you code in Perl, the more Perlish your code will become. Enjoy the journey.

      Oh, thank you for the suggestions!!

      "sub GetFile2 ... sub isAbsPath
      If you don't want your users/co-developers to pull their hair out..."

      LOL Okay.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11102774]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (7)
As of 2019-10-22 09:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?