Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Determine files added/removed (code for magazine article)

by McMahon (Chaplain)
on Apr 15, 2004 at 19:35 UTC ( [id://345514]=CUFP: print w/replies, xml ) Need Help??

I am writing an article for a software magazine, and the article will be accompanied by some simple Perl scripts. I asked in SoPW for code reviews.

The original node is here.

A number of monks gave me excellent advice, some of which I didn't follow, but all of which I really appreciate!.

(And of course, feel free to keep reviewing, if you see something that offends your sensibilities...)

My revised code follows:

#!/usr/bin/perl #InventoryReport.pl use warnings; use strict; use File::Find; open (OUT, ">InventoryFile.txt") or die "Couldn't open output file\n"; my $directory = '//InstallBox/C'; find (\&wanted, $directory); sub wanted { print OUT "$File::Find::name"; } # # Lines 4 and 5 are good Perl practice. "strict" and "warnings" are p +ragmas # that help prevent Perl programmers from doing silly things in their +programs. # # Line 6 is Perl's way of invoking a "module". Modules are sophisticat +ed chunks # of code with simple interfaces for use by Perl programmers. I don't +have to # write my own recursive filesystem prober, I can just use File::Find +and let # it do the work for me. # # Line 8 "open" statement with the ">" purges any contents if the file + exists. # If there's a problem, the program will "die" with an error msg. # # Edit line 11 so that "my $directory" points to your own top-level di +rectory. # I've used Windows syntax here, but the script works just fine on oth +er OSs. # # Lines 13-16 are the basic implementation of File::Find. # Do "perldoc File::Find" for details. # ######################################################
#!/usr/bin/perl #CompareReport.pl use warnings; use strict; use List::Compare; open (OUT, ">CDiffs.txt") or die "Couldn't open output diffs file\n"; open (FILE1, "OldInventory.txt") or die "Couldn't open old inventory file\n"; open (FILE2, "NewInventory.txt") or die "Couldn't open new inventory file\n"; my @file1 = <FILE1>; $_ = lc for @file1; my @file2 = <FILE2>; $_ = lc for @file2; my $lc = List::Compare->new(\@file1, \@file2); my @file1only = $lc->get_Lonly; my @file2only = $lc->get_Ronly; print OUT "For unique files in OldInventory.txt:\n"; print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME.\n"; print OUT "IGNORING FILES IN THE RECYCLE BIN.\n"; print OUT "IGNORING ALL HISTORY FILES.\n"; print OUT "LOWERCASING ALL FILE AND PATHNAMES BEFORE COMPARING.\n"; print OUT "\n"; foreach my $file1(@file1only) { unless (($file1 =~ "tmp") or ($file1=~"temp") or ($file1 =~ "recycler") or ($file1 =~ "history")) { print OUT $file1; } } print OUT "\n\n"; print OUT "For unique files in NewInventory.txt\n"; print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME.\n"; print OUT "IGNORING FILES IN THE RECYCLE BIN.\n"; print OUT "IGNORING ALL HISTORY FILES.\n"; print OUT "LOWERCASING ALL FILE AND PATHNAMES BEFORE COMPARING.\n"; print OUT "\n"; foreach my $file2(@file2only) { unless (($file2 =~ "tmp") or ($file2=~"temp") or ($file2 =~ "recycler") or ($file2 =~ "history")) { print OUT $file2; } } # # Lines 4 and 5 are good Perl practice. "strict" and "warnings" are p +ragmas # that help prevent Perl programmers from doing silly things in their +programs. # # Line 6 is Perl's way of invoking a "module". Modules are sophisticat +ed chunks # of code with simple interfaces for use by Perl programmers. # # Line 8 "open" statement with the ">" purges any contents if the file + exists. # If there's a problem, the program will "die" with an error msg. The +other # "open" statements are for input, and are not destructive. # Note that this script assumes that the input files are in the direct +ory from # which the script is run. # # Line 15 creates an array "@file1" where each element of the array is + a line of # the file pointed to by "FILE1". # Line 17 does the same thing for FILE2 into the array @file2. # # Lines 16 and 18 are somewhat idiomatic Perl. "lc" is the lowercase +function. # These statementslowercase each element of each array by iterating ov +er each # array using a "for" loop. # # Line 20 cranks up List::Compare. See the documentation on CPAN for +full # details. # # Lines 22 and 23 implement the List::Compare functions that we care a +bout. # "get_Lonly" shows each item unique to the left array, @file1. Since + @file1 # is our old inventory, the array @file1only will contain all of the f +ilenames # left behind from the old inventory. @file2only will contain all of t +he files # added to the new inventory. # # Lines 25-30 are printed to the output file so the user knows what's +going on. # # Lines 32-39 check each element of the @file1only array. "unless" is + a nifty # feature of Perl. "unless" translates as "if not". And the "=~" is t +he Perl # pattern match operator. So these lines could be translated as "Check + each item # in the array. If the item does *not* contain any of these text stri +ngs, print # the item to the output file". # Also note a potential bug here: "temp" will match "Temporary" and "T +emp", but # will also match "ExemptEmployees" and "New Attempt", etc. thus preve +nting them # from being reported. Perl gives you lots of ways to fix the problem +, I'll # leave it to you scripters to decide what needs to be done here! # # Line 39 prints a couple of newlines to separate the sections of the +report. # # Lines 41-54 do the same work for the @file2only array, reporting on +elements # unique to the new inventory. This repeated work should really be ref +actored # as a subroutine. It's not hard, but I didn't want to have to introd +uce that # syntax as well. So I encourage you scripters to make my code better + by using # a subroutine for this repeated work! #

Replies are listed 'Best First'.
Re: Reviewed code reviewed again
by particle (Vicar) on Apr 15, 2004 at 21:37 UTC

    ouch, those comments hurt my eyes! consider using the __END__ marker and you can follow with all the free text you want.

    # Lines 4 and 5 are good Perl practice. "strict" and "warnings" are p +ragmas # that help prevent Perl programmers from doing silly things in their +programs. #

    becomes

    __END__ Lines 4 and 5 are good Perl practice. "strict" and "warnings" are pra +gmas that help prevent Perl programmers from doing silly things in their pr +ograms. <code> <p>also, you might consider using here documents rather than multiple +print statements.</p> <code> print OUT "For unique files in OldInventory.txt:\n"; print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME.\n"; print OUT "IGNORING FILES IN THE RECYCLE BIN.\n"; print OUT "IGNORING ALL HISTORY FILES.\n"; print OUT "LOWERCASING ALL FILE AND PATHNAMES BEFORE COMPARING.\n"; print OUT "\n";

    becomes clearer, and more maintainable as

    print OUT <<EOT; For unique files in OldInventory.txt: IGNORING FILES WITH Tmp OR Temp IN PATHNAME. IGNORING FILES IN THE RECYCLE BIN. IGNORING ALL HISTORY FILES. LOWERCASING ALL FILE AND PATHNAMES BEFORE COMPARING. EOT

    it's one more useful concept to teach a user, and not difficult to understand.

    ~Particle *accelerates*

      Call me crazy, but i would rather use a simple:

      print OUT "For unique files in OldInventory.txt: IGNORING FILES WITH Tmp OR Temp IN PATHNAME. IGNORING FILES IN THE RECYCLE BIN. IGNORING ALL HISTORY FILES. LOWERCASING ALL FILE AND PATHNAMES BEFORE COMPARING. ";
      for something as simple as that. But i would rather maintain your example than multiple print lines any day of the week. ;)

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
        Thanks jeffa, and you're right.

        The production version(s) of this code have *many* more rules and are much more complicated. Also, I frequently get requests to modify the exclusion rules.

        I hope that my audience actually uses this code-- in which case they will want to experiment with their own regexes and I did want to make it easy to hack the output, too.

        Thanks for the comment!

      Please don't consider using m!^__(?:END|DATA)__$! for documentation purposes. POD I tell you, POD. That's what it is for: Plain Old Documentation! Throwing together information at the end of the script is just as bad, if not worse, than using comments. And comments in gross (pun intended) quantities are horrendous. Besides, typing perldoc ScriptName.pl is a heck of a lot easier than scrounging through a file looking for comments or documentation stuffed in in a DATA clause.

Re: Determine files added/removed (code for magazine article)
by tilly (Archbishop) on Apr 16, 2004 at 05:24 UTC
    You added error checks, good. You didn't do it as suggested in perlstyle, your error checks don't include the information in $!.

    My advice about unless got ignored. Its value isn't obvious until you've personally been through a couple of debugging sessions. But I highly recommend taking it.

    Update: After reading the response, I understand the reasons for omitting $!. The unless issue, I once had the same opinion. People in your audience might well like it. It certainly can make the code scan well, but debugging is a different story.

      tilly, I believe my audience for this article will be both easily frightened by Script Magic and also impressed by little tricks. The more the code looks like English, the happier my editor will be.

      I didn't use $! because I didn't want to have to explain that particular piece of magic.

      I kept "unless" because it really is a neat feature of Perl, and it made the code more readable. (Or at least I think so.) Note that I translate the "unless" statement in the comments at the bottom.

      And again, thank you *very* much for your help.
Re: Determine files added/removed (code for magazine article)
by fluxion (Monk) on Apr 23, 2004 at 22:16 UTC
    foreach my $file2(@file2only) { unless (($file2 =~ "tmp") or ($file2=~"temp") or ($file2 =~ "recycler") or ($file2 =~ "history")) { print OUT $file2; } }

    I still see a problem pointed out in the original thread with your use of regular expressions to disclude certain files. $file2 =~ "history" and $file2 =~ "temp" for example, matches any file containing the word "history" or "temp" in part of it's path. This could very well include critical data like "account_history", or "form_templates". I'm not sure what the output per file looks like as the block loops through @file2only, but if it's a full directory path you could do something like:

    c:\\account_history\important_variations.txt

    foreach my $file2 (@file2only) { my $flag = undef; my @dir_struct = split('\',$file2); foreach my $item (@dir_struct) { $flag = 'skip' if ($item =~ /^(temp|recycle|history)$/); last if ($flag eq 'skip'); } print OUT $file2 unless ($flag eq 'skip'); }

    I know you're going for simplicity here, and this adds alot to explain, but this is just a quick, plausible solution to show how the code can be made more "production-ready". I haven't tested the above btw, and split wont split the beginning part of the path very cleanly, but there's nothing relevant there anyway as far as the match is concerned.

    Roses are red, violets are blue. All my base, are belong to you.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: CUFP [id://345514]
Approved by davido
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (4)
As of 2024-04-24 22:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found