Parallel maintenance on many projects, part I
by brian_d_foy (Abbot) on Sep 01, 2004 at 22:04 UTC
I have a medium-sized directory where I store all my CVS working copies. Everything is in one place, even though they might come from different servers (or even different source control products). I want to do a lot of parallel maintenance in these directories, and as I go about this, I am going to write this article in parallel too. I won't skimp on the details, or hide the stupid mistakes I make. You get to see how I actually do something, including the false starts, momentary digressions, and V-8 moments ("D'oh! I shoulda used a command line!").
I start in my Dev directory, which looks like it should mean "device" but really means "development". If the computer and I agree to stay out of each other's file systems, I don't think this will be a problem.
In that list are new projects, like "Palm", and projects I haven't bothered to look at in a couple years, like "CGI_MetaFAQ" and whatever I might have in "Devel" (as I write this, I don't remember what that is).
I have two things I need to do, and they would normally be manageable if I didn't have my finger in so many things. Most of these directories look like Perl distributions, and I want to check each distribution against a list of things I think should be true (e.g. CVS is up to date, it has a README, a META.yml, a pod.t, and so on).
The other task involves cleaning up the CVS/ROOT files. SourceForge used to give out CVS server names like cvs.brian-d-foy.sourceforge.net because it could do wildcard sub-domains. When they upgraded BIND, they lost that feature. However, I'm stuck with a bunch of CVS/ROOT files that have the old host names.
I can't successfully run cvs update in those directories since they have no valid host to connect to.
I'm going to handle the CVS/ROOT problem first, because it should be easier, and once I fix it, I shouldn't have to do it again. Even before I start I predict that 80% of the work is already done, meaning that 80% of the directories are up-to-date in the CVS repository, so I only need to delete my working copy and check out the current HEAD. That updates the CVS/ROOT files automatically. How do I find out which directories qualify?
I reflexively pull out Perl and type "use File::Find", but even though that might be faster than find(1), I have a small corpus and programmer time is more important. If find(1) takes twice as long, I don't care. I also make a note that I need to start my new File::Find::Functions project: collect subroutines that people can shove right into File::Find::find.
How many directories do I need to look at? I use find(1) to list them.
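Something like this, I think (my reconstruction of the command):

```shell
# list the CVS administrative files under the current directory
find . -name ROOT
```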
Huh? What's up with that? I should have a bunch of those files. I was expecting most of those to be CVS working copies. I look in some of the other directories. Most of the time the file is actually "CVS/Root". I probably knew that.
I modify my find(1) and count the lines of output.
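The fixed-up version, again a reconstruction:

```shell
# the file is really named Root, so fix the case and count the hits
find . -name Root | wc -l
```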
So how many do I need to fix? A good CVS/Root file should have "cvs.sourceforge.net" as the host. Hosts like cvs.brian-d-foy.sourceforge.net and cvs.perl-rss.sourceforge.net are stale. I take the output from find(1) and pipe it to xargs, where I simply cat(1) each file, then use grep(1)'s -v switch to select the lines that don't match "cvs.sourceforge". Before I do that, I realize I have another candidate for Randal Schwartz's "Useless use of cat" Award. I can give grep(1) the filenames directly, and to much better advantage: with more than one file, grep(1) prepends the file name to each line of output.
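The improved command looks something like this:

```shell
# let grep(1) open the files itself; with more than one file name on
# the command line it prepends the file name to each line of output
find . -name Root | xargs grep -v 'cvs\.sourceforge'
```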
Now I want a list of those directories. My first try doesn't work.
What the heck is this? Who's this "single ref constructor"? I try it without the substitution.
Why is $F an anonymous array? Am I going crazy? I look at all of @F. It certainly looks like a normal array. I go back to the first place I learned about -F: Randal's first Unix Review article.
I go back a step and print the dereferenced first element of @F. It's even odder.
Have you spotted the error yet? I should have, because I make it often enough, and it's not a Perl error. That 0 is very telling. Compare the previous versions with the correctly working version. Now I'm sure that I'm a bonehead. For bonus points, figure out where the anonymous array came from (now I know). The output shows just the directory names, which is what I wanted all along.
Now I want to know which of those directories need changes checked into CVS. Those are the ones that need attention. Rather than continue this already too long command line, I redirect the output into a file, but before I do that I modify the command line to remove the trailing /CVS/Root from each path. I end up with a list of directories I want to run cvs update in.
For the next step I want to change into each of those directories, run cvs update, and look at the output. I realize that the network traffic is going to take a while, so I should minimize it by starting at the highest directory possible in each working copy. I then realize that if I wanted to do that, I could have made my list with ls -1. Oh well. Had I done that, I wouldn't have reminded myself about proper shell syntax. So let's give it a go, but first I need to go to the kitchen to get some water.
My first pass is very simple. I figure I will use this program again, perhaps as a nightly check to see what I've forgotten to check in.
I don't let myself use all the quick and dirty tricks, and I throw in File::Spec::Functions with an eye toward portability.
Well, it only 20% works. Remember all those bad CVS/Root files? I never fixed them, and now I'm changing into each directory and asking cvs about it. I get a lot of bad host errors because I still have the problem I started with. I need to fix that first. Remember when I went to the kitchen to get water? When I came back I got a bit ahead of myself; taking the break pulled me out of the flow.
First (again), I need a list of all the bad CVS/Root files. In my last command line I stripped off the /CVS/Root portion, and now I remember why it was there. No worries, though. I just go back a bit in the shell's history and try again.
I modify my previous program to go through the files for me, and I give Ingy's IO::All module a spin so I don't have to worry about open() and friends (and IO::All is just cool). I realized once I ran this program that it's really just another perl command line if I use in-place editing, but I'm a bit leery of really screwing things up, so I mollify myself with some appropriate caution: I first print what I want to put back into the files before I actually do it, and I save the original file contents in cvs-root-originals.txt. If I mess up, I can at least recreate the files. Check out that spiffy IO::All append mode.
To do the real thing, I uncomment the line to write the information back to the file. After I really run it, I look at one of the files to ensure it worked, but I get that little shot of adrenaline when I think I've really screwed up. By now you might have the idea that I'm as bad a coder as anyone else. I think you would be generous saying that. Remember, an expert is someone who has made every mistake.
Even with the momentary terror, I've made enough stupid mistakes that I know not to panic right away. It turns out I'm just in the wrong directory.
Now I go back to my program to check the state of CVS. I don't get any unsuccessful connection attempts this time. I need to determine when the CVS output is interesting (i.e. I need to deal with changes), or when I can ignore it.
In some cases, I just get progress output, and I don't need to do anything.
In a lot of cases, I need to do something. While my script is running, I can go through the output and start fixing things. This script is a bit rough, and it's checking a lot of directories more than once. When it checks Business/ISBN, for instance, it also checks Business/ISBN/t because CVS descends into sub-directories. Indeed, this all starts as Business, which is the top directory in this tree.
This gets back to the first task. Remember way back at the beginning when I said I had two things to do? Now I'm on the first one: ensuring things are as they should be. Before I start mucking around in the directories, I want to make sure I'm working with up-to-date working copies.
I need to modify my program to check only the top-level directory of each working copy. I add a %Seen hash to track which directories I have checked. If I run into a directory I have already handled, I skip to the next one. The speed-up is very noticeable: I make about 50 network connections instead of 300.
And, since I'm trying to cut out as much uninteresting output as possible, I want to output nothing if the working copy is up-to-date. In my release(1) program, I have code that already does this. Stolen directly from the release(1) source (I could use Module::Release which made this a function, too, but I'm going to change a lot of it), I add a parse_cvs() subroutine to my program.
This gets me a very nice report: 279 lines of output (with plenty of whitespace and blank lines). Most of these seem to be new pod.t files. I vaguely remember writing something to replace all the pod.t files with the latest interface (and maybe I should write it again since Andy recently updated it to use taint checking).
Okay, that's enough for me to work on for now. It's a good first step to getting my act together and doing a lot of needed maintenance on this stuff. But it's time to take a break.
Before I stop writing, though, I have a few ideas on what's next: how about a report that pulls in stuff from RT too? And automatically running all the tests in all the projects? Next time, next time ... :)
brian d foy <email@example.com>