Care to share your script?
My ex-employers own the last ones I wrote, but they're pretty simple: use File::Find to traverse a hierarchy, ignore file types that aren't interesting (like .doc or .exe), and process the files that are. Stripped of extraneous detail, they look something like:
use File::Find;
my $ignore = '(?:\.exe|\.doc)$';
find(\&consider, $root);
emit_report();
sub consider {
    # $File::Find::name is the full path of the current file
    my $path = $File::Find::name;
    return if $path =~ m/$ignore/o;
    process($path);
}
Processing consists of slurping the file into memory and running a set of regexes against it, recording which files match which regex. Then sort the hashes and spit out HTML. It's just bookkeeping at this point. Being a potential memory pig isn't a serious issue for a script that runs once a day at 2am.
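For the curious, here's a rough sketch of what the processing and reporting halves might look like. It isn't the original code (which I no longer have). The %patterns entries, the %matches layout, and the report_html helper are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical pattern set; the names and regexes are placeholders.
my %patterns = (
    todo     => qr/\bTODO\b/,
    password => qr/password\s*=/i,
);

my %matches;    # pattern name => { file path => 1 }

# Slurp one file and record which patterns it matches.
sub process {
    my ($path) = @_;
    open my $fh, '<', $path
        or do { warn "Can't open $path: $!\n"; return };
    my $text = do { local $/; <$fh> };    # undef $/ reads the whole file
    close $fh;
    return unless defined $text;
    for my $name (keys %patterns) {
        $matches{$name}{$path} = 1 if $text =~ $patterns{$name};
    }
    return;
}

# Build the report as a string, sorted by pattern name, then by path.
# (Assumed helper; paths are not HTML-escaped in this sketch.)
sub report_html {
    my $html = "<html><body>\n";
    for my $name (sort keys %matches) {
        $html .= "<h2>$name</h2>\n<ul>\n";
        $html .= "<li>$_</li>\n" for sort keys %{ $matches{$name} };
        $html .= "</ul>\n";
    }
    $html .= "</body></html>\n";
    return $html;
}

sub emit_report { print report_html() }
```

One caveat: File::Find chdirs into each directory as it walks, so opening $File::Find::name only works reliably if $root is an absolute path (or if you pass no_chdir => 1 in the options hash).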