Item Description: Enumerate files and directories in a directory tree
Review Synopsis: Use this module instead of globbing or readdir()
File::Find is the way to go if you want to look at all files in one or more directory trees.
File::Find exports one function, find(), which takes two parameters: a hash reference or a code reference, and a list of directories where the search starts.
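To make the two calling forms concrete, here is a minimal sketch. The temporary directory and the file names a.txt and sub/b.txt are made up for the illustration; the only File::Find pieces used are find() and the wanted key of the options hash.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a small throwaway tree so the example is self-contained.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $path ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;
}

# Form 1: a code reference, followed by the list of start directories.
my @files;
find(sub { push @files, $File::Find::name if -f $_ }, $root);

# Form 2: a hash reference; the callback goes under the 'wanted' key.
my @files_again;
find({ wanted => sub { push @files_again, $File::Find::name if -f $_ } }, $root);

print "$_\n" for sort @files;
```

Both calls should visit the same files; the hash form is only needed once you want to pass further options.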
Why use File::Find
File::Find protects you from a lot of nasty things that happen on filesystems. In its standard configuration it ensures that your code reference is called once for each file encountered, even if there are more symlinks pointing to it, and it also prevents nasty loops for symlinked directories.
Why avoid File::Find
There is not much reason to avoid File::Find. You might want to avoid it if you only need to read the files in a single directory, without recursing, and you are sure that there can be no symlinks in that directory (for example, if the filesystem doesn't allow symlinks). Then your code could load faster. But I'd file that under premature optimization.
Caveats
If you are just starting to use File::Find, you have to deal with some idiosyncrasies.
First of all, File::Find uses some "optimization" by default to speed up searches on certain filesystems under Unix. Unfortunately, this "optimization" fails to work on other filesystems, such as the iso9660 filesystem used for CD-ROMs. ncw tells you below what to do about it - in fact, you should always use the code ncw proposes.
In the default configuration, the directory is changed to the recursed directory, and all returned filenames are relative to the current directory. Use $File::Find::name to get a fully specified filename.
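A small sketch of what the callback sees may help here. It records $_, $File::Find::dir and $File::Find::name for a single file; the temporary tree and the name b.txt are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree: <root>/sub/b.txt
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
open my $fh, '>', "$root/sub/b.txt" or die "open: $!";
close $fh;

my %seen;
find(sub {
    return unless $_ eq 'b.txt';
    %seen = (
        basename => $_,                  # name relative to the current directory
        dir      => $File::Find::dir,    # directory File::Find has changed into
        name     => $File::Find::name,   # the directory and the basename joined
    );
}, $root);

print "$_ = $seen{$_}\n" for sort keys %seen;
```

If the directory change gets in your way, the hash form of find() accepts a no_chdir option; with it set, $_ is the same as $File::Find::name.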
If you don't want to recurse below a certain directory, there is the (not-so-well-documented) $File::Find::prune variable, which you can set to 1 in your code reference to stop recursing into the current directory.
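As a rough sketch of pruning: the callback below sets $File::Find::prune when it is handed a directory it does not want to descend into. The directory name .git and the file names are purely illustrative.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree with one directory we want and one we want to skip.
my $root = tempdir(CLEANUP => 1);
for my $dir ("$root/src", "$root/.git") {
    mkdir $dir or die "mkdir $dir: $!";
}
for my $file ("$root/src/main.pl", "$root/.git/config") {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}

my @kept;
find(sub {
    # Hypothetical rule for this example: skip any directory called ".git".
    if (-d $_ && $_ eq '.git') {
        $File::Find::prune = 1;   # do not descend into this directory
        return;
    }
    push @kept, $File::Find::name if -f $_;
}, $root);

print "$_\n" for @kept;
```

Note that the callback is still called for the pruned directory itself; only its contents are skipped.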
Examples
By popular demand, here are some examples on how to use the module. The documentation shows off some interesting code, but it's not helpful if you're looking for something to get started.
A first example, printing the filename and the filename with the path to the file. The code was stolen from a node by nate.
use strict;
use File::Find;

sub eachFile {
    my $filename = $_;
    my $fullpath = $File::Find::name;
    # remember that File::Find changes your CWD,
    # so you can call open with just $_
    if (-e $filename) {
        print "$filename exists!"
            . " The full name is $fullpath\n";
    }
}

find(\&eachFile, "mydir/");
RE: File::Find by ncw (Friar) on Sep 16, 2000 at 15:01 UTC
I've found another small problem with File::Find. If you are on a Unix-based
platform, and you use File::Find on a non-Unix partition,
e.g. dos, vfat (win9x), iso9660 (cdrom) or AFS, then File::Find doesn't work
properly. You need to add
$File::Find::dont_use_nlink = 1;
to your program and it will work fine. As I understand it, this lowers the
efficiency of File::Find because it has to look into each
directory to see if there are entries in it rather than just
looking at the nlink field in the inode. Non-Unix filesystems (and AFS ;-)
don't set this properly.
This note is based on my experience with File::Find under Linux.
It is probably similar under other *nix based systems but
since foreign partition mounting is an OS specific thing YMMV.
use strict;
use File::Find;
my $dir = shift;
my ($with, $without);
print "Counting files in $dir\n";
# This is the default on Linux
$File::Find::dont_use_nlink = 0;
find(sub { $without++ }, $dir);
$File::Find::dont_use_nlink = 1;
find(sub { $with++ }, $dir);
print "With \$File::Find::dont_use_nlink = 0: $without files found\n";
print "With \$File::Find::dont_use_nlink = 1: $with files found\n";
I ran this on a mounted iso9660 disc like this (note if the
disc has RockRidge extensions then it works properly!) :-
$ ./file_find_test.pl /mnt/cdrom
Counting files in /mnt/cdrom
With $File::Find::dont_use_nlink = 0: 29 files found
With $File::Find::dont_use_nlink = 1: 1300 files found
This was on Linux 2.2.17 with perl 5.00503 with the standard
File::Find that comes with the distribution.
I agree with tye's comment here - $File::Find::dont_use_nlink should be 1
on all platforms - the slowdown isn't worth the incompatibilities.
Actually the hack that dont_use_nlink disables only speeds up the most basic find operations where you don't care anything about the files to be found and then don't do anything with them. And it breaks File::Find on every platform I've ever used it on, just not on every file system on every platform.
The default should be made to not use this bad hack on any platform. The days of "most file systems of most Unix systems" supporting this hack have long since passed (it has been a long time since I've seen a Unix system without a CD-ROM drive, just to pick one example).
Sure, it was a cool hack a long time ago. And it can make just getting a listing of files much, much faster (depending on how your directories are structured). But a good module should err on the side of giving correct results over performance.
-
tye
(but my friends call me "Tye")
Re: File::Find by larryl (Scribe) on Mar 14, 2001 at 01:13 UTC
I find myself using File::Find more and more now that I've
got the hang of it. Typically you set up like so:
use File::Find;
find( \&do_work, $from_dir );
and do_work() is the place where all the real work gets done.
A couple caveats that I've found (the hard way...) about what you can do inside do_work():
- Don't change $_ inside do_work(). If you want to, save a copy on entry and change it back before returning.
- As Corion mentions, the working directory is changed to each recursed directory under your starting point. If you change directories inside do_work(), save a copy of the current directory on entry and chdir back to it before returning.
- The usual file test operator caveats apply, for example -f $File::Find::name and -l $File::Find::name are both true if the file is a symlink to another file. If you're interested in symbolic links, test for those first, before you test for file- or directory-ness.
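The three caveats above could look roughly like this inside a callback. This is only a sketch: the temporary tree, the names real.txt and link.txt, and the %kind hash are invented for the example, and symlink() is wrapped in eval because not every platform supports it.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree: one plain file and, where supported, a symlink to it.
my $root = tempdir(CLEANUP => 1);
open my $fh, '>', "$root/real.txt" or die "open: $!";
close $fh;
my $have_symlinks = eval { symlink "$root/real.txt", "$root/link.txt" } ? 1 : 0;

my %kind;

sub do_work {
    my $name = $_;          # work on a copy and leave $_ untouched
    my $cwd  = getcwd();    # remember where File::Find put us

    # Test for symlinks first: -f and -d follow the link.
    $kind{$name} = -l $name ? 'symlink'
                 : -d $name ? 'directory'
                 : -f $name ? 'file'
                 :            'other';

    # If anything above had changed directory, this puts us back.
    chdir $cwd or die "chdir $cwd: $!";
}

find(\&do_work, $root);
print "$_: $kind{$_}\n" for sort keys %kind;
```

The tests use $_ (via the copy in $name) rather than $File::Find::name, since the working directory is already the one that contains the file.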