How do I search a directory tree for files?

shandor has asked for the wisdom of the Perl Monks concerning the following question:


Re: How do I search a directory tree for files?
by Anonymous Monk on Jun 28, 2000 at 12:28 UTC

There are two possibilities, depending on how static the directory contents are, and how willing you are to trade speed against memory.

The first solution searches the whole tree on every run. This is the solution to go for if the directories themselves are static, but the contents of the directories are not. This version is slow, but it doesn't consume any extra space on the hard disk, because no list of files has to be stored.

#!/usr/bin/perl -w

use strict;
use File::Find;

my @directories = (".", "/home/mp3");
my @foundfiles;

# Here, we collect all .mp3 files below each directory in @directories
# and put them into @foundfiles
find( sub { push @foundfiles, $File::Find::name if /\.mp3$/ }, @directories );

# and output them all
print join("\n",@foundfiles), "\n";

The second version uses a two-step approach. We compute a list of all (interesting) files in the directory tree once and save it to a file. To check whether a certain file is in the directory tree, we either load this list into a hash for a really fast lookup (if we want to look up more than one file) or go through it line by line (if we only look for a single file). This method obviously only works if the directory contents don't change very often, because the saved list is only as current as the last time it was rebuilt. The code above serves very well to create the list of interesting files: just redirect its output into a file called index.

#!/usr/bin/perl -w

use strict;
use File::Basename;

my %files;
my @searchfiles = ("foo", "bar", "xxx");

open( INDEX, "< index" ) or die "Couldn't read index : $!\n";

# Now we read every filename from our index file and put it in the hash.
# If we are only checking for one file, we could do the check right here
# in the loop.
# We also strip the path from the filename, as we will be searching for files
# (and if we already knew the path to the file, -e would be faster :) )
# This method doesn't care for duplicates. If we have two files with the
# same name, only the last file will be reported.
# The line is chomped first, or the hash key would end in "\n" and never
# match an entry of @searchfiles.
while( my $path = <INDEX> ) {
  chomp $path;
  $files{ basename( $path ) } = "$path\n";
};
close INDEX;

# And now we check if the filenames are in the hash
foreach (@searchfiles) {
  print $files{ $_ } if (exists $files{ $_ });
};
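
For the single-file case mentioned above, the hash is not needed at all. What follows is only a minimal sketch, assuming the same index file with one path per line; the helper name find_in_index and the use of the first command-line argument as the wanted name are made up for this illustration. It stops reading at the first match.

```perl
#!/usr/bin/perl -w

use strict;
use File::Basename;

# Hypothetical helper (not part of the scripts above): scan the index
# line by line and return the first path whose basename equals $wanted.
# Returns nothing (undef in scalar context) if no line matches.
sub find_in_index {
  my ($indexfile, $wanted) = @_;
  open( my $fh, "<", $indexfile ) or die "Couldn't read $indexfile : $!\n";
  while ( my $path = <$fh> ) {
    chomp $path;
    if ( basename( $path ) eq $wanted ) {
      close $fh;
      return $path;
    }
  }
  close $fh;
  return;
}

# Assumed usage: the wanted filename is the first command-line argument
if ( @ARGV and -e "index" ) {
  my $hit = find_in_index( "index", $ARGV[0] );
  print defined $hit ? "$hit\n" : "$ARGV[0] not found\n";
}
```

For one lookup this reads at most the whole index once and keeps nothing in memory; as soon as you look up several names, the hash version is the better trade.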

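If duplicates matter, one possible variation is to store a list of paths per filename instead of a single path. This is just a sketch under the same assumptions about the index file (one path per line); the helper name load_index is invented for the example.

```perl
#!/usr/bin/perl -w

use strict;
use File::Basename;

# Hypothetical helper: map each bare filename to a list of every path
# in the index that ends in that name, so duplicates are all kept.
sub load_index {
  my ($indexfile) = @_;
  my %files;
  open( my $fh, "<", $indexfile ) or die "Couldn't read $indexfile : $!\n";
  while ( my $path = <$fh> ) {
    chomp $path;
    push @{ $files{ basename( $path ) } }, $path;
  }
  close $fh;
  return \%files;
}

# Report every location of each wanted file, if an index exists here
if ( -e "index" ) {
  my $files = load_index( "index" );
  foreach my $name ("foo", "bar", "xxx") {
    next unless exists $files->{ $name };
    print "$_\n" for @{ $files->{ $name } };
  }
}
```

The price is a little more memory per entry, since every path is kept rather than only the last one seen.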
