Traversing the directory tree
It is often necessary to traverse all the files in a directory tree recursively in Perl, much as the Unix "find" command does. It is possible to do so the "hard way", using opendir, readdir and their friends. But in Perl, naturally, TMTOWTDI. I want to present not only "another way to do it" but, IMHO, a "better way to do it", especially for beginners who only need to perform simple tasks.
File::Find basics
Just remember - if you have to traverse files recursively and do some processing on them, the File::Find module is your friend.
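Loading it takes a single line. File::Find ships with Perl itself, so there should be nothing to install:

```perl
# Imports find (and finddepth) into the current package.
use File::Find;
```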
This module makes recursive file traversal as easy as you could imagine. The following is a naked template for working with this module:
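A minimal sketch of such a template follows. The subroutine name do_something_with_file and the starting directory "." are placeholders chosen for illustration, not anything the module requires:

```perl
use strict;
use warnings;
use File::Find;

# Root of the search; replace "." with whatever directory you need.
my $dir = ".";

# Pass a reference to the processing routine, then the starting directory.
find(\&do_something_with_file, $dir);

sub do_something_with_file {
    # Called once per entry; $_ holds the name of the current entry.
    # Put your processing here.
}
```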
First, a starting directory is initialized in $dir. If you imagine the directory structure as a tree, this is the root, from which the search starts.
Then find (a function from the File::Find module) is called. It is given a reference to a subroutine and the starting directory. find will traverse the directory tree and call the supplied subroutine on each entry it meets (be it a plain file, a directory, a link, etc).
Then we see the definition of the processing function. It takes no explicit arguments; instead, find sets $_ to the name of the file currently being visited. Consider the following simple example (it prints the names of all directories, starting with "." - the current directory):
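A sketch of that example, assuming one name per line is the wanted output format:

```perl
use strict;
use warnings;
use File::Find;

find(\&print_name_if_dir, ".");

sub print_name_if_dir {
    # -d with no operand tests $_, the entry find is currently visiting.
    print "$_\n" if -d;
}
```

Because find changes into each directory as it goes, the names printed here are short names without any path, and the starting directory itself shows up as ".".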
Here, the subroutine print_name_if_dir is passed to find by reference. It simply prints the name of the file if it is a directory. Note the peculiar notation: it is customary in Perl to leave $_ unmentioned, so:
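A sketch of the short form. The wrapper sub implicit_form is a hypothetical name, used only so the snippet stands on its own:

```perl
# The file test -d is given no operand, so it tests $_.
sub implicit_form { print "$_\n" if -d }
```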
Is equivalent to:
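The same test with $_ spelled out (again, explicit_form is only an illustrative name):

```perl
# Identical behaviour; the operand of -d is now written explicitly.
sub explicit_form { print "$_\n" if -d $_ }
```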
Both are quite cryptic (but hey, it's Perl), and for clarity the routine could be rewritten as:
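One possible clearer version copies $_ into a named variable first. The variable name $file is an arbitrary choice:

```perl
use strict;
use warnings;
use File::Find;

find(\&print_name_if_dir, ".");

sub print_name_if_dir {
    my $file = $_;                   # name of the entry being visited
    print "$file\n" if -d $file;     # print it only if it is a directory
}
```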
Routines in Perl can be anonymous, which is more suitable for such simple tasks, so the whole program may be rewritten as:
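A sketch of the anonymous-sub version, kept deliberately bare (add use strict and use warnings in anything longer):

```perl
#!/usr/bin/perl
use File::Find;
find(sub { print "$_\n" if -d }, ".");
```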
Just 3 lines of code, and we're already doing something useful!
For the more advanced
The package variable $File::Find::name can be used at any time to report the full path to the file (that is, the path as seen from the starting directory). Consider the following improved version of our little script:
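A sketch of the improved script, changing only what is printed:

```perl
#!/usr/bin/perl
use File::Find;
# $File::Find::name carries the path from the starting directory down.
find(sub { print "$File::Find::name\n" if -d }, ".");
```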
Try running it and compare the results to the previous version. You will notice that it prints the full path to each directory. What happens is the following - File::Find chdirs into each directory it finds in its search, so $_ gets only the short name (without the path) of the file, while $File::Find::name gets the full path. If, for some reason, you don't want it to chdir, you may specify no_chdir as a parameter. Parameters to find are passed as a hash reference:
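A sketch of such a call, using the wanted and no_chdir options of File::Find:

```perl
use strict;
use warnings;
use File::Find;

find(
    {
        # The processing routine goes under the "wanted" key.
        wanted   => sub { print "$File::Find::name\n" if -d },
        # Stay in the current directory instead of chdir-ing around.
        no_chdir => 1,
    },
    "."
);
```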
Note that "wanted" is the key for the file processing routine in this hash.
The results won't differ from the previous version. Here, however, $_ will also hold the full path to the file, because find no longer chdirs into the directories it visits.
Other parameters may be specified (like 'bydepth', which makes find report a directory's contents before the directory itself), but these are advanced topics. If you're curious, you can look them up in the documentation of the module.
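For the curious, here is a small sketch of what bydepth changes. The helper name visit_order is hypothetical; it just collects paths in the order find reports them:

```perl
use strict;
use warnings;
use File::Find;

# Return every path under $dir in the order find visits it with bydepth set.
sub visit_order {
    my ($dir) = @_;
    my @seen;
    find({ wanted => sub { push @seen, $File::Find::name }, bydepth => 1 }, $dir);
    return @seen;
}

print "$_\n" for visit_order(".");
```

With bydepth set, each directory appears after everything inside it, which is handy when you want to delete or rename directories on the way back up.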
Bonus - a useful utility based on File::Find
Ever felt that your quota suffocates you, and couldn't find the unnecessary large files to remove? Do you find "du" too tedious to use in these cases? File::Find comes to the rescue. Consider the following script... It takes a starting directory, and prints the 20 largest files found in the tree under this directory - specifying full paths, so you can just cut-n-paste them into "rm":
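A sketch of such a script follows. The helper name largest_files, the output format, and the fallback to "." when no directory is given on the command line are all assumptions made for this example:

```perl
use strict;
use warnings;
use File::Find;

# Return the $count largest plain files under $dir as [path, size] pairs.
sub largest_files {
    my ($dir, $count) = @_;
    my %size;

    # Record the size of every plain file, keyed by its full path.
    find(sub { $size{$File::Find::name} = -s if -f }, $dir);

    # Sort the paths by size, largest first, and keep the top $count.
    my @sorted = sort { $size{$b} <=> $size{$a} } keys %size;
    splice @sorted, $count if @sorted > $count;

    return map { [ $_, $size{$_} ] } @sorted;
}

# Starting directory from the command line, or "." if none was given.
my $dir = @ARGV ? $ARGV[0] : ".";

printf "%10d  %s\n", $_->[1], $_->[0] for largest_files($dir, 20);
```

Run it with the directory to inspect as its only argument; each output line shows the size in bytes followed by the path.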
What goes on here? find traverses the given directory recursively, recording each file's size in the %size hash (-s if -f means: get the size if this is a plain file). Then the script sorts the hash keys by size and prints the 20 largest files. That's it... I use this utility quite a lot to free up space; I hope you find it useful too (and also understand exactly how it works!)
Update: Thanks to rinceWind for this:
File::Find is cross-platform. It's one of the really handy ways for iterating directory trees on Windows - something Microsoft don't encourage you to do, with their 'hidden files' (File::Find X-rays through Windows hidden files mechanism nicely :-).
With this in mind, though, you must be careful when working with Windows paths, because Windows uses backslashes rather than forward slashes as the path separator. There is a nice tutorial - Paths in Perl - that explains this.
Update 2: There are some nice follow-up replies to this tutorial - special thanks to Aristotle, who supplied some info on the really advanced uses of File::Find.
Conclusion
File::Find can turn tasks involving recursive file traversal from torture into pleasure, if you know how to use it. Modules like this make Perl the wonderful language it is - you can perform useful tasks without pain. Enjoy!