PerlMonks  

Overview

Often you find yourself in a situation where you need to delve deep into directories, their subdirectories, the subdirectories' subdirectories, and so on.

This is a great example of where recursion comes into play. It allows you to write a single routine which can call itself again and again, as many times as needed.
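As a minimal illustration of a routine that calls itself, here is a sketch that has nothing to do with files: it sums the numbers in a nested array reference. The name nested_sum and the sample data are made up for this example.

```perl
use strict;
use warnings;

# Sum every number in an arbitrarily nested array reference.
# (nested_sum is a hypothetical name used only for illustration.)
sub nested_sum {
    my ($item) = @_;

    # Base case: a plain number needs no further descent.
    return $item unless ref($item) eq 'ARRAY';

    # Recursive case: add up whatever each element contains.
    my $total = 0;
    $total += nested_sum($_) for @$item;
    return $total;
}

print nested_sum([1, [2, 3], [[4], 5]]), "\n";   # prints 15
```

The directory routines later in this tutorial have the same shape: a plain file is the base case, and a directory is the case that triggers another call.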

A notable module, File::Find, helps with this task. It is incredibly useful and should always be used for production code. It takes care of many of the nuances in file processing, such as the handling of symbolic links, hard-link counts, and so on.
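For comparison, here is a rough sketch of what the File::Find approach might look like. The sub name list_files and the command-line handling are assumptions made for this example. find() and $File::Find::name come from the module itself.

```perl
use strict;
use warnings;
use File::Find;

# Collect every non-directory entry below $base_path.
# (list_files is a hypothetical name used only for illustration.)
sub list_files {
    my ($base_path) = @_;
    my @found;
    find(
        sub {
            # Inside the callback, $_ is the bare filename and
            # $File::Find::name is the path including the start directory.
            push @found, $File::Find::name unless -d $_;
        },
        $base_path,
    );
    return @found;
}

# Only runs when a directory is given on the command line.
if (@ARGV) {
    print "$_\n" for list_files($ARGV[0]);
}
```

Note that there is no explicit recursion here: find() does the descent and calls the anonymous sub once per entry.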

This tutorial, on the other hand, is designed to give you a basic understanding of recursion in Perl, and should hopefully be beneficial to you for more than just file and directory processing (though that will be the focus). After reading this, I hope that you will be able to look at a recursive file processing routine (be it with File::Find or otherwise), and have a very clear understanding of what it does, and how it does it.

Some Conventions

For the sake of clarity, here are a few conventions used in this tutorial:
  • 'path' - The directory that contains a file. For example, the path to '/home/count0/filename' is '/home/count0'.
  • 'file' - Any type of file, including directories.

The (pseudo-code) Algorithm

For recursive processing, where a returned list of files is not needed:
(One example may be to rename all or certain files)
    process_files() with the base path as 'path'

    process_files():
        get a list of all files in 'path'
        for each of the files
            if it is not a directory and it needs processing
                process it
            if it is a directory
                process_files() with this dir as 'path'

If a returned list of files is needed:
(Note that this can be made to do processing as well)
    list_of_all_files = process_files() with the base path as 'path'

    process_files():
        get a list of all files in 'path'
        for each of the files
            if it is not a directory
                process it if necessary
                add it to our list of files
            if it is a directory
                process_files() with this dir as 'path'
                add the files returned from process_files() to our list of files
        return our list of files

The Code

First, we'll make a very basic example. In it, we will not be returning any lists of files, but simply doing processing on each.

    process_files ($base_path);

    # Accepts one argument: the full path to a directory.
    # Returns: nothing.
    sub process_files {
        my $path = shift;

        # Open the directory.
        # A lexical handle ($dh) keeps each recursive call's
        # directory handle separate from the others.
        opendir (my $dh, $path)
            or die "Unable to open $path: $!";

        # Read in the files.
        # You will not generally want to process the '.' and '..' files,
        # so we will use grep() to take them out.
        # See any basic Unix filesystem tutorial for an explanation of them.
        my @files = grep { !/^\.{1,2}$/ } readdir ($dh);

        # Close the directory.
        closedir ($dh);

        # At this point you will have a list of filenames
        # without full paths ('filename' rather than
        # '/home/count0/filename', for example)
        # You will probably have a much easier time if you make
        # sure all of these files include the full path,
        # so here we will use map() to tack it on.
        # (note that this could also be chained with the grep
        # mentioned above, during the readdir() ).
        @files = map { $path . '/' . $_ } @files;

        for (@files) {

            # If the file is a directory
            if (-d $_) {
                # Here is where we recurse.
                # This makes a new call to process_files()
                # using a new directory we just found.
                process_files ($_);

            # If it isn't a directory, let's just do some
            # processing on it.
            } else {
                # Do whatever you want here =)
                # A common example might be to rename the file.
            }
        }
    }
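One way to fill in the "do whatever you want here" branch without editing the routine each time is to pass the processing step in as a code reference. This is only a sketch of that idea. The name walk_tree and the callback argument are assumptions made for this example, not part of the template above.

```perl
use strict;
use warnings;

# walk_tree() is a hypothetical name, not from the original tutorial.
# It calls $callback->($file) for every non-directory below $path.
sub walk_tree {
    my ($path, $callback) = @_;

    opendir(my $dh, $path) or die "Unable to open $path: $!";
    my @files = map { "$path/$_" } grep { !/^\.{1,2}$/ } readdir($dh);
    closedir($dh);

    for my $file (@files) {
        if (-d $file) {
            walk_tree($file, $callback);   # recurse into the subdirectory
        }
        else {
            $callback->($file);            # hand the file to the caller
        }
    }
    return;
}

# Only runs when a directory is given on the command line:
# print the full path of every file below it.
walk_tree($ARGV[0], sub { print "$_[0]\n" }) if @ARGV;
```

The recursion itself is unchanged. Only the per-file action has moved out of the routine, so renaming, counting, or printing can all reuse the same walk.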

That was a bare-bones template for how you will process files recursively.
But what if you want to return a list of all files in a directory and all of its subdirectories?

Building on the previous one, this example will go through a directory and each of its subdirectories and compile a list of all the files in them.

    process_files ($base_path);

    # Accepts one argument: the full path to a directory.
    # Returns: A list of files that reside in that path.
    sub process_files {
        my $path = shift;

        opendir (my $dh, $path)
            or die "Unable to open $path: $!";

        # We are just chaining the grep and map from
        # the previous example.
        # You'll see this often, so pay attention ;)
        # This is the same as:
        # LIST = map(EXP, grep(EXP, readdir()))
        my @files =
            # Third: Prepend the full path
            map { $path . '/' . $_ }
            # Second: take out '.' and '..'
            grep { !/^\.{1,2}$/ }
            # First: get all files
            readdir ($dh);

        closedir ($dh);

        # Files found in subdirectories go into a separate array.
        # Pushing onto @files while looping over it would make the
        # loop revisit the new entries and recurse into them again.
        my @sub_files;

        for (@files) {
            if (-d $_) {
                # Add all of the new files from this directory
                # (and its subdirectories, and so on... if any)
                push @sub_files, process_files ($_);

            } else {
                # Do whatever you want here =) .. if anything.
            }
        }
        # NOTE: we're returning the list of files
        return (@files, @sub_files);
    }

Real Example

Just for the sake of completeness, and to help you get started writing recursive routines to suit your needs, here is an example of a recursive function that actually does something. As you'll see, I have used a few common shortcuts and idioms which make it look different from the examples above, but hopefully you will now be able to read it with confidence.

    # Accepts one argument: the full path to a directory.
    # Returns: A list of files that end in '.html' and have been
    # modified within the last day.
    sub get_new_htmls {
        my $path = shift;
        my $ONE_DAY = 86400; # seconds

        opendir (my $dh, $path)
            or die "Unable to open $path: $!";
        my @files =
            map { $path . '/' . $_ }
            grep { !/^\.{1,2}$/ }
            readdir ($dh);
        closedir ($dh);

        # Rather than using a for() loop, we can just
        # return a directly filtered list.
        # The symlink test comes before stat() so that a
        # dangling symlink is rejected before it is stat()ed.
        return
            grep { (/\.html$/) &&
                   (! -l $_) &&
                   (time - (stat $_)[9] < $ONE_DAY) }
            map { -d $_ ? get_new_htmls ($_) : $_ }
            @files;
    }
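The filter inside that grep packs three tests into one expression. Spelled out as a standalone predicate, it might look like the sketch below. The name is_recent_html and the optional $max_age argument are assumptions made for this example.

```perl
use strict;
use warnings;

# is_recent_html() is a hypothetical helper name used only for illustration.
# Returns 1 if $file ends in '.html', is not a symlink, and was modified
# less than $max_age seconds ago (default: one day). Returns 0 otherwise.
sub is_recent_html {
    my ($file, $max_age) = @_;
    $max_age = 86400 unless defined $max_age;

    return 0 unless $file =~ /\.html$/;
    return 0 if -l $file;

    # Element 9 of the list returned by stat() is the last-modified
    # time, in seconds since the epoch.
    my $mtime = (stat $file)[9];
    return 0 unless defined $mtime;      # stat failed, e.g. the file vanished

    return (time() - $mtime < $max_age) ? 1 : 0;
}
```

With a helper like this, the return statement in the real example could be read as "keep every entry for which the predicate is true".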

UPDATE: Per merlyn's comments, made it even clearer that this is not intended to be used as production code.
Added symlink ignoring to the real example.

In reply to Directory Recursion by count0
