file globbing in a nested while loop

by George_Sherston (Vicar)
on Oct 05, 2001 at 16:29 UTC

George_Sherston has asked for the wisdom of the Perl Monks concerning the following question:

My brain, such as it is, hurts. What I want is something quite simple (and you'll probably tell me there's a module for it, which I will accept with humility): a file tree showing all the .pl files in a directory and its subdirectories.

Now, I'd be very interested in suggestions on how to do this differently, but what I'm really stuck on is the particular reason why what I've tried so far fails. Here's my first attempt:
    use strict;
    use CGI qw/:standard/;

    my $ext    = 'pl';
    my $Indent = 15;
    my $q      = new CGI;

    print $q->header;
    print $q->start_html;
    &GetDirs;
    print $q->end_html;

    sub GetDirs {
        my ($Dir, $Posn) = @_;
        chdir $Dir if $Dir;
        while (<*/>) {
            print "<div style=\"margin-left:$Posn\">$_</div>";
            &GetDirs($_, $Posn + $Indent);
        }
        while (<*.$ext>) {
            print "<div style=\"margin-left:$Posn\"><a href=\"editor.pl?Action=GetScript&File=$_\" target=\"$_\">$_</a><BR>";
        }
        chdir "../" if $Dir;
    }
What I expected was that it would go through all the directories, open each one, then loop back on itself and go through that directory, keep on doing that until it had run out of directories. Each time it ran out of directories it would exit the loop, print out the files themselves, and then drop down one level. Obviously it's a nested loop, and that way madness lies, but it's never going to be an infinite recursion, because my file tree is finite.

That was the plan. But actually what happens is, it loops round and round printing out the directories in my root directory over and over again forever. At first I thought the problem was that the working directory got reset each time I re-called &GetDirs; but I used Cwd (thanks to the nice people in the CB) to get an absolute directory reference which I passed, and I still had roughly the same problem:
    sub GetDirs {
        my ($Dir, $Posn) = @_;
        my $OldDir = getcwd;
        chdir $Dir;
        while (<*/>) {
            print "<div style=\"margin-left:$Posn\">$_</div>";
            &GetDirs(getcwd, $Posn + $Indent);
        }
        while (<*.$ext>) {
            print "<div style=\"margin-left:$Posn\"><a href=\"editor.pl?Action=GetScript&File=$_\" target=\"$_\">$_</a><BR>";
        }
        chdir $OldDir;
    }
So now what I THINK is happening is that the while loop is not restarting when I call &GetDirs from within &GetDirs. This seems odd - but maybe it does this for a good reason. Or maybe it's something else again.

I would be very grateful for any guidance - please don't hesitate to point out what may seem obvious, as it certainly isn't obvious to me!

George Sherston

Replies are listed 'Best First'.
Re: file globbing in a nested while loop
by pjf (Curate) on Oct 05, 2001 at 16:44 UTC
    G'day George,

    There's a module that does exactly what you're looking for, it's called File::Find. It allows you to traverse a file tree, and (should you so desire) perform operations based upon each file you come across.

    Here's some code that demonstrates how to use File::Find to find and print the location of .pl files.

        use File::Find;

        find(\&printer, "/path/to/dir");

        sub printer {
            print "I have found $File::Find::name\n" if /\.pl$/;
        }

    If you don't like to specify the subroutine separately, you can do it with an anonymous subroutine.

        use File::Find;

        find(
            sub { print "I have found $File::Find::name\n" if /\.pl$/; },
            "/path/to/dir"
        );

    Please note that newer versions of File::Find contain some very rich features, many more than the manual page here on PerlMonks describes. Check your local install to see what extra fun you can find.
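
As a rough illustration of those extras (a sketch, not part of the original reply): the hash-style call to find() accepts a preprocess callback alongside wanted, which can be used to sort each directory's entries before they are visited. The helper name tree_lines, its arguments, and the two-space indent are assumptions made up for this example. It returns an indented listing of subdirectories and of files with a given extension.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns one indented line per subdirectory and per
# file ending in ".$ext" found under $root.
# Assumption: $root is passed without a trailing slash.
sub tree_lines {
    my ($root, $ext) = @_;
    my @lines;
    find(
        {
            # Sort each directory's entries so the order is repeatable.
            preprocess => sub { sort @_ },
            wanted     => sub {
                my $name = $File::Find::name;
                return if $name eq $root;    # skip the starting directory itself
                my $is_dir = -d $_;
                return unless $is_dir || (-f $_ && /\.\Q$ext\E$/);
                # Depth = number of "/" separators in the path below $root.
                my $rel   = substr($name, length($root) + 1);
                my $depth = ($rel =~ tr{/}{});
                my $line  = ('  ' x $depth) . $_;
                $line .= '/' if $is_dir;
                push @lines, $line;
            },
        },
        $root
    );
    return @lines;
}
```

Calling tree_lines("/path/to/dir", "pl") and printing each returned line gives a plain-text tree; wrapping each line in a div, as in the original question, would be a small change to the push.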

    Hope you find this useful.

    Cheers,
    Paul

      Thanks, and ditto to busunsl. That's certainly solved the problem - I can write my script now, and have re-learnt the ever valuable lesson that time spent researching CPAN is rarely time wasted. But I'd still be very interested if anyone can shed light on why my own attempt didn't work. Is the while loop not restarting? And if so why not?

      George Sherston
        If this runs on Unix, it might be you are hitting the '.' directory.

        You would be looping on that till eternity.

Re: file globbing in a nested while loop
by clintp (Curate) on Oct 05, 2001 at 17:15 UTC
    *handwaving at the File::Find answers* They're correct, but they don't answer your question.

    Your question is, why didn't this work?

    The answer is that while(glob("*/")) has the glob working in its scalar context mode, which returns one directory element at a time. The problem is that each glob in the source keeps a single iterator, shared by every level of the recursion: the recursive call carries on through the list the caller started, and when you pop into a subdirectory and pop back up again the iterator for the glob starts all over again at the beginning of the directory. I just noticed, you do this twice in the program. You wind up stepping on both iterators in each subdirectory you go down.
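
The iterator behaviour described above can be seen in isolation with a small sketch. The sub name next_match and the file names are made up for the demonstration; the point is only that one glob in the source keeps one iterator, no matter how many times the enclosing sub is called.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work in a scratch directory containing exactly three known files.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "chdir $dir: $!";
for my $name (qw(a.txt b.txt c.txt)) {
    open my $fh, '>', $name or die "open $name: $!";
    close $fh;
}

# One glob operator in the source, hence one iterator shared by all callers.
sub next_match {
    return scalar glob('*.txt');
}

# Five separate calls: three names, then undef (list exhausted),
# then the list starts over from the first name.
my @seen = map { next_match() } 1 .. 5;
print join(', ', map { defined($_) ? $_ : 'undef' } @seen), "\n";

chdir $start or die "chdir $start: $!";
```

A recursive sub that loops over such a glob behaves the same way: the inner call does not get a fresh list, it continues (and eventually exhausts) the one the outer loop was using.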

    Get the entire glob list out all at once into an array (which is localized to the sub) and then iterate over that. Try something like this (untested):

        sub GetDirs {
            my ($Dir, $Posn) = @_;
            chdir $Dir if $Dir;
            my @list=glob("*/");
            foreach (@list) {
                print "<div style=\"margin-left:$Posn\">$_</div>";
                &GetDirs($_, $Posn + $Indent);
            }
            my @exts=glob("*.$ext");
            foreach (@exts) {
                print "<div style=\"margin-left:$Posn\"><a href=\"editor.pl?Action=GetScript&File=$_\" target=\"$_\">$_</a><BR>";
            }
            chdir "../" if $Dir;
        }

    PS: I changed <> glob syntax to glob() syntax. Hope you don't mind. :)
Re: file globbing in a nested while loop
by busunsl (Vicar) on Oct 05, 2001 at 16:32 UTC
    File::Find should do the work of traversing the dir-tree.
Re: file globbing in a nested while loop
by trantor (Chaplain) on Oct 05, 2001 at 17:10 UTC

    I absolutely agree with the other posters and I think that File::Find is the solution.

    However, if you want to fix your program, keep a couple of things in mind: check whether you're following symlinks, and get the whole list at once by evaluating <glob> in list context instead of scalar context, because scalar context can be confusing when mixed with recursion. You always use <*/>, and if perl uses some sort of caching in scalar context, then even at a deeper level of recursion you'd get a file from the previous list of globbed files!

    And symbolic links can lead you to infinite recursion as well.

    To clarify with an example, I found out that this script is easily fooled by symlinks and even if there aren't any it goes into infinite recursion:

        #!/usr/bin/perl -w
        use strict;

        sub do_list {
            my($level, $base_path) = @_;
            my @list;
            # @list = <$base_path/*>;
            # for (@list) {
            while(<$base_path/*>) {
                print '>' x $level, ' ';
                print "$_\n";
                next if -l $_;
                if(-d "$_") {
                    do_list($level + 1, $_)
                }
            }
            $level--;
        }

        do_list(1, $ARGV[0] || '.');

    Now, just get rid of the while and use the assignment and the for loop instead: evaluating the glob in list context prevents interference between recursion and the "next" value returned by the "right" glob expression.

    -- TMTOWTDI

Re: file globbing in a nested while loop
by Aristotle (Chancellor) on Oct 05, 2001 at 18:18 UTC
    Another note to take is that <*/> isn't a very clean way to find subdirectories; a better way is to use the -d filetest operator. I would propose something akin to this (untested):
        . .
        local $_; # always a good habit unless you know why you don't want it
        chdir $Dir;
        opendir DIR, '.';
        my (@dirs, @files);
        for (readdir DIR) {
            next if /^\.\.?$/; # we don't want to catch the . and .. entries
            push @dirs, $_ if -d and not -l; # only non-symlink dirs please
            push @files, $_ if -f and /\.$ext$/; # -f because it could be a directory or other non-file called "something.$ext"
        }
        closedir DIR;
        . .

    Afterwards you have the directories and desired files in the appropriate arrays. A note for completeness' sake is that using readdir() like this can be noticeably slower than a glob when directories contain a lot of files (as in several thousand). However, unless you're running a heavy load application and this piece of code is among your script's bottlenecks, it won't make a difference, while it is more maintainable and less prone to errors IMHO.
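
Put together, the readdir() approach could be fleshed out roughly as follows. This is a sketch under assumptions, not the poster's code: the name list_tree, its arguments, and the two-space indent are invented for illustration, and it swaps chdir and the bareword DIR handle for full paths and a lexical directory handle so that each level of recursion has its own state.

```perl
use strict;
use warnings;

# Hypothetical helper: returns indented lines for the subdirectories of $dir
# (listed first, recursively) followed by the files ending in ".$ext".
sub list_tree {
    my ($dir, $ext, $depth) = @_;
    $depth //= 0;

    # Read the whole directory up front; the lexical handle is private
    # to this call, so recursion cannot disturb it.
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @entries = sort grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;

    my (@dirs, @files);
    for my $entry (@entries) {
        my $path = "$dir/$entry";
        if (-d $path) {
            push @dirs, $entry unless -l $path;    # skip symlinked directories
        }
        elsif (-f $path && $entry =~ /\.\Q$ext\E$/) {
            push @files, $entry;
        }
    }

    my $pad = '  ' x $depth;
    my @lines;
    for my $sub (@dirs) {
        push @lines, "$pad$sub/", list_tree("$dir/$sub", $ext, $depth + 1);
    }
    push @lines, map { "$pad$_" } @files;
    return @lines;
}
```

Something like print "$_\n" for list_tree('.', 'pl'); would then print the tree for the current directory.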

    But anyway - this was just an exercise. For real work, you should indeed rely on File::Find. :)