PerlMonks  

Behavior of File::Find's preprocess and glob

by hsinclai (Deacon)
on Feb 03, 2005 at 22:42 UTC (#427850, perlquestion)

hsinclai has asked for the wisdom of the Perl Monks concerning the following question:

Here's a representative example from a larger script. I used the preprocess key in File::Find in an attempt to minimize matching operations and overhead inside the "wanted" subroutine when sifting through a very large number of files.

#!/usr/bin/perl -w
use strict;
use Cwd;
use File::Find;

my $filespec = '*.pl *.txt';
my $dir = $ARGV[0] || getcwd();

find( { wanted => \&find_function, preprocess => \&globber }, $dir );

sub find_function {
    print $File::Find::name . $/;
}

sub globber {
    ( glob "$filespec" );
}

The correct files in the first level directory get returned, but two additional things happen:
  • the first directory name ($File::Find::dir) gets returned (I'm trying to get rid of that)
  • File::Find seems to stop doing recursion


I must misunderstand what "preprocess" is for, but I am handing it the list it expects, I think..

So, does anyone know why the directory name gets returned in this case, and even better yet, a slick way to be able to specify file extensions from outside of "wanted" ?

Many thanks. PS: I know it might be easier with File::Finder, and maybe other modules, but I was trying to stick with a standard module.

Update: minor clarification of explanation

Replies are listed 'Best First'.
Re: Behavior of File::Find's preprocess and glob
by edoc (Chaplain) on Feb 04, 2005 at 01:07 UTC

    It appears that the "wanted" function (find_function) is called with the directory name before (or after, depending on your options) the "preprocess" function is called with that directory's contents, so you need to filter directories out in find_function.

    If you do filter out directory names in the "preprocess" function then you'll prevent File::Find from recursing into those directories.

    Update: d'oh! mixed up the function names..

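    #!/usr/bin/perl -w
    use strict;
    use Cwd;
    use File::Find;

    my $filespec = qr/\.(?:txt|pl)$/;
    my $dir = $ARGV[0] || getcwd();

    find( { wanted => \&find_function, preprocess => \&globber }, $dir );

    sub find_function {
        # Skip directories. $_ is the entry name relative to the current
        # directory, so this also works when $dir is a relative path.
        return if -d $_;
        print $File::Find::name.$/;
    }

    sub globber {
        # Keep matching files, and keep every directory so that
        # File::Find can still recurse into it.
        my @files;
        foreach (@_) {
            push(@files, $_) if (/$filespec/ || (-d $_));
        }
        return @files;
    }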

    cheers,

    J
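
On the original question of specifying file extensions from outside "wanted": a minimal sketch, assuming the extensions are kept in a plain list. The helper name extension_regex is hypothetical (it is not part of File::Find); the resulting regex could stand in for $filespec in the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Find): turn a list of
# extensions into one compiled regex that matches on the file suffix.
sub extension_regex {
    my @extensions = @_;
    # quotemeta guards against regex metacharacters in an extension.
    my $alternation = join '|', map { quotemeta($_) } @extensions;
    return qr/\.(?:$alternation)\z/;
}

my $filespec = extension_regex(qw(pl txt));

# Show which of a few sample names the regex accepts.
for my $name (qw(script.pl notes.txt archive.tar.gz plain)) {
    printf "%-15s %s\n", $name, ( $name =~ $filespec ? 'match' : 'skip' );
}
```

Building the pattern once, outside the callbacks, keeps the extension list in one place and avoids recompiling the regex for every file.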

      If you do filter out directory names in the "wanted" function then you'll prevent File::Find from recursing into those directories.
      Did you mean filter out the directory names in "preprocess" (as opposed to the "wanted" function)? That is why the recursion stopped! I see..

      .. it still doesn't answer the question as to why the "preprocess" glob of *.pl *.txt in my earlier snippet returns the directory name.. but thanks, I do see now how I was blocking File::Find from recursing by removing directories from "preprocess"'s return list ..

      Although your code rework works perfectly, the same result can be had by just doing the directory and file-extension pattern matching in "wanted", without bothering with a "preprocess" call. So, for example, what if there were several hundred subdirectories which I knew did not contain *.pl or *.txt? I was trying to find a way that "wanted" could skip the needless processing/recursion..
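
One possible way to avoid descending into subdirectories known to be uninteresting is File::Find's $File::Find::prune variable, set from inside "wanted". A minimal sketch, assuming the directories to skip can be identified by name; the %skip_dir hash and the temporary directory tree are made up for illustration. As far as I know, prune has no effect when the bydepth option is in use.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a small throwaway tree so the sketch is self-contained.
my $root = tempdir( CLEANUP => 1 );
make_path( "$root/src", "$root/cache/deep" );
for my $rel ( 'src/a.pl', 'cache/b.pl', 'cache/deep/c.txt' ) {
    open my $fh, '>', "$root/$rel" or die "Cannot create $rel: $!";
    close $fh;
}

my $filespec = qr/\.(?:txt|pl)\z/;
my %skip_dir = ( cache => 1 );    # assumed: directory names to skip
my @found;

find(
    sub {
        if ( -d $_ ) {
            # Setting prune tells File::Find not to descend into
            # the directory currently being visited.
            $File::Find::prune = 1 if $skip_dir{$_};
            return;
        }
        push @found, $File::Find::name if /$filespec/;
    },
    $root
);

print "$_\n" for sort @found;
```

With this approach nothing under a pruned directory is read at all, so neither "wanted" nor "preprocess" is called for its contents.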

        .. it still doesn't answer the question as to why the "preprocess" glob of *.pl *.txt in my earlier snippet returns the directory name

        Actually it doesn't. Your "wanted" function prints it when it is called with the directory name before "preprocess" is called with the directory contents. The first thing I did with your code was to add 'print "GLOBBING\n";' to the preprocess function. The directory name is printed before GLOBBING.

        Sounds like you want to change the $filespec regex to only exclude directories you don't wish to traverse. Then filter out the rest of the directories in the "wanted" function. You could create a list of directories as keys to a hash and have "preprocess" skip any directories in the list.

        Update: Added code

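        #!/usr/bin/perl -w
        use strict;
        use Cwd;
        use File::Find;

        my $filespec = qr/\.(?:txt|pl)$/;
        my %dirskip = ( 'path/to/dir' => 1, 'path/to/another/dir' => 1 );
        my $dir = $ARGV[0] || getcwd();

        find( { wanted => \&find_function, preprocess => \&globber }, $dir );

        sub find_function {
            # Report only non-directories whose names match $filespec.
            # -d $_ tests the entry relative to the current directory.
            return if $File::Find::name !~ /$filespec/ || -d $_;
            print $File::Find::name.$/;
        }

        sub globber {
            # $File::Find::dir is the directory whose entries were just read.
            # Returning an empty list for a skipped directory drops all of
            # its contents; the keys of %dirskip must match that path exactly.
            return () if $dirskip{$File::Find::dir};
            return @_;
        }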

        cheers,

        J
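
The print-tracing experiment described in this reply can be reproduced with a small self-contained script. This is only a sketch, assuming a throwaway directory tree created with File::Temp (the file and directory names are invented); it logs each callback invocation so the relative order of "wanted" and "preprocess" for a given directory is visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway tree: one file at the top level, one inside a subdirectory.
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/sub" or die "mkdir failed: $!";
for my $file ( "$root/top.pl", "$root/sub/inner.txt" ) {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh;
}

my @events;    # ordered log of callback invocations

find(
    {
        wanted     => sub { push @events, "wanted $File::Find::name" },
        preprocess => sub {
            push @events, "preprocess $File::Find::dir";
            return sort @_;    # pass every entry through, directories included
        },
    },
    $root
);

print "$_\n" for @events;
```

In the log, the "wanted" line for a directory appears before the "preprocess" line for that same directory, which is why the directory name was printed even though the glob never returned it.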
