PerlMonks  

Wondering about File::Find

by jynx (Priest)
on Dec 29, 2000 at 01:31 UTC ( [id://48695]=perlquestion )

jynx has asked for the wisdom of the Perl Monks concerning the following question:

Recently i posted a response to someone else's question in SOPW but since i don't know if it'll ever get hit by other monks who might answer my questions (since it turned out to be more questions than answers) i wanted to post something that would be seen. Sorry for the duplication, but i do want these questions answered if possible...

In essence, i looked at someone else's answer and rewrote it as a one-liner using File::Find:

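#!/usr/local/bin/perl -w
use strict;
use File::Find;

# Print every .pm file under each directory in @INC.
# With no_chdir => 1, $_ holds the full path of the current entry.
find({ wanted => sub { print "$_\n" if /\.pm$/ }, no_chdir => 1 }, $_) foreach @INC;
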
However, @INC seems to contain directories that are subdirectories of other entries, so they will already have been handled by the time we get to them. That could produce duplicate output and, because of the extra disk access, slow things down. So we make a hash of where we've been:
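
# use strict and the ilk above
my %been_at;

sub wanted {
    # $File::Find::dir is the directory currently being scanned.
    # Caveat: it is marked on the first entry seen, so every later
    # entry in the same directory is skipped as well.
    return if $been_at{$File::Find::dir};
    $been_at{$File::Find::dir} = 1;
    print "$_\n" if /\.pm$/;
}

find({ wanted => \&wanted, no_chdir => 1 }, $_) foreach @INC;
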
Then the following questions came to mind:
1) is it faster to create the hash than search the directories again (if we ignore duplicate entries)?
2) is it faster to use or not use no_chdir?

@INC is a small search space, but if we generalize this a little to larger unknown search spaces the questions seem a little more potent (to me at least). Since File::Find has to scan the disk it is one of the slower parts of programs and knowing how to use it better would certainly be a Good Thing(tm).

i'm still pretty new to File::Find and these are things i was wondering, please help,

jynx

Replies are listed 'Best First'.
Re: Wondering about File::Find
by fundflow (Chaplain) on Dec 29, 2000 at 01:48 UTC
    Quick answers

    1. A disk access is orders of magnitude slower than a memory access, so the hash is much faster, unless the hash is too big to fit in main memory.

    2. On Unix, and most likely on Windows too, disk access gets cached (for the reason above), so chdir probably won't make much difference: find() needs to read each directory once, and after that it is in the cache.



    Using your hash will most likely speed things up. Note that the hash only needs to contain the roots in @INC, so it stays very short. To get that, just sort @INC so that the prefixes come before the directories nested under them.
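
The sort-and-keep-the-roots idea could look roughly like the sketch below. The helper name top_level_roots is made up for illustration, and it assumes Unix-style paths compared as plain strings, with no symlink resolution.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Find): return only those
# directories that are not nested inside another directory in the list.
# Assumes Unix-style "/" separators and compares paths as plain strings.
sub top_level_roots {
    # Strip trailing slashes, but leave a bare "/" alone.
    my @clean = map { my $d = $_; $d =~ s{(?<=.)/+\z}{}; $d } @_;

    my @roots;
    # After sorting, a parent always comes before anything nested under it.
    for my $dir (sort @clean) {
        my $nested = grep {
            my $prefix = $_ eq '/' ? '/' : "$_/";
            $dir eq $_ || index($dir, $prefix) == 0;
        } @roots;
        push @roots, $dir unless $nested;
    }
    return @roots;
}
```

Passing top_level_roots(@INC) to find() instead of @INC itself would then visit each tree once, as long as no entry reaches another through a symlink.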

      The advantage of chdir(), at least on some systems, is that the whole path doesn't need to be traversed for each directory and file access. If you don't chdir(), then you are doing things like stat("sub/dir/tiny/file") which, even if the cache works very effectively, has to find "sub", then find "dir", then find "tiny", then find "file".

      If you chdir(), then your process keeps a handle into that directory so that stat("file") doesn't have to even look in the cache for "sub", "dir", and "tiny". I'd be interested to see benchmarks on what practical effect this can have on the whole process.

              - tye (but my friends call me "Tye")
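
One way to get such a benchmark is the core Benchmark module. This is only a sketch: count_pm is a made-up helper, the directory to scan is taken from the command line, and the numbers will depend heavily on the filesystem and on whether its cache is already warm.

```perl
use strict;
use warnings;
use File::Find;
use Benchmark qw(cmpthese);

# Hypothetical helper: count the .pm files under $root, either letting
# File::Find chdir into each directory (no_chdir => 0) or not (no_chdir => 1).
sub count_pm {
    my ($root, $no_chdir) = @_;
    my $count = 0;
    # $_ is the basename when chdir-ing and the full path otherwise;
    # the pattern matches the end of either form.
    find({ wanted => sub { $count++ if /\.pm\z/ }, no_chdir => $no_chdir }, $root);
    return $count;
}

# Only run the comparison when a directory is given on the command line.
if (@ARGV) {
    my $root = shift @ARGV;
    count_pm($root, 0);    # one untimed pass to warm the filesystem cache
    cmpthese(-3, {         # run each variant for at least 3 CPU seconds
        chdir    => sub { count_pm($root, 0) },
        no_chdir => sub { count_pm($root, 1) },
    });
}
```

Saved under any file name and run with a directory argument, it prints a rate table comparing the two variants. With a warm cache this mostly measures path lookup cost, which is the effect described above.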
Re: Wondering about File::Find
by BatGnat (Scribe) on Dec 29, 2000 at 03:08 UTC
    I wrote a program that recurses through directories when I first started using Perl. After a lot of comparing, I found the better option for my program:
    Using File::Find was slower than using opendir/closedir and some fancy array stuff.
    Try Here for my code
    BatGnat

    This project cannot be completed successfully as we require - a shrubbery!
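
BatGnat's code is not reproduced in this thread. Purely as an illustration of the opendir/closedir approach, and not as a reconstruction of that program, a minimal walker might look like this (walk_pm is a made-up name, and symlinked directories are deliberately not followed):

```perl
use strict;
use warnings;

# Hypothetical sketch: walk $root with opendir/readdir and an explicit
# queue instead of File::Find, returning the paths of all .pm files.
sub walk_pm {
    my ($root) = @_;
    my @queue = ($root);
    my @found;
    while (defined(my $dir = shift @queue)) {
        opendir(my $dh, $dir) or next;    # skip unreadable directories
        for my $entry (readdir $dh) {
            next if $entry eq '.' || $entry eq '..';
            my $path = "$dir/$entry";
            if (-d $path && !-l $path) {
                push @queue, $path;       # real directory: scan it later
            }
            elsif ($entry =~ /\.pm\z/) {
                push @found, $path;
            }
        }
        closedir $dh;
    }
    return @found;
}
```

Whether this beats File::Find on a given system is something to measure rather than assume.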
Re: Wondering about File::Find
by chipmunk (Parson) on Jan 04, 2001 at 08:12 UTC
    (Answer reposted here.)

    Checking and setting a value in a hash has to be faster than hitting the filesystem. Here is a script that does this:

    use strict;
    use File::Find;

    my %seen;

    sub wanted {
        if (-d $File::Find::name) {
            if ($seen{$File::Find::name}) {
                $File::Find::prune = 1;
                return;
            }
            $seen{$File::Find::name} = 1;
        }
        print "$File::Find::name\n" if /\.pm$/;
    }

    find \&wanted, @INC;

    Note that the wanted function should check %seen only for actual directories. Setting $File::Find::prune is what actually stops File::Find from recursing further into the directory.
