PerlMonks  

The situational efficiency of File::Find

by OzzyOsbourne (Chaplain)
on May 09, 2001 at 16:46 UTC

OzzyOsbourne has asked for the wisdom of the Perl Monks concerning the following question:

Let us assume that I have the following directory structure:
\user
    \user1
        foo.bar
    \user2
    \user4
    \user500
        foo.bar
Given:
  1. Each of the user directories has a TON of subdirectories.
  2. File::Find seems to start at the lowest subdirectory and work its way up to the root.
  3. I tried bydepth, and it does not seem to limit directory depth like I thought that it would.
  4. I am looking for foo.bar, and I know that it will be in the first level of subdirectories under the user directory.
  5. I will not know the names of the user directories (user1, user2) before the script is run.
  6. This is not homework, as I have not attended school in a very long time

File::Find would help me find those subdirectories, but it will also check a ton of files and subdirectories (up to 2 gig per user share at times) that I don't need to look at.

I can write it the File::Find way, but I question its efficiency in this situation.

-OzzyOsbourne

Replies are listed 'Best First'.
Re: The situational efficiency of File::Find
by Corion (Patriarch) on May 09, 2001 at 17:11 UTC

    I suggested a $File::Find::prune approach in the chatterbox, but now I wonder whether readdir() together with -f wouldn't be better (untested):

    opendir DIR, "/" or die "can't opendir /: $!";
    my @files = readdir DIR;
    closedir DIR;

    # keep only the directories
    @files = grep { -d "/$_" } @files;

    foreach my $dirname (@files) {
        if (-f "/$dirname/foo.bar") {
            print "$dirname has foo.bar\n";
        }
    }
Re: The situational efficiency of File::Find
by knobunc (Pilgrim) on May 09, 2001 at 17:27 UTC

    bydepth just means to process the contents of a directory before processing the directory itself. So for your example find . -depth -print (equivalent to setting bydepth) would print the following:

    /user/user1/foo.bar
    /user/user1
    /user/user2
    /user/user4
    /user/user500/foo.bar
    /user/user500
    /user

    Rather than the results from find . -print:

    /user
    /user/user1
    /user/user1/foo.bar
    /user/user2
    /user/user4
    /user/user500
    /user/user500/foo.bar
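To see the two orderings side by side from Perl rather than from find(1), here is a minimal sketch. It is not from the original post: the temporary tree and the helper name visit_order are made up for illustration. It only relies on find() accepting the bydepth and no_chdir options. The order of siblings within one directory depends on the filesystem, so only the parent/child ordering is meaningful.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a throwaway copy of a small part of the example tree (hypothetical layout).
my $root = tempdir(CLEANUP => 1);
make_path("$root/user/$_") for qw(user1 user2);
open my $fh, '>', "$root/user/user1/foo.bar"
    or die "can't create foo.bar: $!";
close $fh;

# Hypothetical helper: return the paths in the order find() visits them.
sub visit_order {
    my (%opts) = @_;
    my @seen;
    find(
        {
            %opts,
            no_chdir => 1,    # stay put; $File::Find::name is the full path
            wanted   => sub { push @seen, $File::Find::name },
        },
        "$root/user"
    );
    return @seen;
}

my @pre  = visit_order();                # a directory, then its contents
my @post = visit_order(bydepth => 1);    # the contents, then the directory

print "default: $_\n" for @pre;
print "bydepth: $_\n" for @post;
```

Either way every entry is visited exactly once; bydepth changes when a directory is reported, not how deep the walk goes.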

    What you might be thinking of is a breadth-first search that looks at all nodes the same depth down before moving on. You can't do this with File::Find or find since it would need to keep a lot of state kicking around to remember where to go next. It also would not do what you need since it would still traverse all files.
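For what it is worth, a breadth-first walk can be hand-rolled outside File::Find with a plain queue, and it only becomes useful here once a depth limit is added. The following is a rough sketch under that assumption. The name find_shallow and the temporary test tree are hypothetical and not part of anyone's posted code.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: breadth-first walk starting at $root that reads at
# most $max_depth levels of directories and returns plain files named $name.
sub find_shallow {
    my ($root, $name, $max_depth) = @_;
    my @found;
    my @queue = ([$root, 0]);    # pairs of [directory, depth]; $root is depth 0

    while (my $item = shift @queue) {
        my ($dir, $depth) = @$item;
        opendir(my $dh, $dir) or next;    # skip unreadable directories
        my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;

        for my $entry (@entries) {
            my $path = "$dir/$entry";
            if (-d $path) {
                # queue the subdirectory only while still inside the limit
                push @queue, [$path, $depth + 1] if $depth + 1 < $max_depth;
            }
            elsif ($entry eq $name && -f $path) {
                push @found, $path;
            }
        }
    }
    return @found;
}

# Demo on a throwaway tree: one shallow foo.bar and one buried deeper.
my $root = tempdir(CLEANUP => 1);
make_path("$root/user/user1/deep", "$root/user/user2");
for my $file ("$root/user/user1/foo.bar", "$root/user/user1/deep/foo.bar") {
    open my $fh, '>', $file or die "can't create $file: $!";
    close $fh;
}

# Two levels: read $root/user itself, then each user directory.
my @hits = find_shallow("$root/user", 'foo.bar', 2);
print "$_\n" for @hits;
```

With a limit of 2 the walk reads the top directory and each user directory, and never opens anything below that, so the copy under deep/ is not reported.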

    What (I think) you want to tell it is to stop looking if it is more than 2 directories in. The following wanted function looks at the current directory name and tells the find to stop looking deeper if there is a / in it.

    sub wanted {
        # count the slashes in the current directory name
        my $depth = $File::Find::dir =~ tr{/}{/};
        $File::Find::prune = 1 if $depth == 1;
        print("$depth $File::Find::name\n");
    }
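The wanted function above counts slashes in $File::Find::dir, so the depth it computes depends on how the starting directory was written (it appears to assume a start of "."). As a variation, here is a sketch that measures depth relative to the starting directory instead. The helper name find_pruned, its parameters, and the temporary tree are assumptions made for illustration. Note that $File::Find::prune has no effect when bydepth is set.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: find files called $name at most $max_depth levels
# below $root, pruning every directory that sits at that depth.
# Assumes $root is given without a trailing slash.
sub find_pruned {
    my ($root, $name, $max_depth) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # no chdir per directory; names are full paths
            wanted   => sub {
                my $path  = $File::Find::name;
                my $rel   = substr($path, length $root);    # "" for $root itself
                my $depth = ($rel =~ tr{/}{});               # slashes below $root
                if (-d $path) {
                    # do not descend into directories at the limit
                    $File::Find::prune = 1 if $depth >= $max_depth;
                }
                elsif (-f $path && $path =~ m{/\Q$name\E\z}) {
                    push @found, $path;
                }
            },
        },
        $root
    );
    return @found;
}

# Demo on a throwaway tree: one shallow foo.bar and one buried deeper.
my $root = tempdir(CLEANUP => 1);
make_path("$root/user/user1/deep/deeper", "$root/user/user2");
for my $file ("$root/user/user1/foo.bar",
              "$root/user/user1/deep/deeper/foo.bar") {
    open my $fh, '>', $file or die "can't create $file: $!";
    close $fh;
}

my @hits = find_pruned("$root/user", 'foo.bar', 2);
print "$_\n" for @hits;
```

File::Find still has to list each user directory to see its entries; pruning only saves the descent into the subdirectories below it.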

    BTW, you said that it would check up to 2 gig of files. Remember that it does not actually need to look at the contents of the files, just their directory information. So it is still expensive, but unless you have lots of small files, the amount of data read should be much smaller than 2 gig.

    -ben

(ar0n: I glob you) Re: The situational efficiency of File::Find
by ar0n (Priest) on May 09, 2001 at 18:35 UTC
    Since you're not recursing, I'd go for this:
    my @files = grep { -f and /pattern/ } glob("/home/*/*/*");
    I'm not too happy with three consecutive asterisks (why, I don't know), but it seems to be working.

    You may want to benchmark this with the other ideas, though.
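One way such a benchmark might look, using cmpthese from the core Benchmark module. This is only a sketch: the temporary tree, the helper names via_glob and via_readdir, and the choice of 50 user directories are all made up, and timings on a local temporary directory say little about network shares. It also assumes the temporary path contains no whitespace, since glob() splits its pattern on spaces.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Throwaway tree: 50 user directories, every fifth one holding foo.bar.
my $root = tempdir(CLEANUP => 1);
for my $n (1 .. 50) {
    make_path("$root/user$n/sub");
    next if $n % 5;
    open my $fh, '>', "$root/user$n/foo.bar"
        or die "can't create foo.bar: $!";
    close $fh;
}

# glob approach: let the pattern pick out the file one level down.
sub via_glob {
    return grep { -f } glob("$root/*/foo.bar");
}

# readdir approach: list the user directories, then test each one.
sub via_readdir {
    opendir(my $dh, $root) or die "can't opendir $root: $!";
    my @users = grep { !/^\./ && -d "$root/$_" } readdir $dh;
    closedir $dh;
    return map { "$root/$_/foo.bar" } grep { -f "$root/$_/foo.bar" } @users;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    glob    => sub { my @f = via_glob() },
    readdir => sub { my @f = via_readdir() },
});
```

A negative count tells cmpthese to run each candidate for at least that many CPU seconds, which tends to give steadier numbers than a fixed iteration count for fast code.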


    [ ar0n ]

Re: The situational efficiency of File::Find
by SilverB1rd (Scribe) on May 09, 2001 at 18:10 UTC
    If you have lots of files, you can break it down into two readdirs.

    Untested script
    @UserDirectorys = ();
    opendir(DIR, "/user/");
    # probably not the best way to do this
    @UserDirectorys = grep { not /^(\.\.?)$/ } readdir(DIR);
    closedir(DIR);

    foreach $USER (@UserDirectorys) {
        @UserFiles = ();
        opendir(DIR, "/user/$USER");
        @UserFiles = readdir(DIR);
        closedir(DIR);
        # Search @UserFiles for files needed....
    }
    This way you would not have to read in every file from every folder at one time.

    ------
    The Price of Freedom is Eternal Vigilance
Re: The situational efficiency of File::Find
by OzzyOsbourne (Chaplain) on May 09, 2001 at 19:45 UTC

    Update: File::Find, even with the prune option, seems to take far more time than this piece of code. I say "seems" because I killed the File::Find version once it ran past the time of this code's benchmark.

    Thanks (++) for all the help!

    # Finds all of the thingtolookfor.txt across all servers and prints out a list
    # of those people that do not have them to log.txt

    use strict;
    use Benchmark;

    my $t0 = Benchmark->new;
    my ($server, $usershare);
    my $out = '//mymachine/myshare/log.txt';
    my @servers = ('XXY', 'ZZZ', 'ETC');

    open OUT, ">$out" or die "can't open $out: $!";

    foreach $server (@servers) {
        my $dir1 = "//$server/c\$/chompy";

        # Check if chompy share is on C$ or D$
        if (!(-e "$dir1")) {    # if directory doesn't exist try d$
            $dir1 = "//$server/d\$/chompy";
            if (!(-e "$dir1")) {
                die "Directory does not exist on $server\n...Exiting Script.\n";
            }
        }

        # Read the user shares
        opendir(DIR, $dir1) or die "can't opendir $dir1: $!";
        # weed out dots and get only dirs
        my @dirs = grep { !/^\./ && -d "$dir1/$_" } readdir(DIR);
        closedir DIR;

        foreach $usershare (@dirs) {
            my $userdir = "$dir1/$usershare";
            if (!-e "$userdir/thingtolookfor.txt") {
                print OUT "$userdir:\tno\n";
                print "$userdir:\tno\n";
            }
        }
    }

    # benchmarking info (written before the log is closed)
    my $t1 = Benchmark->new;
    my $td = timediff($t1, $t0);
    print OUT "The code took:", timestr($td), "\n";
    close OUT;

    -OzzyOsbourne
