PerlMonks
Quickly Find Files with Wildcard?

by expresspotato (Beadle)
on Mar 23, 2009 at 00:38 UTC ( #752459=perlquestion )
expresspotato has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm trying to find a file within a folder, but the file names aren't exact. The first part is a random number and the second part is a hash string, the two separated by a period. I've produced the following code, but it takes far too long to execute for my planned use. If anyone has a faster method for doing this (including a Linux command), I would love to hear how. Example file: 103746383.100740266a1b4bf5d456cb8876be0e4e9662d871
sub find_hash_core {
    opendir( DIR, "./" );
    my @a = readdir(DIR);
    closedir(DIR);    # closedir, not close, for directory handles
    foreach (@a) {
        if ( $_ =~ /100740266a1b4bf5d456cb8876be0e4e9662d871/ ) {
            print "Found Hash!";
        }
    }
}
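For illustration only (not part of the original post), the file-name shape described above can be pulled apart with split; the sample name is the one given in the question:

```perl
use strict;
use warnings;

# Example file name from the question: <random number>.<hex hash>
my $name = '103746383.100740266a1b4bf5d456cb8876be0e4e9662d871';

# Split on the first period only, in case a name ever contains more than one.
my ( $num, $hash ) = split /\./, $name, 2;

print "number: $num\nhash: $hash\n";
```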

Re: Quickly Find Files with Wildcard?
by ysth (Canon) on Mar 23, 2009 at 01:01 UTC
      Thank you for the prompt reply. The query takes about 1-2 seconds per server. Considering this is running on 10 servers, for example, without threads it could take up to 20 seconds for a full query. I do use threads and have been able to get it down to about 5-8 seconds, but again this is quite slow. Also, locate looks very fast, but I'm unsure how to prevent it from recursing.
        Where does the extra time come from between 1-2 seconds per server and 5-8 seconds for a threaded lookup on all servers? Are you communicating between servers, or are the filesystems from all the servers available locally? If the latter, is NFS (or whatever you use) actually slowing things down compared to a solution using SSH?

        To "prevent" locate from recursing, filter the results looking for your specific directory.
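A minimal sketch of that filtering, assuming the paths come from locate(1); the helper name in_dir_only and the directory layout are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only paths that sit directly inside $dir,
# discarding hits from deeper subdirectories. Feed it the output of
# e.g. qx{ locate 100740266a1b4bf5d456cb8876be0e4e9662d871 }.
sub in_dir_only {
    my ( $dir, @paths ) = @_;
    chomp @paths;
    # anchor on the directory, then allow exactly one more path component
    return grep { m{^\Q$dir\E/[^/]+$} } @paths;
}

my @hits = in_dir_only( '/srv/files',
    "/srv/files/103746383.abc\n",
    "/srv/files/deep/other.abc\n" );
print "$_\n" for @hits;    # only the direct child survives
```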

Re: Quickly Find Files with Wildcard?
by Albannach (Prior) on Mar 23, 2009 at 01:14 UTC
    I would try using a plain substring search instead of invoking a full regex, since your match string is a constant. Secondly, I hope you drop out of your foreach loop once the file you want has been found (unless you expect more than one match, that is); otherwise you will continue your expensive test for nothing. Hope this helps!
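A sketch of both suggestions combined, swapping the regex for index() (a constant-substring search) and exiting the loop on the first hit; the sub name and directory are illustrative, not from the thread:

```perl
use strict;
use warnings;

# Illustrative helper: return the first entry in $dir whose name
# contains $hash, using index() (no regex) and stopping early.
sub find_hash_in_dir {
    my ( $dir, $hash ) = @_;
    opendir my $dh, $dir or die "$dir: $!";
    while ( my $entry = readdir $dh ) {
        if ( index( $entry, $hash ) >= 0 ) {
            closedir $dh;
            return $entry;    # first match wins; no wasted tests
        }
    }
    closedir $dh;
    return;                   # nothing found
}
```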

    --
    I'd like to be able to assign to an luser

Re: Quickly Find Files with Wildcard?
by linuxer (Deacon) on Mar 23, 2009 at 01:21 UTC

    Just some thoughts on this topic. I did no testing or benchmarking, so all of this could be nonsense ;o)

    How many entries are read? It might be a speedup to grep for plain files when reading from the directory, so that fewer entries in @a are checked afterwards. This probably only makes sense if there are many more directories than files.

    sub find_hash_core {
        my $dir = './';
        opendir my $dirh, $dir or die "$dir: $!\n";
        # if you need the current workdir, see Cwd for how to retrieve and restore it later
        chdir $dir;
        my @files = grep { -f $_ } readdir $dirh;
        closedir $dirh;
        for ( @files ) {
            if ( m/1234yourhashvalue/ ) {
                print "found hash";
            }
        }
    }

    You could even check that hash value inside the grep:

    my @files = grep { -f $_ && m/1234yourhashvalue/ } readdir $dirh;

    Did you try glob() instead of readdir? I don't know if there is a big difference between those implementations...

    chdir $dir; my @files = glob( "*.1234yourhashvalue" );

    Did you try the Linux find command yet?

    my @files = qx{ find $dir -maxdepth 1 -type f -name "*.1234yourhashvalue" };

    For a more detailed answer, please give more information... Otherwise it's up to you to find a solution...

    Update: fixed file/find typo

      Thank you for all your suggestions. The speed of responses here is extraordinary! It seems accessing these remote file systems over SSHFS was the main bottleneck. Also, exiting on the first find of the hash seems to have helped reduce query times. Now the requests are still made using threads, but to each server directly, by calling a specially formatted webpage that returns whether that server has the hash key in question. My hat's off to you, Monks!
