Re: Searching for files efficiently help!

by jethro (Monsignor)
on Nov 16, 2011 at 16:06 UTC (#938407)

in reply to Searching for files efficiently help!

Put your files into a hash. Then you can throw away that last foreach loop and simply use this instead:

    if (exists $filelist{$f_name}) {
        print "$c - Found and to be deleted: $files\n";
    }

The hash key in %filelist is the filename. The hash value is unimportant: you may use 1, or store further information about the file there.
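A minimal sketch of that idea, in case it helps. The two wanted filenames are the test names that appear later in this thread; @candidates and @hits are made-up names for illustration only.

```perl
use strict;
use warnings;

# Names we are looking for (bare filenames, no directory part).
my @filelist = qw(1234567_bc_20101000.txt 99877_xy_20111111.txt);

# Turn the list into a hash: key = filename, value = 1 (the value is unimportant).
my %filelist = map { $_ => 1 } @filelist;

# Hypothetical filenames to check against the hash.
my @candidates = qw(99877_xy_20111111.txt notes.txt);

# Each "exists" test is a single hash lookup, so no inner loop over @filelist is needed.
my @hits = grep { exists $filelist{$_} } @candidates;

print "Found and to be deleted: $_\n" for @hits;
```

The point of the hash is that each check costs about the same no matter how many names are in the list, whereas the nested foreach compares every file against every name.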

Replies are listed 'Best First'.
Re^2: Searching for files efficiently help!
by Anonymous Monk on Nov 16, 2011 at 18:24 UTC
    Do you mean something like this?
    #!/usr/bin/perl -w
    use strict;
    use File::Find::Rule;

    my $startdir = "/allfiles";
    my @filelist = qw(1234567_bc_20101000.txt 99877_xy_20111111.txt); # for testing
    my %filelist = map {$_, 1} @filelist;

    my $includeFiles = File::Find::Rule->file
                                       ->name('*.txt'); # search by file extensions
    my @files = File::Find::Rule->or( $includeFiles )
                                ->in($startdir);

    # locate only txt files in the starting directory
    my $includeFiles = File::Find::Rule->file
                                       ->name('*.txt'); # search by file extensions
    my @files = File::Find::Rule->or( $includeFiles )
                                ->in($startdir);

    my $f_name;
    my $c=0;

    foreach my $files(@files) {
        $c++;
        if($files=~/(.*?)\/([^\/]+)$/) {
            $f_name = $2;
        }
        if (exists $filelist{$f_name}) {
            print "$c - Found and to be deleted: $files\n";
        }
    }

    But when I print the results I am getting all files, not only the ones that were found!

      I can't confirm your observation. I tested this script and it worked as expected after removing the duplicate lines "my $includeFiles = ..." and "my @files = ...".
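      For reference, here is a hedged sketch of the checking loop with one robustness tweak. It is not necessarily the cause of the output reported above. In the posted script $f_name is declared outside the loop, so if the regex ever failed to match a path, the name from the previous iteration would be tested again. Declaring it inside the loop and using basename() from the core File::Basename module avoids that. The @files list is hard-coded with made-up paths standing in for the File::Find::Rule results, and @to_delete is a name introduced only for this example.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my %filelist = map { $_ => 1 } qw(1234567_bc_20101000.txt 99877_xy_20111111.txt);

# Hypothetical paths standing in for what File::Find::Rule would return.
my @files = (
    '/allfiles/a/1234567_bc_20101000.txt',
    '/allfiles/a/readme.txt',
    '/allfiles/b/99877_xy_20111111.txt',
);

my @to_delete;
my $c = 0;
for my $path (@files) {
    $c++;
    # Fresh variable on every iteration, so no stale name can leak through.
    my $f_name = basename($path);
    next unless exists $filelist{$f_name};
    push @to_delete, $path;
    print "$c - Found and to be deleted: $path\n";
}
```

      Collecting the matches in an array first also makes it easy to review the list before anything is actually deleted.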

Re^2: Searching for files efficiently help!
by Anonymous Monk on Nov 16, 2011 at 20:23 UTC
    The directory where the search will start has about 10GB of files in it. Do you think this code will be efficient enough to handle a directory of that size?

      There would seem to be two ways you can do this (but see correction below):

      1. Go through your array, deleting each file name if it exists as a file, or
      2. Go through your directory structure, checking each filename/path against your array (after turning it into a hash), and deleting it if it exists in the hash.

      Generally, I would prefer the first method. It's almost certain to be faster to go through a list of files and check for their existence than to traverse an entire directory structure and check every file against a list. If you simply go through your array, checking for the existence of each pathname and deleting if it's found, then it doesn't matter how large or complex your directory structure is.

      for my $file (@files){
          if( -f $file ){
              report($file); # however you want to report a match
              if( unlink $file ){
                  print "Deleted $file\n";
              } else {
                  warn "Unable to delete $file\n";
              }
          }
      }

      Correction: As Jethro pointed out, I misunderstood the original requirements, getting the two arrays he mentioned mixed up. The array he wants to check the files against does not have full path names, so my solution won't work. He will have to recurse through the directory structure and check them one by one.
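      A rough sketch of that directory walk, using the core File::Find module rather than File::Find::Rule (a swapped-in choice, not what the original script uses). The helper name find_listed is made up for this example, and the unlink is left commented out so nothing is deleted by accident. Each file is checked against the hash as it is visited, so the full list of paths is never held in memory, which may matter for a large tree.

```perl
use strict;
use warnings;
use File::Find;

my $startdir = '/allfiles';   # starting directory from the thread
my %filelist = map { $_ => 1 } qw(1234567_bc_20101000.txt 99877_xy_20111111.txt);

# Hypothetical helper: walk $dir and return the full paths of plain files
# whose bare filename is a key in the %$wanted hash.
sub find_listed {
    my ($dir, $wanted) = @_;
    my @hits;
    find(sub {
        # Inside the callback, $_ is the bare filename and
        # $File::Find::name is the full path.
        push @hits, $File::Find::name if -f $_ && exists $wanted->{$_};
    }, $dir);
    return @hits;
}

if (-d $startdir) {
    for my $path (find_listed($startdir, \%filelist)) {
        print "Found and to be deleted: $path\n";
        # unlink $path or warn "Unable to delete $path: $!\n";
    }
}
```

      The running time is dominated by the directory traversal itself. The per-file check is one hash lookup regardless of how many names are in the list.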

      Aaron B.
      My Woefully Neglected Blog, where I occasionally mention Perl.

        Your script would only work if the filenames he is looking for came with complete paths, which does not seem to be the case.
