Re: Finding files recursively

by dsheroh (Monsignor)
on Aug 05, 2019 at 08:04 UTC (#11103910)

in reply to Finding files recursively

The problem is that finding over a large directory could take hours.
How large are we talking? Does it take hours to run ls -RU over that directory? If so, then there's nothing you can do in Perl to make it faster, because that's how long it takes the disk to retrieve the directory entries. A quick test on my laptop suggests that 1 hour may correspond to about a million directory entries on this machine, but your hardware may vary. Wildly.
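If you'd rather take that measurement from Perl itself, here's a minimal sketch using the core File::Find and Time::HiRes modules to count the entries under a directory and time the walk. The script name and the count_entries sub are made up for illustration, and since it does a bit of Perl work per entry it may come out somewhat slower than ls -RU on the same tree.

```perl
#!/usr/bin/perl
# Usage (hypothetical script name): perl count_entries.pl /path/to/big/dir
use strict;
use warnings;
use File::Find;
use Time::HiRes qw(time);

# Walk $root, counting every entry File::Find visits (including $root
# itself), and return the count plus the elapsed wall-clock seconds.
sub count_entries {
    my ($root) = @_;
    my $count = 0;
    my $start = time();
    # no_chdir => 1: don't chdir into each directory; we only count names.
    find({ wanted => sub { $count++ }, no_chdir => 1 }, $root);
    return ($count, time() - $start);
}

# Only run when invoked directly with a directory argument.
if (!caller() && @ARGV) {
    my ($count, $secs) = count_entries($ARGV[0]);
    printf "%d entries in %.2f seconds\n", $count, $secs;
}
```

Comparing that number against the ls -RU timing tells you roughly how much of the cost is the disk and how much is Perl.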

Also, if you're on a *nix box, I'd be willing to bet that the OS's find binary is pretty well optimized. Generating a list of candidate directories with find $STARTING_DIR -name secret.file, then using Perl to run down that list and remove any with a .ignore file would probably be a pretty effective way to do this, albeit less effective as an exercise in using/learning more Perl, if that's your primary objective. There may even be a way to get find to filter out the directories with .ignore files in the first pass, so that you don't have to go back a second time to look for them, but my find-fu isn't up to that task.
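A minimal sketch of the Perl half of that pipeline, assuming the .ignore file sits in the same directory as secret.file (the thread doesn't pin that down) and that no path contains a newline. The script name filter_ignored.pl and the keep_unignored sub are hypothetical.

```perl
#!/usr/bin/perl
# Usage (hypothetical script name):
#   find "$STARTING_DIR" -name secret.file | perl filter_ignored.pl -
# The trailing "-" tells <> to read the candidate list from STDIN.
use strict;
use warnings;
use File::Basename qw(dirname);

# Keep only the paths whose containing directory has no .ignore file.
# Assumption: .ignore lives next to secret.file; parent dirs aren't checked.
sub keep_unignored {
    my @kept;
    for my $path (@_) {
        my $dir = dirname($path);
        push @kept, $path unless -e "$dir/.ignore";
    }
    return @kept;
}

# Only run when invoked directly with an input argument ("-" for STDIN).
if (!caller() && @ARGV) {
    chomp(my @paths = <>);
    print "$_\n" for keep_unignored(@paths);
}
```

This only does one extra existence check per candidate, so the total run time should stay close to whatever find alone takes.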

Even if you're going to write a Perl solution in the end, generating a list of all the secret.files with find is a good sanity check: it gives you an estimate of the absolute fastest time the task could possibly be done in.

My first idea was to split the directories among processes so they perform a parallel search, but I'm not sure if that's a good idea.
If your bottleneck is on disk I/O rather than on processing, then parallelization won't help (if it's already waiting on the disk, having more CPU cores waiting isn't going to make the disk any faster) and may make things significantly worse (by making the disk spend more time jumping from one directory to another, and less time actually reading the data you want).

Re^2: Finding files recursively
by ovedpo15 (Monk) on Aug 05, 2019 at 08:52 UTC
    Thanks for the reply!
    By parallel process, I meant using fork(). Consider a directory with multiple subdirectories: I would fork a process for each one, find all the valid directories in it, and then merge the resulting arrays.
    Is it a bad idea?
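A minimal sketch of that fork-and-merge idea, using Perl's implicit-fork open(..., '-|') so each child hands its results back to the parent through a pipe. The "valid directory" test (contains secret.file, has no .ignore) and the sub names are assumptions for illustration. As noted above, if the walk is disk-bound this is unlikely to run any faster.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical "valid" test: the directory holds secret.file and no .ignore.
sub valid_dirs_under {
    my ($root) = @_;
    my @found;
    find({
        no_chdir => 1,    # $_ is the full path, so -e "$_/..." works
        wanted   => sub {
            return unless -d $_;
            push @found, $_ if -e "$_/secret.file" && !-e "$_/.ignore";
        },
    }, $root);
    return @found;
}

# Fork one child per top-level subdirectory; each child prints its matches
# to a pipe, and the parent merges them into a single list.
sub parallel_valid_dirs {
    my ($top) = @_;
    opendir(my $dh, $top) or die "opendir $top: $!";
    my @subdirs = grep { -d $_ } map { "$top/$_" } grep { !/^\.\.?$/ } readdir $dh;
    closedir $dh;

    my @readers;
    for my $sub (@subdirs) {
        my $pid = open(my $fh, '-|');    # implicit fork; child's STDOUT feeds $fh
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                  # child: walk one subtree, then exit
            print "$_\n" for valid_dirs_under($sub);
            exit 0;
        }
        push @readers, $fh;               # parent: remember the read end
    }

    my @merged;
    for my $fh (@readers) {
        chomp(my @lines = <$fh>);
        push @merged, @lines;
        close $fh;                        # closing a piped open also reaps the child
    }
    # The top directory itself isn't covered by any child.
    unshift @merged, $top if -e "$top/secret.file" && !-e "$top/.ignore";
    return @merged;
}
```

Note that this starts one process per subdirectory with no upper limit; on a directory with thousands of subdirectories you'd want to cap the number of concurrent children.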
