File::Find is that I think it's better for finding file "needles in a haystack", rather than processing ALL the files
I don't think so. Without fine tuning, find({wanted => \&wanted, ...}) invokes wanted for each file and directory found, so wanted sees the entire haystack including all needles, piece by piece.
I need to check or create (regardless) the output folder for each and every file created
No. File::Find also calls wanted for the directories found during the file tree traversal. You need to check and create directories only in that case.
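A minimal, untested-in-your-setup sketch of that idea: wanted creates the target directory only when it is called for a directory, and hands plain files to a callback. The names mirror_tree, $src, $dst and $process are made up for illustration; they are not part of File::Find.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: mirror the directory layout of $src under $dst and
# call $process->($in, $out) for every plain file found.
sub mirror_tree {
    my ($src, $dst, $process) = @_;
    find({
        no_chdir => 1,    # stay in the current directory, so $File::Find::name can be used as-is
        wanted   => sub {
            my $in  = $File::Find::name;
            my $rel = File::Spec->abs2rel($in, $src);   # path relative to the source root
            if (-d $in) {
                # called once per directory, before its contents are visited
                make_path(File::Spec->catdir($dst, $rel));
            } elsif (-f $in) {
                $process->($in, File::Spec->catfile($dst, $rel));
            }
        },
    }, $src);
}
```

Because find() visits a directory before its contents (unless bydepth is set), the target directory already exists by the time the callback sees the first file in it.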
You could also use the preprocess option. It is invoked exactly when you want to create the target directory, i.e. once for each directory, before wanted is called for its contents:
The value should be a code reference. This code reference is used to preprocess the current directory. The name of the currently processed directory is in $File::Find::dir. Your preprocessing function is called after readdir(), but before the loop that calls the wanted() function. It is called with a list of strings (actually file/directory names) and is expected to return a list of strings. The code can be used to sort the file/directory names alphabetically, numerically, or to filter out directory entries based on their name alone. When follow or follow_fast are in effect, preprocess is a no-op.
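A rough sketch of how preprocess could be used for this, assuming a source root $src, a target root $dst and a per-file callback $process (all three names, like mirror_with_preprocess itself, are invented for this example):

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: create each target directory from the preprocess hook,
# which runs once per directory, then process plain files in wanted.
sub mirror_with_preprocess {
    my ($src, $dst, $process) = @_;
    find({
        no_chdir   => 1,
        preprocess => sub {
            # $File::Find::dir is the directory whose entries were just read
            my $rel = File::Spec->abs2rel($File::Find::dir, $src);
            make_path(File::Spec->catdir($dst, $rel));
            return sort @_;    # must return the names that find() should visit
        },
        wanted => sub {
            my $in = $File::Find::name;
            return unless -f $in;    # directories were already handled in preprocess
            my $rel = File::Spec->abs2rel($in, $src);
            $process->($in, File::Spec->catfile($dst, $rel));
        },
    }, $src);
}
```

As the documentation quoted above says, this does not help when follow or follow_fast is in effect, because preprocess is ignored then.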
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Thanks, Alexander
I'll follow your advice and stick to File::Find, even though I don't get any benefit from finding/storing the large file list in advance of processing.
From an overnight test run, my process consumes a lot of memory. Could this be File::Find? I suspect it has more to do with the image processing and the creation of image objects which are not being released.
/Warren
From an overnight test run, my process consumes a lot of memory. Could this be File::Find? I suspect it has more to do with the image processing and the creation of image objects which are not being released.
I've downloaded and read the source of the current File::Find, and except for an explicitly coded stack that I expected to be implicit, nothing unusual happens. I expect most memory usage inside File::Find to be the stack for descending the directory tree and the per-directory array of directory contents (i.e. readdir results). So unless you have a very deeply nested directory tree, where each directory is filled with millions of files, it is very unlikely that File::Find is the root of your memory problem.
Disable the image processing in your code (insert a return as the first line of the wanted function if you have no better idea) and run it again. Watch the memory use. If it still consumes large amounts of memory, you have likely found a problem in File::Find. If not, look at your image processing code. Try to explicitly destroy the Image::Magick objects you created, e.g. undef $imageObject;
Perhaps Image::Magick leaks some memory. I don't know, search for yourself. If it leaks too much memory, you could move the actual image processing into a separate process that releases all leaked memory when it exits. Something like this inside your wanted function should do the trick (untested code):
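sub wanted
{
    ...
    # fork() returns undef on failure, 0 in the child, the child's PID in the parent
    my $pid = fork() // die "Can't fork: $!";
    if ($pid) {
        # parent: wait until the child has finished
        waitpid($pid, 0);
    } else {
        # child
        processImage(...);
        exit(0); # important!
    }
    ...
}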
Note that forking a sub-process has its own costs. Also note that fork and waitpid depend on the platform. They are not natively available on Windows; Perl uses an emulation based on threads there. While the Perl port makes your script think that it forked a new process, it has really just created a new thread, and leaked memory will not be freed until your script ends. So this trick will most likely not work on Windows.
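If you also want to know whether the child succeeded, the same fork/waitpid pattern can be wrapped in a small helper. This is only a sketch for platforms with a real fork; run_in_child is a made-up name, and the job passed to it stands in for your image processing call.

```perl
use strict;
use warnings;

# Hypothetical helper: run $job in a child process and return the child's exit code.
sub run_in_child {
    my ($job) = @_;
    my $pid = fork() // die "Can't fork: $!";
    if ($pid == 0) {
        # child: do the work, then exit so we never fall back into the caller's code
        my $ok = eval { $job->(); 1 };
        warn $@ unless $ok;
        exit($ok ? 0 : 1);
    }
    waitpid($pid, 0);    # parent: block until the child is done
    return $? >> 8;      # the high byte of $? holds the child's exit code
}
```

Inside wanted this could be called as run_in_child(sub { processImage(...) }), and a non-zero return value tells you that processing this file failed.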
Alexander