http://www.perlmonks.org?node_id=1005304


in reply to Re: Recursive image processing (with ImageMagic)
in thread Recursive image processing (with ImageMagic)

My only problem with using something like File::Find is that I think it's better suited to finding file "needles in a haystack" than to processing ALL the files (they're all PNGs in my case). It also makes the directory handling more complicated, as I need to check for or create (regardless) the output folder for each and every file created. Unless, I suppose, File::Find returns files in blocks, so I can check the last folder created and only mkdir when needed.

Do you also feel that, since every file/sub-directory is being processed, a recursive search would be best? I.e. every time a new directory is handled, I mkdir the same directory in the destination.


Re^3: Recursive image processing (with ImageMagic)
by afoken (Chancellor) on Nov 24, 2012 at 02:53 UTC
    File::Find ... is better suited to finding file "needles in a haystack" than to processing ALL the files

    I don't think so. Without fine tuning, find({wanted => \&wanted, ...}) invokes wanted for each file and directory found, so wanted sees the entire haystack including all needles, piece by piece.
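
    For illustration, a minimal sketch of such a call (the ./src starting directory and the PNG filter are assumptions, not from your post):

        use strict;
        use warnings;
        use File::Find;

        # wanted() is invoked once for EVERY entry found,
        # files and directories alike -- the whole haystack.
        sub wanted {
            return unless -f $_ && /\.png$/i;   # keep only plain PNG files
            print "would process $File::Find::name\n";
        }

        find({ wanted => \&wanted, no_chdir => 1 }, './src');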

    I need to check for or create (regardless) the output folder for each and every file created

    No. File::Find also calls wanted for the directories found during the file tree traversal. You need to check and create directories only in that case.
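
    A sketch of that idea (the ./src and ./dst roots are assumed names):

        use strict;
        use warnings;
        use File::Find;

        my $src = './src';   # assumed source root
        my $dst = './dst';   # assumed destination root

        sub wanted {
            if (-d $File::Find::name) {
                # wanted() also sees each directory exactly once:
                # mirror it in the destination tree here.
                (my $target = $File::Find::name) =~ s/^\Q$src\E/$dst/;
                mkdir $target or die "Can't mkdir $target: $!"
                    unless -d $target;
                return;
            }
            # ... per-file processing goes here ...
        }

        find({ wanted => \&wanted, no_chdir => 1 }, $src);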

    You could also use the preprocess option; it is invoked at exactly the moment when you want to create the target directory:

    The value should be a code reference. This code reference is used to preprocess the current directory. The name of the currently processed directory is in $File::Find::dir. Your preprocessing function is called after readdir(), but before the loop that calls the wanted() function. It is called with a list of strings (actually file/directory names) and is expected to return a list of strings. The code can be used to sort the file/directory names alphabetically, numerically, or to filter out directory entries based on their name alone. When follow or follow_fast are in effect, preprocess is a no-op.
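
    Something like this should work (again with assumed ./src and ./dst roots, untested):

        use strict;
        use warnings;
        use File::Find;

        my $src = './src';
        my $dst = './dst';

        sub preprocess {
            # Called once per directory, right after readdir() --
            # the right moment to create the matching target directory.
            (my $target = $File::Find::dir) =~ s/^\Q$src\E/$dst/;
            mkdir $target or die "Can't mkdir $target: $!"
                unless -d $target;
            return sort @_;   # hand the directory entries back (sorted)
        }

        find({ wanted => sub { }, preprocess => \&preprocess,
               no_chdir => 1 }, $src);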

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thanks, Alexander

      I'll follow your advice and stick to File::Find, even though I don't get any benefit from finding/storing the large file list in advance of processing.

      From an overnight test run, my process consumes a lot of memory. Could this be File::Find? I suspect it's more to do with the image processing and the creation of image objects which are not being released.

      /Warren

        From an overnight test run, my process consumes a lot of memory. Could this be File::Find? I suspect it's more to do with the image processing and the creation of image objects which are not being released.

        I've downloaded and read the source of the current File::Find, and except for an explicitly coded stack that I expected to be implicit, nothing unusual happens. I expect most memory usage inside File::Find to be the stack for descending the directory tree and the per-directory array of directory contents (i.e. readdir results). So unless you have a very deeply nested directory tree, where each directory is filled with millions of files, it is very unlikely that File::Find is the root of your memory problem.

        Disable the image processing in your code (insert a return as the first line of the wanted function if you have no better idea) and run it again. Watch the memory use. If it still consumes large amounts of memory, you have likely found a problem in File::Find. If not, look at your image processing code. Try to explicitly destroy the Image::Magick objects you create, e.g. $imageObject = '';
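
        A sketch of both steps in one wanted function (the body and variable names are assumptions):

            sub wanted {
                return;   # TEMPORARY: skip all work to isolate File::Find

                return unless -f $_ && /\.png$/i;
                my $image = Image::Magick->new;
                my $err = $image->Read($_);
                warn $err if $err;
                # ... process and write the image ...
                $image = '';   # drop the reference so the object is destroyed
            }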

        Perhaps Image::Magick leaks some memory. I don't know; search for yourself. If it leaks too much memory, you could move the actual image processing into a separate process that releases all leaked memory at its end. Something like this inside your wanted function should do the trick (untested code):

            sub wanted {
                ...
                my $pid = fork() // die "Can't fork: $!";
                if ($pid) {
                    # parent: wait for the child to finish
                    waitpid($pid, 0);
                }
                else {
                    # child: do the leaky image processing, then throw
                    # the whole process (and its leaks) away
                    processImage(...);
                    exit(0);   # important!
                }
                ...
            }

        Note that forking a sub-process has its own costs. Also note that fork and waitpid depend on the platform. They are not natively available on Windows; Perl uses an emulation based on threads there. While the Perl port makes your script think that it forked a new process, it has just created a new thread, and leaked memory will not be freed until your script ends. So this trick will most likely not work on Windows.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)