
Re^3: Recursive image processing (with ImageMagic)

by afoken (Chancellor)
on Nov 24, 2012 at 02:53 UTC


in reply to Re^2: Recursive image processing (with ImageMagic)
in thread Recursive image processing (with ImageMagic)

File::Find, I think, is better for finding file "needles in a haystack" than for processing ALL the files

I don't think so. Without fine tuning, find({wanted => \&wanted, ...}) invokes wanted for each file and directory found, so wanted sees the entire haystack including all needles, piece by piece.
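
A minimal sketch of that behaviour (the start path is a placeholder); wanted fires once per entry, directories included:

    use File::Find;

    sub wanted {
        # find() chdirs into each directory by default, so $_ is the
        # basename and $File::Find::name the path from the start point.
        if (-d) {
            print "dir:  $File::Find::name\n";
        }
        else {
            print "file: $File::Find::name\n";
        }
    }

    find({ wanted => \&wanted }, '/path/to/images');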

I need to check or create (regardless) the output folder for each and every file created

No. File::Find also calls wanted for the directories found during the file tree traversal. You need to check and create directories only in that case.
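
A sketch of that approach ($src and $dst are placeholder paths); no_chdir is set so the file tests can use the full path, and make_path() quietly skips directories that already exist:

    use File::Find;
    use File::Path qw(make_path);

    my $src = '/path/to/source';   # placeholder
    my $dst = '/path/to/output';   # placeholder

    sub wanted {
        if (-d $File::Find::name) {
            # A directory: create its mirror under $dst, nothing else to do.
            (my $rel = $File::Find::name) =~ s/^\Q$src\E//;
            make_path("$dst$rel");
            return;
        }
        # A file: its output directory already exists, process the image here.
    }

    find({ wanted => \&wanted, no_chdir => 1 }, $src);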

You could also use the preprocess option; it is invoked exactly when you want to create the target directory:

The value should be a code reference. This code reference is used to preprocess the current directory. The name of the currently processed directory is in $File::Find::dir. Your preprocessing function is called after readdir(), but before the loop that calls the wanted() function. It is called with a list of strings (actually file/directory names) and is expected to return a list of strings. The code can be used to sort the file/directory names alphabetically, numerically, or to filter out directory entries based on their name alone. When follow or follow_fast are in effect, preprocess is a no-op.
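
An untested sketch along those lines ($src and $dst are again placeholders); with the directories handled in preprocess, wanted only has to deal with files:

    use File::Find;
    use File::Path qw(make_path);

    my $src = '/path/to/source';   # placeholder
    my $dst = '/path/to/output';   # placeholder

    sub preprocess {
        # Runs once per directory, after readdir() but before wanted().
        (my $rel = $File::Find::dir) =~ s/^\Q$src\E//;
        make_path("$dst$rel");
        return @_;   # pass the directory entries through unchanged
    }

    sub wanted {
        return if -d;   # directories are already handled in preprocess()
        # ... process the image file here ...
    }

    find({ wanted => \&wanted, preprocess => \&preprocess }, $src);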

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re^4: Recursive image processing (with ImageMagic)
by wvick (Novice) on Nov 24, 2012 at 09:06 UTC

    Thanks, Alexander

    I'll follow your advice and stick to File::Find, even though I don't get any benefit from finding/storing the large file list in advance of processing.

    From an overnight test run, my process consumes a lot of memory. Could this be File::Find? I suspect it's more to do with the image processing and the creation of image objects which are not being released.

    /Warren

      From an overnight test run, my process consumes a lot of memory. Could this be File::Find? I suspect it's more to do with the image processing and the creation of image objects which are not being released.

      I've downloaded and read the source of the current File::Find, and except for an explicitly coded stack that I expected to be implicit, nothing unusual happens. I expect most memory usage inside File::Find to be the stack for descending the directory tree and the per-directory array of directory contents (i.e. readdir results). So unless you have a very deeply nested directory tree, where each directory is filled with millions of files, it is very unlikely that File::Find is the root of your memory problem.

      Disable the image processing in your code (insert a return as the first line of the wanted function if you have no better idea) and run it again. Watch the memory use. If it still consumes large amounts of memory, you have likely found a problem in File::Find. If not, look at your image processing code. Try to explicitly destroy the Image::Magick objects you created, e.g. $imageObject = '';
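
      A sketch of both steps (untested; the Read call and the "resize/convert" comment stand in for your real processing):

          use Image::Magick;

          sub wanted {
              return;   # TEST ONLY: skip all image work; remove to re-enable processing

              my $image = Image::Magick->new;
              $image->Read($_);    # $_ is the current file name inside wanted()
              # ... resize/convert and Write() the result here ...
              $image = '';         # drop the object explicitly so it can be freed
          }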

      Perhaps Image::Magick leaks some memory. I don't know; search for yourself. If it leaks too much memory, you could move the actual image processing into a separate process that releases all leaked memory at its end. Something like this inside your wanted function should do the trick (untested code):

          sub wanted
          {
              ...
              my $pid = fork() // die "Can't fork: $!";
              if ($pid) {
                  # parent
                  waitpid($pid, 0);
              }
              else {
                  # child
                  processImage(...);
                  exit(0); # important!
              }
              ...
          }

      Note that forking a sub-process has its own costs. Also note that fork and waitpid are platform-dependent. They are not natively available on Windows; there, Perl uses an emulation based on threads. While the Windows port of Perl makes your script think that it forked a new process, it actually created a new thread, and leaked memory will not be freed until your script ends. So this trick will most likely not work on Windows.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
