http://www.perlmonks.org?node_id=1081495


in reply to Monitoring directory contents

You should decide whether it matters that files could be skipped when new files arrive while your program isn't running. If you just monitor the directory for changes and your program goes down for some reason, any files added in the meantime will be missed.

You should also think about what happens if a file somehow gets processed twice. Would that be a bigger problem than some wasted resources? If so, consider having the parsing program keep track of processed files and just use rsync to fetch them into your working directory, or write a script that runs after rsyncing the files into your working directory.
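For instance (just a sketch with made-up paths and script names, assuming rsync is installed and the parser lives in a separate script), the cron job could fetch the files first and then hand control to the tracking/parsing step:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical paths -- adjust to your setup.
my $remote  = 'user@xmlhost:/var/spool/xml/';
my $workdir = '/home/me/xml_inbox/';

# Pull down whatever is on the remote side;
# rsync only transfers new or changed files.
system('rsync', '-a', $remote, $workdir) == 0
    or die "rsync failed: $?\n";

# Then run the script that tracks processed files and calls the parser.
system('/home/me/bin/process_new_xml.pl', $workdir) == 0
    or warn "processing script returned non-zero: $?\n";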

Here is a version of Discipulus' code below that does not copy files but just keeps track of them and calls the XML processing script; it is meant to be run from a cron job:

## pseudo code:
my %cache_of_already_read_files;
my @xml;

%cache_of_already_read_files = &load_cache_from_somewhere;
if (not %cache_of_already_read_files) {
    # Load failed (or the cache is empty).
    # Do some assumptions here to have a starting point, for example:
    @xml = &get_xml_files_names_based_on_timestamp;
    # ... or just assume that this is the first run:
    #@xml = &get_xml_files_names;
}
else {
    @xml = &get_xml_files_names;
}

foreach my $filename (@xml) {
    next if exists $cache_of_already_read_files{$filename};
    $cache_of_already_read_files{$filename} = 'found at ' . scalar localtime(time);
    &process_xml_file($filename);
}

&clean_cache_from_older_filenames(\%cache_of_already_read_files, \@xml);
&save_cache_somewhere(\%cache_of_already_read_files);
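If you want something concrete for the cache helpers above, here is a minimal sketch of load_cache_from_somewhere and save_cache_somewhere using the core Storable module; the cache file path is just an assumption:

use strict;
use warnings;
use Storable qw(store retrieve);

my $cache_file = '/home/me/.xml_seen.cache';    # assumed location

sub load_cache_from_somewhere {
    return () unless -e $cache_file;            # first run: empty cache
    my $ref = eval { retrieve($cache_file) };   # undef if the file is unreadable/corrupt
    return $ref ? %$ref : ();
}

sub save_cache_somewhere {
    my ($cache_ref) = @_;                       # expects a hash reference
    store($cache_ref, $cache_file)
        or warn "could not save cache to $cache_file: $!";
}

Storable keeps the cache as a single binary file; a plain text file with one filename per line would work just as well if you want to be able to inspect or edit it by hand.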