Pre-scanning at the beginning does not work in practice, for several reasons: load-balancing suffers because files take different amounts of time to process (as mentioned); the set of files available to process can change; I often have to kill off jobs when other people want to use the cluster; and so on.
Another advantage of the file scanner server idea is that if you need to pause the 100+ processing clients, you only need to instruct the server to stop responding to requests. Each client then stops as soon as it has finished with its current file and just sits dormant, waiting for a response.
When the cluster is free again, another single instruction to the server and they all kick off again, continuing their way down the list without any possibility of revisiting files already processed. Every file gets processed exactly once: none are missed, no time is wasted locking files, and there is no possibility of race conditions.
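To make the idea concrete, here is a minimal sketch of the state such a server might keep. It is only an illustration under stated assumptions, not the actual server: the `FileDispenser` package, its method names and the `.dat` file names are all hypothetical, and the network layer is left out. A real server would sit in an accept loop and simply not answer while paused; here `next_file` returning `undef` stands in for "no response". Because one process hands out names one request at a time, no two clients can ever receive the same file.

```perl
use strict;
use warnings;

# Hypothetical dispenser: the core state a file scanner server would hold.
package FileDispenser;

sub new {
    my ($class, @files) = @_;
    my $self = bless { queue => [], seen => {}, paused => 0 }, $class;
    $self->add_files(@files);
    return $self;
}

# Queue only files that have never been queued before, so the directory
# can be rescanned at any time without re-issuing a file.
sub add_files {
    my ($self, @files) = @_;
    for my $file (@files) {
        next if $self->{seen}{$file}++;
        push @{ $self->{queue} }, $file;
    }
    return scalar @{ $self->{queue} };
}

sub pause  { $_[0]{paused} = 1; return }
sub resume { $_[0]{paused} = 0; return }

# Returns the next file, or undef while paused or when nothing is queued.
sub next_file {
    my ($self) = @_;
    return undef if $self->{paused};
    return shift @{ $self->{queue} };
}

package main;

my $server = FileDispenser->new(qw(a.dat b.dat c.dat));
my @done;

push @done, $server->next_file;           # first client request
$server->pause;                           # cluster needed by someone else
my $while_paused = $server->next_file;    # nothing is handed out
$server->resume;                          # cluster free again
$server->add_files(qw(b.dat d.dat));      # rescan: b.dat is already known

while (defined(my $file = $server->next_file)) {
    push @done, $file;
}
print "@done\n";
```

Each file name appears once in the output, even though `b.dat` was offered twice and the dispenser was paused partway through.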
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
In reply to Re^3: randomising file order returned by File::Find