PerlMonks  

Re^3: randomising file order returned by File::Find

by DrHyde (Prior)
on Mar 02, 2011 at 10:31 UTC (#890942)


in reply to Re^2: randomising file order returned by File::Find
in thread randomising file order returned by File::Find

Trouble with this is that it doesn't make the best use of your hardware if you have machines that run at different speeds or if some data files take longer to process than others.

When I was trying to solve a similar problem (in my case, rendering individual frames of video) using whatever spare cycles were available across a whole bunch of machines (so different amounts of CPU were available on different boxes and at different times), my solution was for the individual renderers to request work units from a master. Rather than just mounting the master's filesystem and hoping for the best, they made a request to my own application: a simple Perl script that they accessed over telnet. The script only worked on its local filesystem, so locking worked reliably, and it simply told each client the filename that it should work on next. The clients then grabbed that file using NFS.
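To give a rough idea of the master side, here is a minimal sketch, not my original script. It assumes the pending filenames live one per line in a queue file on the master's local disk, and that something like inetd runs the script once per connection so that STDOUT goes to the client. The name next_work_unit and the queue file layout are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Hand out the next pending filename from a queue file (one name per line).
# The exclusive lock means two simultaneous clients never get the same unit.
# next_work_unit() and the queue file format are hypothetical.
sub next_work_unit {
    my ($queue_file) = @_;

    open my $fh, '+<', $queue_file or die "cannot open $queue_file: $!";
    flock($fh, LOCK_EX)            or die "cannot lock $queue_file: $!";

    my @pending = <$fh>;
    my $next    = shift @pending;

    # Rewrite the queue without the unit we are about to hand out.
    seek($fh, 0, SEEK_SET) or die "cannot seek $queue_file: $!";
    truncate($fh, 0)       or die "cannot truncate $queue_file: $!";
    print {$fh} @pending;
    close $fh or die "cannot close $queue_file: $!";   # also releases the lock

    return unless defined $next;   # queue is empty
    chomp $next;
    return $next;
}

# Assumed invocation: dispenser.pl /path/to/queue, one run per connection.
if (@ARGV) {
    my $name = next_work_unit($ARGV[0]);
    print defined $name ? "$name\n" : "\n";   # empty line means no work left
}
```

The point is that the lock is only ever taken on the master's own disk, so it behaves properly. A client that gets an empty line back knows there is nothing left to do.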

That's what I think you should do rather than randomising the list - randomising will reduce the problem, but won't eliminate it.

However, if you do want to randomise, then the wanted function should build up a list instead of doing any processing on the files. You then shuffle that list, and only after that do you process the files.
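Something along these lines, as a sketch only. It uses shuffle from List::Util, and process_file is a placeholder for whatever you actually do with each file.

```perl
use strict;
use warnings;
use File::Find;
use List::Util qw(shuffle);

# The wanted function only collects names; it does no processing.
sub collect_files {
    my @dirs = @_;
    my @files;
    find(sub { push @files, $File::Find::name if -f }, @dirs);
    return @files;
}

# Hypothetical stand-in for the real per-file work.
sub process_file {
    my ($path) = @_;
    print "processing $path\n";
}

# Search the directories given on the command line, or the current one.
my @dirs = @ARGV ? @ARGV : ('.');

# Shuffle the complete list first, and only then process the files.
process_file($_) for shuffle(collect_files(@dirs));
```

The whole list has to be in memory before the shuffle, which is fine unless the tree is enormous.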

