PerlMonks  

Re: randomising file order returned by File::Find

by salva (Monsignor)
on Mar 01, 2011 at 12:19 UTC (#890741)


in reply to randomising file order returned by File::Find

  1. Push the filenames onto a list.
  2. Shuffle the list.
  3. Process the elements of the list.
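
    use File::Find;
    use List::Util qw(shuffle);

    # $somedir and process_file() are supplied by the caller.
    my @files;

    # Collect every path that find() visits, following symlinks.
    find({ wanted => sub { push @files, $File::Find::name },
           follow => 1 },
         $somedir);

    # Randomise the order, then process each entry.
    @files = shuffle @files;
    process_file($_) for @files;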
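
Note that the wanted callback above is invoked for directories as well as files, so directory names end up in the list too. If only plain files should be processed, one possible variant filters with -f. This is only a sketch: shuffled_files is a hypothetical helper name, and the no_chdir option is an addition that is not in the original post.

```perl
use strict;
use warnings;
use File::Find;
use List::Util qw(shuffle);

# Hypothetical helper (not from the original post):
# return the plain files found under $dir, in random order.
sub shuffled_files {
    my ($dir) = @_;
    my @files;
    find({
        # With no_chdir, $_ holds the same path as $File::Find::name,
        # so the -f test works regardless of the current directory.
        wanted   => sub { push @files, $File::Find::name if -f $_ },
        follow   => 1,    # follow symlinks, as in the original snippet
        no_chdir => 1,    # assumption: added here, not in the original
    }, $dir);
    return shuffle @files;
}
```

It would be called as `process_file($_) for shuffled_files($somedir);`, with process_file and $somedir defined elsewhere, as in the snippet above.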


Re^2: randomising file order returned by File::Find
by Anonymous Monk on Mar 01, 2011 at 13:17 UTC
    Thank you very much indeed! This is what I was after and so simple when you think about it! Greg.

    PS to the others making suggestions about how to schedule multiple jobs etc.: your suggestions are good in theory, but the problem is more complicated than I outlined, and pre-scanning at the beginning does not work in practice (for reasons including load-balancing due to different processing times (as mentioned), the fact that the files that are available to process can change, that I often have to kill off jobs when other people want to use the cluster, etc.).

      pre-scanning at the beginning does not work in practice (for reasons including load-balancing due to different processing times (as mentioned), the fact that the files that are available to process can change, that I often have to kill off jobs when other people want to use the cluster, etc.).

      Another advantage of the file scanner server idea is that if you need to pause the 100+ processing clients, you only need instruct the server to stop responding to requests. Then those clients stop as soon as they've finished with their current file and just sit dormant waiting for a response.

      When the cluster is free again, another single instruction to the server and they all kick off again, continuing their way down the list without any possibility of revisiting files already processed. Every file gets processed exactly once with none missed and no time wasted locking files and no possibility of race conditions.
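
      The bookkeeping behind that idea can be sketched in a few lines. This is only an in-process model under stated assumptions: the thread does not say how the server talks to its clients, so the network layer is left out entirely, and the FileDispenser class with its method names is hypothetical. Where a real server would simply stop answering while paused, this model returns undef.

```perl
use strict;
use warnings;

# Hypothetical model of the file scanner server's bookkeeping.
# The transport (sockets, HTTP, ...) is deliberately omitted.
package FileDispenser;

sub new {
    my ($class, @files) = @_;
    return bless { queue => [@files], paused => 0 }, $class;
}

# One instruction pauses every client; another starts them again.
sub pause  { my ($self) = @_; $self->{paused} = 1; return }
sub resume { my ($self) = @_; $self->{paused} = 0; return }

# Hand out the next file. Each name is shifted off the queue, so it
# can never be given to a second client. Returns undef when paused
# or when the queue is empty.
sub next_file {
    my ($self) = @_;
    return undef if $self->{paused};
    return shift @{ $self->{queue} };
}

# Number of files not yet handed out; lets a caller tell
# "paused" apart from "finished".
sub remaining {
    my ($self) = @_;
    return scalar @{ $self->{queue} };
}

package main;
```

      Because a single process owns the queue, the clients need no file locking: a client asks for a name, processes it, and asks again. Feeding the constructor a shuffled list gives the random order asked for in the original question.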


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
