Re^3: Splitting up a filesystem into 'bite sized' chunks
by BrowserUk (Pope)
on Jul 10, 2013 at 19:17 UTC
I'm working on something that uses File::Find to send file lists to another thread (or two) that's using Thread::Queue. My major requirement is breaking down a 10TB, 70 million file monster filesystem
Given the size of your dataset, using an in-memory queue is a fatally flawed plan from both the memory consumption and the persistence/restartability points of view.
I'd strongly advocate putting your file paths into a DB of some kind and having your scanning processes remove them (or mark them done) as they process them.
That way, if any one element of the cluster fails, it can be restarted and pick up from where it left off.
It also lends itself to doing incremental scans in subsequent passes.
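To make that concrete, here is a minimal sketch of a DB-backed work queue. Everything in it is an assumption made for illustration rather than anything from the original question: the table layout, the 'pending'/'claimed'/'done' states, the function names, and the use of Python's built-in sqlite3 module instead of Perl (the same SQL would work from Perl through DBI). A real multi-machine cluster would probably want a server database rather than a single SQLite file.

```python
import sqlite3


def open_queue(db_path):
    """Open (or create) the queue database. Table layout is hypothetical."""
    # isolation_level=None puts the connection in autocommit mode so that
    # transactions are controlled explicitly with BEGIN/COMMIT below.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY,"
        " state TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def add_paths(conn, paths):
    """Record paths found by the directory walk.

    INSERT OR IGNORE leaves existing rows alone, so a later rescan only
    adds paths that were not seen before (the incremental-scan case).
    """
    conn.execute("BEGIN")
    conn.executemany(
        "INSERT OR IGNORE INTO files (path) VALUES (?)",
        ((p,) for p in paths),
    )
    conn.execute("COMMIT")


def claim_batch(conn, limit):
    """Hand out up to `limit` pending paths and mark them as claimed."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers
    # cannot select and claim the same rows.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT path FROM files WHERE state = 'pending' "
            "ORDER BY path LIMIT ?",
            (limit,),
        ).fetchall()
        paths = [row[0] for row in rows]
        conn.executemany(
            "UPDATE files SET state = 'claimed' WHERE path = ?",
            ((p,) for p in paths),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return paths


def mark_done(conn, path):
    """Mark one path as fully processed."""
    conn.execute("UPDATE files SET state = 'done' WHERE path = ?", (path,))


def requeue_claimed(conn):
    """After a crash, put claimed-but-unfinished paths back in the queue.

    Returns the number of paths that were requeued.
    """
    cur = conn.execute(
        "UPDATE files SET state = 'pending' WHERE state = 'claimed'"
    )
    return cur.rowcount
```

A worker would loop over claim_batch(), call mark_done() after each file, and run requeue_claimed() once at startup. Because the state lives on disk rather than in a Thread::Queue, nothing is lost when a worker dies, and memory use stays flat no matter how many paths are queued.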
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.