Re^5: Splitting up a filesystem into 'bite sized' chunks by BrowserUk (Pope)
on Jul 10, 2013 at 20:58 UTC
"I was thinking I could stall the find process, in order to simply buffer, rather than maintain ..."

Well, any full lists, be they database or flat file.
The problem with flat files is that they make lousy queues. (Great FILOs, but lousy FIFOs.)
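Removing records/lines from the beginning of a file is, for all intents and purposes, impossible (short of rewriting the whole file); and marking records as done means reading from the top each time to find the next piece of work. With n records, the k-th fetch has to skip the k-1 already finished, so draining the file is an O(n^2) process.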
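To make the quadratic cost concrete, here is a minimal Python sketch of the "mark records done" scheme. The layout (one line per work item, whose first byte is a status flag) and the names `write_queue`, `take_next` and `drain` are hypothetical, chosen for illustration; nothing in the thread prescribes them.

```python
import os
import tempfile

# Hypothetical layout (an assumption): one line per work item, whose first
# byte is a status flag: b" " = still to do, b"*" = done.

def write_queue(path, items):
    with open(path, "wb") as f:
        for item in items:
            f.write(b" " + item.encode() + b"\n")

def take_next(path):
    """Scan from the top for the first undone line and mark it done in place.

    Returns (item or None, number of lines read during the scan).
    """
    scanned = 0
    with open(path, "r+b") as f:
        while True:
            pos = f.tell()
            line = f.readline()
            if not line:                 # reached EOF: nothing left to do
                return None, scanned
            scanned += 1
            if line[:1] == b" ":
                f.seek(pos)
                f.write(b"*")            # same length, so nothing else moves
                return line[1:].rstrip(b"\n").decode(), scanned

def drain(path):
    """Consume every item; return (items in order, total lines read)."""
    taken, total = [], 0
    while True:
        item, scanned = take_next(path)
        total += scanned
        if item is None:
            return taken, total
        taken.append(item)

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "queue.txt")
    write_queue(path, [f"file{i}" for i in range(1000)])
    items, total = drain(path)
    # The k-th fetch reads k lines, plus one final full scan to find nothing.
    print(len(items), total)             # prints: 1000 501500
```

For 1,000 items that is already about half a million line reads; for the millions of paths a full filesystem walk produces, the scan time swamps the actual work.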
So you would then need a second (pointer) file that records how far down the first file you've processed; and, with several consumers all reading and updating it, that file becomes a point of contention.
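A minimal Python sketch of the pointer-file variant, assuming a single consumer: the work file is never modified, and a separate file (hypothetical name `work.ptr`) holds the byte offset of the next unread line, so each fetch seeks straight to it instead of rescanning. With several consumers, every fetch would have to lock this one small file for its whole read-then-update cycle, which is exactly where the contention comes from; the sketch deliberately has no locking.

```python
import os
import tempfile

# Assumption: single consumer, no locking. The pointer file holds one integer:
# the byte offset in the work file of the next line to hand out.

def read_offset(ptr_path):
    try:
        with open(ptr_path) as f:
            return int(f.read() or 0)
    except FileNotFoundError:
        return 0                          # no pointer yet: start at the top

def write_offset(ptr_path, offset):
    tmp = ptr_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, ptr_path)             # swap the new pointer in, in one step

def take_next(work_path, ptr_path):
    offset = read_offset(ptr_path)
    with open(work_path, "rb") as f:
        f.seek(offset)                    # jump straight to the next item: no rescan
        line = f.readline()
        if not line:
            return None                   # work file exhausted
        write_offset(ptr_path, f.tell())
    return line.rstrip(b"\n").decode()

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    work, ptr = os.path.join(d, "work.txt"), os.path.join(d, "work.ptr")
    with open(work, "wb") as f:
        f.write(b"a\nb\nc\n")
    print([take_next(work, ptr) for _ in range(4)])  # ['a', 'b', 'c', None]
```

Each fetch is now O(1) in the size of the work file, but the cost has moved into that single shared pointer file.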
As for file systems: I've often used (and advocated the use of) file systems as queues. The producer creates small (often zero-length) files in a todo directory; consumers rename the first file they find in that directory into a /consumerN.processing/ directory whilst they process it, and then rename it into a done directory (or just delete it) once they have finished. But again, given the size of your dataset, you'd have to manage very carefully the number of files you put into a single directory. And if you try to structure it, you're just moving the goal posts.
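A minimal Python sketch of the todo / consumerN.processing / done scheme described above. The directory names and helper names (`produce`, `claim`, `finish`) are hypothetical. It relies on the fact that, on POSIX, rename() within a single filesystem is atomic: if two consumers race for the same file, one rename succeeds and the other fails with FileNotFoundError, so the loser just moves on to the next entry.

```python
import os
import tempfile

# Sketch only: directory names and one-file-per-item layout are assumptions.

def produce(todo_dir, name):
    # Zero-length marker file; the file name itself identifies the work item.
    open(os.path.join(todo_dir, name), "w").close()

def claim(todo_dir, processing_dir):
    """Move the first todo file we can grab into our own processing dir."""
    for name in os.listdir(todo_dir):     # reads the whole directory listing
        src = os.path.join(todo_dir, name)
        dst = os.path.join(processing_dir, name)
        try:
            os.rename(src, dst)           # atomic on one filesystem: one consumer wins
        except FileNotFoundError:
            continue                      # another consumer took it first; try the next
        return name
    return None                           # todo directory is empty

def finish(processing_dir, done_dir, name):
    os.rename(os.path.join(processing_dir, name), os.path.join(done_dir, name))

if __name__ == "__main__":
    root = tempfile.mkdtemp()
    dirs = {k: os.path.join(root, k) for k in ("todo", "consumer1.processing", "done")}
    for d in dirs.values():
        os.mkdir(d)
    for i in range(3):
        produce(dirs["todo"], f"item{i}")
    while (name := claim(dirs["todo"], dirs["consumer1.processing"])) is not None:
        # ... process the item here ...
        finish(dirs["consumer1.processing"], dirs["done"], name)
    print(sorted(os.listdir(dirs["done"])))  # ['item0', 'item1', 'item2']
```

Note that `os.listdir` materialises the entire directory on every claim, which is one concrete reason the number of files per directory has to be kept in check.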
And what happens if your find/findfile process dies? Working out how far it got, so that you can avoid starting over, is a problem in itself.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.