in reply to Re^4: Splitting up a filesystem into 'bite sized' chunks
in thread Splitting up a filesystem into 'bite sized' chunks
I was thinking I could stall the find process, in order to simply buffer, rather than maintain
- Processes and hardware do fail. Given how long this whole process is likely to take, it would be silly to risk getting to 90% and then having to start over because you ignored that possibility.
- Given the size of your dataset, you'd have to carefully manage the size of your queue to avoid running out of memory.
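A minimal sketch of what "stalling the find" with a size-capped queue could look like, in Python purely for illustration. The names `produce`, `consume`, `run` and `max_pending` are made up for this example. It only addresses the memory point: the queue lives in RAM, so it does nothing about the first point (a crash loses whatever was pending).

```python
import os
import queue
import threading

def produce(root, q):
    """Walk `root` and put every file path on the bounded queue."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # put() blocks while the queue is full, which stalls the walk
            # until the consumer catches up.
            q.put(os.path.join(dirpath, name))
    q.put(None)  # sentinel: no more work

def consume(q, handle):
    """Pull paths off the queue until the sentinel arrives."""
    while True:
        path = q.get()
        if path is None:
            break
        handle(path)

def run(root, handle, max_pending=1000):
    """One producer (this thread) feeding one consumer thread."""
    q = queue.Queue(maxsize=max_pending)  # never holds more than max_pending paths
    worker = threading.Thread(target=consume, args=(q, handle))
    worker.start()
    produce(root, q)
    worker.join()
```

With more than one consumer you would need one sentinel per consumer; that is left out to keep the sketch short.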
Well, any full lists, be they database or flat file.
The problem with flat files is that they make lousy queues. (Great FILOs, but lousy FIFOs.)
Removing records/lines from the beginning of a file is (for all intents and purposes) impossible; and marking records as done means reading from the top each time to find the next piece of work to do: an O(n^2) process.
Thus you would then need a second (pointer) file that tells you how far down the first file you've processed; and that file becomes a bottleneck of contention.
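To make the pointer-file idea concrete, here is a rough sketch (Python, for illustration only) for a single consumer. The function name `next_item` and both file names are hypothetical. It stores a byte offset in the pointer file so each call seeks straight to the next line instead of rescanning from the top.

```python
import os

def next_item(work_path, pointer_path):
    """Return the next unprocessed line of work_path, or None when exhausted."""
    offset = 0
    if os.path.exists(pointer_path):
        with open(pointer_path) as fh:
            offset = int(fh.read().strip() or 0)

    # Binary mode so seek()/tell() deal in plain byte offsets.
    with open(work_path, "rb") as fh:
        fh.seek(offset)
        line = fh.readline()
        new_offset = fh.tell()
    if not line:
        return None

    # Write the new offset to a temp file and swap it in, so a crash
    # never leaves a half-written pointer behind.
    tmp = pointer_path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(str(new_offset))
    os.replace(tmp, pointer_path)
    return line.decode().rstrip("\n")
```

Note that the pointer advances when the item is handed out, not when it is finished, and that several consumers would have to lock the pointer file around every call. That lock is exactly the contention described above.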
As for file systems... I've often used (and advocated the use of) file systems for queues: the producer creates small (often zero-length) files in a todo directory; consumers rename the first file they find in that directory into a /consumerN.processing/ directory whilst they process it; and then rename it into a done directory (or just delete it) once they have finished. But again, given the size of your dataset, you'd have to very carefully manage the number of files you put into a single directory. And if you try to structure it, you're just moving the goal posts.
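The rename-based scheme can be sketched roughly as follows (Python, for illustration; `claim` and `finish` are made-up names, and the directories are assumed to exist already and to live on the same filesystem, so that the rename is a single atomic step on POSIX systems).

```python
import os

def claim(todo_dir, processing_dir):
    """Try to claim one work file by renaming it out of todo_dir.

    Returns the file's new path, or None if nothing could be claimed.
    """
    with os.scandir(todo_dir) as entries:
        for entry in entries:
            dst = os.path.join(processing_dir, entry.name)
            try:
                os.rename(entry.path, dst)
            except FileNotFoundError:
                continue  # another consumer renamed it first; try the next one
            return dst
    return None

def finish(path, done_dir):
    """Move a processed file from the processing directory into done_dir."""
    os.rename(path, os.path.join(done_dir, os.path.basename(path)))
```

Whichever consumer's rename succeeds owns the file; the losers get an error and move on, so no separate lock is needed. The cost of scanning the todo directory still grows with the number of files in it, which is the limitation mentioned above.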
And what happens if your find/findfile process dies? Working out how far it got so you can avoid starting over is a problem.
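One possible way to make the walk restartable, sketched in Python under some loud assumptions: the tree does not change between runs, the same root path is passed each time, and the checkpoint file lives outside the tree. `walk_resumable` and the checkpoint format are invented for this example.

```python
import os

def walk_resumable(root, checkpoint_path):
    """Yield file paths under root, skipping directories completed in an earlier run."""
    last_done = None
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            last_done = fh.read().strip() or None

    skipping = last_done is not None
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # force the same traversal order on every run
        if skipping:
            # Directories up to and including the checkpoint are already done.
            if dirpath == last_done:
                skipping = False
            continue
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
        # Reached only after every file in this directory has been handed out.
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as fh:
            fh.write(dirpath)
        os.replace(tmp, checkpoint_path)
```

A restart still has to re-traverse the directories it skips, and a directory that was only partly handed out gets redone in full, so consumers must tolerate seeing a file twice. If the checkpointed directory has vanished in the meantime, nothing is yielded at all, which shows how fragile this is.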
Replies are listed 'Best First'.
Re^6: Splitting up a filesystem into 'bite sized' chunks
by Preceptor (Deacon) on Jul 10, 2013 at 21:16 UTC
by BrowserUk (Patriarch) on Jul 10, 2013 at 21:46 UTC
by Preceptor (Deacon) on Jul 10, 2013 at 22:59 UTC