I was thinking I could stall the find process, in order to simply buffer, rather than maintain
- Processes and hardware do fail. Given the length of time this whole process is likely to take, it would be silly to risk getting to 90% and then have to start over because you ignored this possibility.
- Given the size of your dataset, you'd have to carefully manage the size of your queue to avoid running out of memory.
. Well, any full lists, be they database or flat file.
The problem with flat files is that they make lousy queues. (Great filos but lousy fifos.)
Removing records/lines at the beginning of a file is (for all intents and purposes) impossible; and marking records as done means reading from the top each time to find the next piece of work to do. That is an O(n^2) process, since each of the n lookups scans up to n records.
Thus you would then need a second (pointer) file that tells you how far down the first file you've processed; and that file becomes a bottleneck of contention.
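To make the pointer-file idea concrete, here is a minimal sketch. The helper name next_record and both file names are made up for illustration, it assumes one record per line, and it ignores the case where a producer is part way through appending a line. Note that every consumer has to take an exclusive lock on the pointer file for each record, which is exactly where the contention comes from.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);

# Hypothetical helper: return the next unprocessed line of $queue_file,
# using $pointer_file to remember the byte offset reached so far.
# Returns undef when there is nothing left to read.
sub next_record {
    my ($queue_file, $pointer_file) = @_;

    # Open read/write, creating the pointer file if needed (no truncation).
    sysopen my $ptr, $pointer_file, O_RDWR | O_CREAT
        or die "Cannot open $pointer_file: $!";

    # Every consumer serialises here: this lock is the bottleneck.
    flock $ptr, LOCK_EX or die "Cannot lock $pointer_file: $!";

    my $offset = <$ptr> // 0;
    chomp $offset;
    $offset = 0 unless $offset =~ /^\d+\z/;

    open my $queue, '<', $queue_file or die "Cannot open $queue_file: $!";
    seek $queue, $offset, SEEK_SET or die "Cannot seek in $queue_file: $!";
    my $line = <$queue>;

    if (defined $line) {
        # Record how far we got, replacing the old offset.
        my $new_offset = tell $queue;
        seek $ptr, 0, SEEK_SET or die "Cannot rewind $pointer_file: $!";
        truncate $ptr, 0       or die "Cannot truncate $pointer_file: $!";
        print {$ptr} $new_offset;
        chomp $line;
    }

    close $queue;
    close $ptr or die "Cannot close $pointer_file: $!";   # flushes and unlocks
    return $line;
}
```

Each call costs a lock, two opens, a seek and a rewrite of the pointer file, so the O(n^2) scan is gone but all consumers now queue up behind one small file.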
As for file systems... I've often used (and advocated the use of) file systems for queues: the producer creates small (often zero-length) files in a todo directory; consumers rename the first file they find in that directory into a /consumerN.processing/ directory whilst they process it, and then rename it into a done directory (or just delete it) once they have finished. But again, given the size of your dataset, you'd have to very carefully manage the number of files you put into a single directory. And if you try to structure it, you're just moving the goal posts.
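The directory-as-queue scheme described above can be sketched roughly as follows. The helper names claim_job and finish_job are invented for illustration, and the sketch assumes that the todo, processing and done directories all live on the same file system, so that rename is atomic and only one consumer can win a given file.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: claim one job by renaming it from $todo into this
# consumer's own $processing directory. Returns the new path, or undef
# if there was nothing left to claim.
sub claim_job {
    my ($todo, $processing) = @_;
    opendir my $dh, $todo or die "Cannot open $todo: $!";
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' or $entry eq '..';
        my $from = File::Spec->catfile($todo, $entry);
        my $to   = File::Spec->catfile($processing, $entry);
        # If another consumer renamed the file first, our rename fails
        # and we simply try the next entry.
        return $to if rename $from, $to;
    }
    return;
}

# Hypothetical helper: mark a claimed job as finished by moving it
# into the $done directory.
sub finish_job {
    my ($claimed, $done) = @_;
    my $target = File::Spec->catfile($done, basename($claimed));
    rename $claimed, $target or die "Cannot move $claimed to $target: $!";
    return $target;
}
```

A consumer would loop on claim_job, do its work, then call finish_job. Anything left behind in a consumerN.processing directory after a crash is, by construction, the work that consumer had not completed.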
And what happens if your find/findfile process dies? Working out how far it got so you can avoid starting over is a problem.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
My line of thinking there is to make a note of which subdirectory I had got to, in a checkpoint every so often, and combine it with File::Find::prune to "skip forwards". I suppose I'm not really sure why I'm resisting databases, though.
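For what it's worth, here is a rough sketch of that checkpoint idea. It is a variant rather than exactly what is described above: instead of a single "position", it appends each fully processed directory to a checkpoint file and, on restart, sets $File::Find::prune for any directory already listed. The helper name resumable_find and the checkpoint file format are assumptions for illustration. Files in a directory that was only partly processed when the run died will be seen again, so the work callback needs to tolerate repeats.

```perl
use strict;
use warnings;
use File::Find;
use IO::Handle;

# Hypothetical helper: walk $top, calling $work->($path) for each
# non-directory entry, and log every completed directory to $ckpt so
# that a later run can skip it.
sub resumable_find {
    my ($top, $ckpt, $work) = @_;

    # Load directories completed by an earlier (possibly crashed) run.
    my %done;
    if (open my $in, '<', $ckpt) {
        chomp(my @dirs = <$in>);
        @done{@dirs} = ();
        close $in;
    }

    open my $out, '>>', $ckpt or die "Cannot append to $ckpt: $!";
    $out->autoflush(1);    # make sure entries survive a crash

    find({
        no_chdir   => 1,
        preprocess => sub { sort @_ },    # repeatable traversal order
        wanted     => sub {
            if (-d $_) {
                # Skip forwards: do not descend into a finished directory.
                $File::Find::prune = 1 if exists $done{$File::Find::name};
                return;
            }
            $work->($File::Find::name);
        },
        # Invoked just before File::Find leaves a directory.
        postprocess => sub { print {$out} "$File::Find::dir\n" },
    }, $top);

    close $out or die "Cannot close $ckpt: $!";
    return;
}
```

The %done hash holds one entry per completed directory, so for a very large tree the checkpoint itself would need trimming (for example, dropping children once their parent is logged); that is left out here.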
and combining it with File::Find::prune to "skip forwards".
I have no experience (nor knowledge even) of that, so I cannot comment on it.
I'm not really sure why I'm resisting databases,
It wouldn't have to be (nor would it benefit from being) a full RDBMS, but it would need to be able to handle low levels of read contention and a concurrent writer.