Re^6: Splitting up a filesystem into 'bite sized' chunks

by Preceptor (Chaplain)
on Jul 10, 2013 at 21:16 UTC (#1043568)


in reply to Re^5: Splitting up a filesystem into 'bite sized' chunks
in thread Splitting up a filesystem into 'bite sized' chunks

My line of thinking there was to make a note of which subdirectory I had got to, in a checkpoint every so often, and to combine it with File::Find::prune to "skip forwards". I suppose I'm not really sure why I'm resisting databases, though.


Re^7: Splitting up a filesystem into 'bite sized' chunks
by BrowserUk (Pope) on Jul 10, 2013 at 21:46 UTC
    and combining it with File::Find::prune to "skip forwards".

    I have no experience (nor knowledge even) of that, so I cannot comment on it.

    I'm not really sure why I'm resisting databases,

    It wouldn't have to be (nor would it benefit from being) a full RDBMS, but it would need to be able to handle low levels of read contention plus a concurrent writer.
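
    Something as light as SQLite would fit that profile. A minimal sketch, assuming DBD::SQLite is available; the file name, table layout, and checkpoint values are hypothetical, not anything from this thread:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:SQLite:dbname=checkpoints.db', '', '',
            { RaiseError => 1, AutoCommit => 1 } );

        # WAL mode gives one writer alongside concurrent readers, which
        # matches "low read contention plus a concurrent writer".
        $dbh->do('PRAGMA journal_mode=WAL');

        $dbh->do(q{
            CREATE TABLE IF NOT EXISTS checkpoint (
                id   INTEGER PRIMARY KEY AUTOINCREMENT,
                path TEXT    NOT NULL,
                seen INTEGER NOT NULL
            )
        });

        # Writer: record a checkpoint every so often.
        my $ins = $dbh->prepare('INSERT INTO checkpoint (path, seen) VALUES (?, ?)');
        $ins->execute( '/mnt/myhome/stuff/junk', 100_000 );

        # Reader: fetch the most recent checkpoint to restart from.
        my ($last) = $dbh->selectrow_array(
            'SELECT path FROM checkpoint ORDER BY id DESC LIMIT 1');
        print "restart from: $last\n";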


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      File::Find lets you set $File::Find::prune, which enables or disables traversal into the current directory. If you know your last checkpoint is /mnt/myhome/stuff/junk, you can pattern-match the file path and turn off traversal until you get a match. (You may have to roll your checkpoint up a level if the target has been deleted in the interim.)
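
      A minimal sketch of that skip-forwards, assuming a saved checkpoint path and a readdir order that is stable between runs (the "drift" problem below):

          #!/usr/bin/perl
          use strict;
          use warnings;
          use File::Find;

          my $checkpoint = '/mnt/myhome/stuff/junk';    # hypothetical saved path
          my $resumed    = 0;

          find( sub {
              if ( !$resumed ) {
                  if ( $File::Find::name eq $checkpoint ) {
                      $resumed = 1;                     # back where we left off
                  }
                  elsif ( -d $_
                      && index( $checkpoint . '/', $File::Find::name . '/' ) != 0 ) {
                      # Not an ancestor of the checkpoint (the trailing '/'
                      # stops /a/stu matching /a/stuff), so don't descend.
                      $File::Find::prune = 1;
                  }
                  return;                               # already done last run
              }
              print "$File::Find::name\n";              # real per-file work here
          }, '/mnt/myhome' );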

      That'll, hopefully, give me a restartable find. Being able to "skip ahead" in future (and thus distribute processing) may require a first pass, and tracking multiple checkpoints.

      Thinking about it, I'd need some sort of start/finish marker, and some way of compensating for "drift". But one checkpoint every 100k files takes a huge list down to a merely large one. (It still doesn't help the first pass, though, unless you can take some wild guesses at initial checkpoints and apply the same drift compensation.)
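
      For the first pass, emitting a checkpoint every 100k files could be as simple as the sketch below; the output file name and interval are my guesses at plausible values:

          #!/usr/bin/perl
          use strict;
          use warnings;
          use File::Find;

          my $count    = 0;
          my $interval = 100_000;

          open my $ckpt, '>', 'checkpoints.txt' or die "open: $!";

          find( sub {
              return unless -f $_;
              if ( ++$count % $interval == 0 ) {
                  # These paths become the start/finish markers that
                  # later workers can be handed.
                  print {$ckpt} "$File::Find::name\n";
              }
          }, '/mnt/myhome' );

          close $ckpt;
          print "emitted ", int( $count / $interval ), " checkpoints for $count files\n";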
