One of the suggestions I've picked up is the notion of belief propagation. I'm not entirely sure how well it'll apply, but it's something I'll look into.
I'm looking at a node-distributed solution for bulk filesystem traversal and processing. NFS makes this easier on one hand, but harder on the other. I've got a lot of spindles and controllers behind the problem though, especially if I'm able to make it quiesce during peak times. (Which is no small part of why I'm trying to do 'clever stuff' with it - virus scanning at 100k files per hour or so is going to take me nearly a year if I'm doing 2 billion of the blasted things. But that might be acceptable if I can then treat it as a baseline, and do incremental sweeps thereafter.)
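As a rough sanity check on that estimate, here's a back-of-envelope sketch in Python. The function name and the `nodes` parameter are made up for illustration, and it assumes throughput scales linearly with node count, which shared NFS back ends may well not allow. Taking the round figures at face value (2 billion files, a flat 100k files per hour), a single scanner comes out nearer two years than one, so "nearly a year" would need either a somewhat higher sustained rate or a few nodes working in parallel.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years; fine for an estimate


def sweep_duration_hours(total_files: int, files_per_hour: float, nodes: int = 1) -> float:
    """Hours for one full sweep, assuming perfectly linear scaling across nodes.

    Hypothetical helper: real throughput will be limited by NFS, spindles
    and controllers, and by any quiescing during peak times.
    """
    return total_files / (files_per_hour * nodes)


total_files = 2_000_000_000   # "2 billion of the blasted things"
files_per_hour = 100_000      # "100k files per hour or so"

hours = sweep_duration_hours(total_files, files_per_hour)
years = hours / HOURS_PER_YEAR

# Sustained rate (files/hour, all nodes combined) needed to finish in one year.
rate_for_one_year = total_files / HOURS_PER_YEAR

print(f"single scanner: {hours:,.0f} hours, about {years:.1f} years")
print(f"rate needed for a one-year sweep: {rate_for_one_year:,.0f} files/hour")

# Same sweep spread over a few nodes, under the linear-scaling assumption.
for nodes in (2, 3, 4):
    node_years = sweep_duration_hours(total_files, files_per_hour, nodes) / HOURS_PER_YEAR
    print(f"{nodes} nodes: about {node_years:.2f} years")
```

None of this accounts for time spent quiesced, so the real wall-clock figure would be longer again; it's only meant to show how sensitive the baseline sweep is to the per-hour rate.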
I think however I slice the problem, it's still going to be big and chewy.