Remember, the work being done here is I/O-bound, not CPU-bound: the processor spends nearly all of its time starting the next I/O operation, then everything goes back to sleep until that I/O completes. CPU utilization therefore isn't a useful bellwether of how much work is being done, or of how much could be done. If anything gets saturated, it will most likely be I/O capacity. (Unless you are blowing out memory with some too-big-for-its-britches hash and dropping into thrashing hell; dunno.)
Perhaps you could consider writing this program so that it is simply given a directory name as an input parameter, munches through that directory and its subdirectories doing its thing, then writes the completed output to (say...) a shared SQLite database file. Now the job can be sized appropriately to each machine and to the changing workload simply by launching multiple copies of the program simultaneously from the command line with different parameters. This achieves the same goal ... exploiting parallelism ... but with considerably less internal complexity, and it hands more influence back to the user. A command-line option to "consider only newer files," etc., might be useful as well; see the sketch below.
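To make the idea concrete, here is a minimal Python sketch of that design. It is only an illustration under stated assumptions: the per-file work is a stand-in (hashing file contents), and the database filename, table layout, and the --newer-than flag are all hypothetical names invented for this example, not anything from the original program.

```python
#!/usr/bin/env python3
"""One worker: walk a directory tree, process each file, record results
in a shared SQLite database. Launch several copies in parallel, each
with a different directory, to exploit parallelism from the shell."""
import argparse
import hashlib
import os
import sqlite3

DB_PATH = "results.db"  # shared output database (hypothetical name)

def open_db(path):
    con = sqlite3.connect(path, timeout=30)  # wait out other writers' locks
    con.execute("PRAGMA journal_mode=WAL")   # friendlier to concurrent writers
    con.execute("""CREATE TABLE IF NOT EXISTS results (
                       path   TEXT PRIMARY KEY,
                       sha256 TEXT,
                       mtime  REAL)""")
    return con

def process(path):
    # Stand-in for "doing its thing": hash the file's contents.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("directory")
    ap.add_argument("--newer-than", type=float, default=0.0,
                    help="skip files with a Unix mtime older than this")
    args = ap.parse_args()

    con = open_db(DB_PATH)
    for root, _dirs, files in os.walk(args.directory):
        for name in files:
            full = os.path.join(root, name)
            mtime = os.path.getmtime(full)
            if mtime < args.newer_than:
                continue  # the "consider only newer files" option
            con.execute("INSERT OR REPLACE INTO results VALUES (?,?,?)",
                        (full, process(full), mtime))
            con.commit()  # commit per file so other copies see progress
    con.close()

if __name__ == "__main__":
    main()
```

Usage would be something like launching one copy per subtree from the shell (e.g., `python worker.py /data/part1 & python worker.py /data/part2 &`), letting the user decide how many copies each machine can stand. SQLite's own file locking (helped here by WAL mode and a connect timeout) is what lets the independent processes share one output file without any coordination logic inside the program.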