|Perl: the Markov chain saw|
Can you provide a little more information about the contents of the Bench file, and what print_report() is doing?
From what I gather, the bench file is simply a list of absolute file paths on the filesystem (since you're using a find call to populate %today). What exactly are you trying to track?
Another question - have you confirmed your find command on your machine? On my box (redhat 9), that call to find (assuming $search_files is a scaler for a text match of some kind) would return every file on the filesystem. Are you sure you're getting the correct results?
Now that I think about it, I've got an idea on a general approach, assuming you've got access to the standard Unix utils - use sort, uniq, and diff, and parse the output of the diff. e.g.
By using the unix tools, you've now got the same output as you had after the call to _scan_system(). Note - diff will flag identical lines with different counts (that's what the -c option to uniq does) - you'd have to account for that when parsing the diff output.
This assumes, of course, that the real memory hog is %yesterday, before a pile of keys are deleted in building %today. If I'm wrong, and at the end of processing %yesterday and %today are both too big to handle by print_report(), you may well need to look at some kind of BerkeleyDB-type solution, but realize it's going to slow things down by a lot.
I hope this helps - sort/diff/uniq can be a great way to reduce the load on perl when processing large files.