Just a comment/question on what you mentioned here:
- Memoize, especially to memoize the output of the following one,
- Digest::MD5, to sort through heaps of archived data and figure out what was stored twice under different names
I'm just wondering what benefit Memoize provides in this context. From your brief description, it seems you are taking MD5s of the archived data so that you don't have to keep all that data in memory, just an MD5 digest of each item. That makes it easy to find duplicates without using much memory. But by memoizing it (assuming the memoized function is passed the data itself, since Memoize keys its cache on the function's arguments), you are still keeping all the archived data in memory, and you are keeping the MD5 digests in memory as well. You might as well store the data itself in a hash and do a straight comparison on it, saving the time required to compute the MD5.
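To make the concern concrete, here is a rough sketch of the situation as I picture it. It is only my guess at the setup, not your actual code: the function name digest_of_data and the sample %archive contents are made up for illustration, and I am assuming the memoized function receives the data itself rather than, say, a filename.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Memoize;

# Hypothetical helper: MD5 digest of a blob that is already in memory.
sub digest_of_data {
    my ($data) = @_;
    return md5_hex($data);
}

# By default Memoize caches results in an in-memory hash keyed on the
# argument list, so each blob passed in is kept as a cache key
# alongside its digest.
memoize('digest_of_data');

# Made-up sample data standing in for the archive (name => contents).
my %archive = (
    'a.txt' => 'same bytes',
    'b.txt' => 'same bytes',
    'c.txt' => 'other bytes',
);

# Approach 1: find duplicates via the memoized digest.
my (%name_for_digest, @dups_by_digest);
for my $name (sort keys %archive) {
    my $digest = digest_of_data($archive{$name});
    if (exists $name_for_digest{$digest}) {
        push @dups_by_digest, "$name duplicates $name_for_digest{$digest}";
    }
    else {
        $name_for_digest{$digest} = $name;
    }
}

# Approach 2: key a hash on the data itself, with no MD5 at all.
my (%name_for_data, @dups_by_data);
for my $name (sort keys %archive) {
    my $data = $archive{$name};
    if (exists $name_for_data{$data}) {
        push @dups_by_data, "$name duplicates $name_for_data{$data}";
    }
    else {
        $name_for_data{$data} = $name;
    }
}

print "$_\n" for @dups_by_digest;   # prints "b.txt duplicates a.txt"
```

Both loops report the same duplicate here, and in both cases the full data ends up held in memory as hash keys, which is why I don't see what the memoized MD5 buys over the second loop.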
I'm curious to know if I am blatantly missing something here, or misunderstanding the usefulness of Memoize in this context.
By the way, I think Memoize is a great module, but I don't think there are many situations where it is actually beneficial.