in reply to
how to merge many files of sorted hashes?
Sometimes, when I have “hairy things referring to other hairy things,” such as a matrix referring to another matrix, I find it useful to introduce surrogate keys. This sort of goes back to the question, “if I had to store all this stuff in an old-fashioned library card catalog, which drawer would I put it in, and why?” First, I would assign every matrix that I had (no matter how I intended to use it) a random unique identifier, such as a UUID. Then I would compute some kind of arithmetic hash value to let me sift through all that data faster: something simple, like the sum of every number in the matrix after truncating each one to an integer. I’d tag the information with that value and then look only for that tag. (This is the original notion of “hashing.”)
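As a rough illustration of the tagging idea, here is a minimal Python sketch. The names make_tag and register are made up for this example, and it assumes a matrix is simply a list of rows of numbers; adapt it to whatever your data really looks like.

```python
import uuid


def make_tag(matrix):
    """Cheap arithmetic hash: sum of every number, each truncated to an integer.

    int() truncates toward zero, so 3.7 -> 3 and -4.5 -> -4.
    """
    return sum(int(value) for row in matrix for value in row)


def register(matrix):
    """Return (surrogate_key, tag) for a matrix given as a list of rows.

    The surrogate key is a random UUID; it says nothing about the contents.
    The tag depends only on the contents, so equal matrices share a tag.
    """
    return str(uuid.uuid4()), make_tag(matrix)


if __name__ == "__main__":
    key, tag = register([[1.9, 2.2], [3.7, -4.5]])
    print(key, tag)
```

Two different matrices can easily end up with the same tag, so a matching tag only narrows down the candidates; you still have to compare the actual data to confirm a match.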
You don’t necessarily have to put those hundreds of megabytes of data into a database. (SQLite, by the way, is an excellent tool for this.) You just need a way to use a database file to catalog the data: to tell you which file a given piece is in, and where. The goal is to reduce the amount of search time that you must spend to find a particular piece of information: not to reduce that time to zero, just to reduce it.
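Here is one way such a catalog might look, sketched with Python's standard sqlite3 module. The table layout, the column names, and the helper functions (open_catalog, add_entry, candidates) are all assumptions made for illustration; the bulk data stays in its own files and only the "which file, and where" information goes into SQLite.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the catalog. The bulk data itself is NOT stored here."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS catalog ("
        " id TEXT PRIMARY KEY,"            # surrogate key, e.g. a UUID string
        " tag INTEGER NOT NULL,"           # cheap hash value computed from the data
        " filename TEXT NOT NULL,"         # which file holds the data
        " byte_offset INTEGER NOT NULL)"   # where in that file it starts
    )
    # Index the tag so a lookup does not have to scan the whole table.
    db.execute("CREATE INDEX IF NOT EXISTS catalog_tag ON catalog (tag)")
    return db


def add_entry(db, key, tag, filename, byte_offset):
    """Record where one piece of data lives."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO catalog (id, tag, filename, byte_offset)"
            " VALUES (?, ?, ?, ?)",
            (key, tag, filename, byte_offset),
        )


def candidates(db, tag):
    """Return (id, filename, byte_offset) for every entry carrying this tag."""
    return db.execute(
        "SELECT id, filename, byte_offset FROM catalog"
        " WHERE tag = ? ORDER BY filename, byte_offset",
        (tag,),
    ).fetchall()
```

A lookup by tag returns a short list of places to go and read, rather than the answer itself, which is the point: the search time shrinks without all of the data having to move into the database.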