This isn't related to the memory problem, but we did think about it. The issue is that each pass through the loop hits the database about 10 times to look up ids for a relatively small set of possible items. The queries themselves are very fast; what we're trying to eliminate is the fixed overhead of going to the database at all, which is comparatively slow. Compared with a lookup in a small local BerkeleyDB database, that overhead should favor BerkeleyDB.
What I will probably do, once these memory issues are out of the way, is add a command-line option to use either in-memory hashes or tied hashes, depending on the size of the file.
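As a rough illustration of that option, here is a minimal sketch in Python rather than Perl, with the standard `dbm` module standing in for a tied BerkeleyDB hash. Everything in it is an assumption for illustration: the `--store` flag, the helper names `open_id_cache`, `lookup_id` and `fetch_from_database` are hypothetical, and the input is assumed to be one item name per line. The point is only that the lookup code stays the same whichever backing store is chosen, and that the database is queried once per distinct item instead of on every pass through the loop.

```python
import argparse
import dbm
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def open_id_cache(use_disk, db_path):
    """Yield a dict-like id cache: an in-memory dict, or an on-disk dbm file."""
    if not use_disk:
        yield {}
        return
    db = dbm.open(db_path, "n")  # "n": always start from a new, empty database
    try:
        yield db
    finally:
        db.close()


def lookup_id(cache, name, fetch_from_database):
    """Return the id for name, querying the database only on a cache miss."""
    key = name.encode()
    if key in cache:
        return int(cache[key])
    value = fetch_from_database(name)
    # Store bytes so the dict and dbm backends hold identical data.
    cache[key] = str(value).encode()
    return value


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("input", help="file with one item name per line (assumed format)")
    parser.add_argument("--store", choices=["memory", "disk"], default="memory",
                        help="keep the id cache in memory or in an on-disk dbm file")
    return parser.parse_args(argv)


def main(argv=None):
    """Process the input file and return how many database queries were made."""
    args = parse_args(argv)
    queries = 0

    def fetch_from_database(name):
        # Hypothetical stand-in for the real (fast) SQL query that returns an id.
        nonlocal queries
        queries += 1
        return queries

    with tempfile.TemporaryDirectory() as tmp:
        with open_id_cache(args.store == "disk", os.path.join(tmp, "ids")) as cache:
            with open(args.input) as fh:
                for line in fh:
                    lookup_id(cache, line.strip(), fetch_from_database)
    return queries


if __name__ == "__main__":
    print(main())
```

For a file listing `apple`, `pear`, `apple`, `plum`, either `--store memory` or `--store disk` should make only three queries. The in-memory dict is faster per lookup; the dbm file trades some speed for not holding the whole table in RAM, which is the same trade-off as plain versus tied hashes.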