by Anonymous Monk
Create tied hashes for caching that information so that I don't have to hit the database every time I need the id of some frequently used term.
Did you benchmark this? Repeatedly asking the database for the same thing might not be so bad if your database is good at caching. But tied hashes in Perl are slow. There are many factors involved, and what's best will vary from setup to setup, but if it's performance you care about, don't dismiss the alternatives in favor of tied hashes too easily.
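
To get a feel for the tie overhead alone, something like the following rough sketch could be a starting point. It only compares a plain hash with a hash tied to Tie::StdHash, which is about the cheapest tie there is, so it is a stand-in and not your BerkeleyDB setup. The data and the lookup_all helper are made up for illustration; a real test would add your actual database lookup as a third entry.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Tie::Hash;    # core module; also defines Tie::StdHash

# Hypothetical term => id data, standing in for the real lookup table.
my %plain = map { ("term$_" => $_) } 1 .. 1000;

# Tie::StdHash adds little beyond the method dispatch that every tied
# access pays. A tie backed by a file on disk would likely cost more.
tie my %tied, 'Tie::StdHash';
%tied = %plain;

my @keys = keys %plain;

# Sum the ids by reading every key through whichever hash is passed in.
sub lookup_all {
    my ($hash) = @_;
    my $sum = 0;
    $sum += $hash->{$_} for @keys;
    return $sum;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    plain => sub { lookup_all(\%plain) },
    tied  => sub { lookup_all(\%tied) },
});
```

The numbers will differ from machine to machine, so treat the output as a relative comparison only.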

Of course, this has nothing to do with your memory problem.

Re^4: Tracking down memory leaks
    While it isn't related to the memory problem, we did think about this. Each time through the loop, we have to hit the database about 10 times to obtain ids for a relatively small number of possible items. The query itself is very fast, so it isn't the wait for the result that we are trying to eliminate; it is the fixed overhead of making each database call, which takes a while by comparison. Against that per-call overhead, a lookup in a small BerkeleyDB database should come out ahead.

    What I will probably do after I get these memory issues out of the way is offer a command-line option to use either in-memory hashes or tied hashes, depending on the size of the file.
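
A minimal sketch of how that switch might look, with several assumptions: the option name --tied and the make_cache helper are hypothetical, and SDBM_File (which ships with Perl) stands in for BerkeleyDB so that the example runs without extra modules. The file is created in a temporary directory purely for the demonstration.

```perl
use strict;
use warnings;
use Fcntl;                     # O_RDWR, O_CREAT
use File::Temp qw(tempdir);
use Getopt::Long;
use SDBM_File;                 # core DBM, standing in for BerkeleyDB here

# Return a hash reference for the term => id cache. With $use_tied true
# the hash is backed by a file on disk; otherwise it lives in memory.
sub make_cache {
    my ($use_tied, $path) = @_;
    my %cache;
    if ($use_tied) {
        tie(%cache, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666)
            or die "Cannot tie $path: $!";
    }
    return \%cache;
}

# --tied is a hypothetical option name; the default is the in-memory hash.
GetOptions('tied' => \my $use_tied) or die "Usage: $0 [--tied]\n";

my $dir   = tempdir(CLEANUP => 1);
my $cache = make_cache($use_tied, "$dir/term_ids");

# The rest of the program uses $cache the same way in either mode.
$cache->{apple} //= 42;    # 42 stands in for an id fetched from the database
print "apple => $cache->{apple}\n";
```

Because both modes hand back a plain hash reference, the calling code does not need to know which one it got, and the choice could later be made automatically from the input file size.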

