|laziness, impatience, and hubris|
In a recent post Berkeley DB performance, profiling, and degradation..., I posed a query about database performance with the Berkeley DB and DB_File (and by the end of the thread, BerkeleyDB and MLDBM::Sync). I think performance of the database portion of my code is as fast it is going to get. But I'm still shy of the mark that my code absolutely must hit to work. So, out of a stubborn belief that this sort of code shouldn't have to be written in C/C++ to work, I'm going to keep on beating on it, I hope with a little help.
The most expensive, and third most repeated call in my program is called add_entry. In profile output, 105,000 entries takes roughly 571 seconds of exclusive time and 1450 cumulative seconds--obviously it isn't the only time user, but it is the most guilty party now that I've tuned the db access as much as seems possible. I've already optimized for the most used path, and eliminated all of the extra database fetches and unnecessary variable creations that I could find. So now I can't spot anything left to eliminate or tune, but I know there are a thousand or more monks here who know more than I about these things.
So, without further ado, I give you the code:
The common case is the first one, where an object is updated with new children (this happens roughly twice as often as a new entry in my test data set). The we do a short circuit check to see if we were called without children, meaning that the object doesn't exist already--so create a new object and add an entry to the parent to point to it.
After the children check, we proceed with figuring out whether we can add a parent (some objects can't have a parent because they are already the root object, for example). If a parent can happen we call add_entry again with the parent data. I was using a join on the first parent add_entry call to begin with (joining the $md5 with the old $parentchildren with a colon), but I kind of suspect a fixed string template version is faster.
So, am I missing something in this routine that needs tweaking? Can I reduce, refactor, rewrite any of this routine to make it more efficient? Memory efficiency is literally of zero value here--we have tons of memory, but very little time in which to handle a lot of data (15 entries per second is the absolute minimum, while sharing the CPU/disk with an extremely power hungry daemon). Right now, by the time the index reaches a half million entries, every new entry costs more than an eighth of a second (sometimes a lot more). So I've got to shave about a quarter second off of each entry, or rewrite this thing in C/C++. Am I doomed to writing and maintaining a low level language app?
Thanks for any pointers anyone can offer up.