Actually, the recursion is not an issue at all, as far as I can see. The depth to which any recursive call will go is extremely small, except the first time a parent tree is built. After that recursion only happens once per entry (which would simply have to be another function call to a routine doing almost the exact same thing as add_entry). I was curious about the recursion as well, suspecting a bug leading to full-depth recursion on every request, but profiler output confirms that depth of calls is not an issue here. I experimented some with taking out the escape clauses, and recursion depth shows up very clearly in the dprofpp -T output--with the routine as it exists here, the average depth does not grow measurably deeper as the program progresses. So we're definitely not 'in up to the axles' in recursion...it would show in an ever increasing depth when profiling.
in reply to Re: Performance quandary
in thread Performance quandary
In reply to, "As I understand it, you want to index a collection of URI's by an md5 digest"...We're indexing objects that already have an md5 digest (we get it from Squid, which indexes its object store by md5). This is a requirement of the database--it can't work without being indexed by md5s that match the Squid md5 index exactly, because otherwise we wouldn't be able to remove the item from our database when Squid RELEASES it. The digest I'm generating is for a parent object, which may or may not exist in the Squid object store--but we have to have it, because we need to be able to pull a complete object tree by requesting, for example "http://www.perlmonks.org". Our DB has to be able to find all of the child objects of www.perlmonks.org.
Now I don't actually understand your pseudo-code example (what is make_updateVectors?), but I suspect you're missing the critical purpose of the recursive call in this function: We need a complete tree in the database, all the way up to the root node of this URL. We must be able to make a request of this database for an object and all of its children. However, when building the database and when adding new objects when in daemon mode, we have no guarantee that all of the parent objects will exist in the database.
So, perhaps I'm wrong, but it looks like you've created a function to simply add a single entry (plus possibly a single parent?) to the db. Now it may be entirely true that indexing both the object and its relationship to other objects in the tree (or its child objects anyway) is not the most efficient use of the database, but I don't see that using recursion is fundamentally flawed here or that an iterative solution would make a major performance difference. The usual recursive depth is one level (for the parent entry update)...that isn't significant overhead, is it? I don't see how it could be, unless Perl has a tremendous recursion overhead, which I have never seen mentioned anywhere.
Am I missing something from this picture that is misleading me into thinking an iterative solution would alter the runtime very little, if at all? We still have to construct the parent tree, whether we do it recursively or iteratively--the number of function calls and db writes doesn't change, one way or another. Recursion seems the right way to do it to me, since we've already got a routine that does everything we need for every object up the tree...an iterative solution would require a repeat of most of the same code in add_entry within the block that iterates (I actually started writing it that way when I started, but
soon realized I needed all of the same actions for the parent entries that I need for the current entry).