Re: Re (tilly) 1: Performance quandary
by SwellJoe (Scribe) on Feb 24, 2002 at 04:12 UTC
Thanks for your thoughts, Tilly.
BTree gave me about a 5% gain (I had already tried it a couple of times with both DB_File and BerkeleyDB). The current system is using BTree (I should update the previous database post to show the most recent numbers and specific configuration choices).
I think you might have tapped into something with the notion of a very simple write (except a lot more of them) rather than a pull->parse->add->write on the parent each time.
My reason for choosing the current data structure is that from a single parent I must be able to quickly poll through all of its children and subchildren. The key requirement for the parent->child relationship is that from any parent, all of its children can be found. The child doesn't need to store its parent, because that can be derived from what we know of the child (the URL; find_parent already does this in a ~two line function).
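To illustrate what "derive the parent from the URL" can look like, here is a minimal Python sketch. The rule it uses (the parent is the URL with its last path segment removed) is an assumption made for the example, not necessarily what the real find_parent does, and the example.com URLs are placeholders.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def find_parent(url: str) -> Optional[str]:
    """Return the parent URL, or None when the URL is already the site root.

    Assumed rule (hypothetical): drop the last path segment.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path:
        return None  # nothing above the site root
    # Keep everything up to the last "/" and end the parent with a slash.
    parent_path = path.rsplit("/", 1)[0] + "/"
    # Query string and fragment are dropped for the parent.
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))
```

Because the parent is computed rather than stored, walking up the tree costs no database reads at all; only the downward direction (parent to children) needs an index.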
That said, I think you're probably right about removing the requirement for pulling and pushing large objects, though the objects don't grow as much as the real-world behavior suggests they do. Anyway, I won't know until I try it, so I'm going to try to figure out a database structure that will permit this kind of relationship without requiring the parent to store everything about its immediate kids.
It seems I'm going to need two entries per object to account for the 'any child can be a parent to other objects' paradigm I'm dealing with. So $parent_mdb|$kid_no will store the object info, while $kid_mdb will store its child info, plus the parent key so the first entry can be removed when this one is. I think this is necessary since we need to be able to seek to any object. I suppose I could, in the seek code, use find_parent to seek up the tree until the parent is located and then poll back down to find the object. More efficient to have two entries, I presume?
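A rough sketch of the two-entry layout, in Python with an in-memory sorted store standing in for the BTree file. Everything named here (SortedStore, add_child, children, remove, the next_kid counter, the zero-padded kid number) is hypothetical and only illustrates the idea; it also assumes object ids never contain the "|" separator.

```python
import bisect

SEP = "|"


class SortedStore:
    """In-memory stand-in for a key/value file that keeps keys in sorted order."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._vals = {}   # key -> value

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals.get(key)

    def delete(self, key):
        if key in self._vals:
            del self._vals[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan_prefix(self, prefix):
        """Return (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._vals[self._keys[i]]))
            i += 1
        return out


def _node(store, key):
    # The object's own entry: a child counter plus a pointer back to the
    # "parent|kid_no" entry that describes it.
    return store.get(key) or {"next_kid": 0, "parent_key": None}


def add_child(store, parent, kid, info):
    """Write one small 'parent|kid_no' entry instead of rewriting the parent."""
    pnode = _node(store, parent)
    entry_key = f"{parent}{SEP}{pnode['next_kid']:08d}"
    pnode["next_kid"] += 1
    store.put(parent, pnode)                        # tiny write: counter only
    store.put(entry_key, {"id": kid, "info": info})
    knode = _node(store, kid)
    knode["parent_key"] = entry_key                 # lets remove() find entry 1
    store.put(kid, knode)
    return entry_key


def children(store, parent):
    """All immediate children, found by a range scan on the 'parent|' prefix."""
    return [value for _, value in store.scan_prefix(parent + SEP)]


def remove(store, key):
    """Remove an object, its subtree, and the entry its parent holds for it."""
    node = store.get(key)
    if node is None:
        return
    for kid in children(store, key):
        remove(store, kid["id"])
    if node["parent_key"]:
        store.delete(node["parent_key"])
    store.delete(key)
```

The parent still gets a write per new child, but it is a fixed-size counter rather than a structure that grows with every kid. Whether that actually beats seeking up with find_parent and polling back down is exactly the thing to benchmark.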
I guess I'll just go try it both ways and see which one makes me wait the longest. I'll give DBI and DBD::SQLite a perusal as well. Will be interesting to see what works best. Results to follow...