This is an interesting thought, Anonymous.
I'm wary of reinventing the wheel here, because I have very little wheel-building experience. Berkeley DB ought to be able to do what I want, assuming I use it correctly, and after tilly's pointers and tips I suspect I'm using it a bit incorrectly (or just expecting magic where there is none). At this point my plan is to rethink the way I interact with the database, both to reduce the size of the objects stored and to avoid an "exists? read -> parse -> modify -> join -> write" cycle for every parent entry. I believe most of my problem is with the parent handling: each parent can potentially grow rather large, and it gets modified a lot.
Using tilly's ideas (and a couple that were pointed out to me in a chatterbox session, also with tilly) I'm going to use a database structure something like this:
$parentmd5 == $url $exists $number_of_children
$parentmd5:childnumber == $md5-pointer
$childmd5 == $url $exists $number_of_children
With this structure I will still have to update the parent entry to increment number_of_children, but the entry will keep a roughly fixed size (I suspect constant resizing is a hindrance), and an increment is cheaper than checking whether the child is already in a list and then appending it. I still have to address that check, though, and that is where all of my troubles are coming in: I don't know how to tell whether a child is already among the 'number_of_children' entries without polling through every child!
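To make the layout concrete, here is a minimal sketch of it. It is only an illustration under some assumptions: a plain hash %db stands in for a hash tied to Berkeley DB, records are space-separated (so URLs must not contain whitespace), the $exists flag is just set to 1 as a placeholder, and the sub names add_url, has_child and add_child are made up for the example. Note that has_child does exactly the polling I'm complaining about.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Stand-in for a hash tied to Berkeley DB; a plain in-memory hash here.
our %db;

# Store "url exists number_of_children" under the URL's MD5, if not yet present.
# The 'exists' field is a placeholder value (1) in this sketch.
sub add_url {
    my ($url) = @_;
    my $md5 = md5_hex($url);
    $db{$md5} = join(' ', $url, 1, 0) unless exists $db{$md5};
    return $md5;
}

# Linear scan over "$parentmd5:1" .. "$parentmd5:N" looking for the child.
sub has_child {
    my ($parent_md5, $child_md5) = @_;
    my (undef, undef, $n) = split ' ', $db{$parent_md5};
    for my $i (1 .. $n) {
        return 1 if $db{"$parent_md5:$i"} eq $child_md5;
    }
    return 0;
}

# Add a parent->child link; returns 1 if added, 0 if it was already there.
sub add_child {
    my ($parent_url, $child_url) = @_;
    my $p = add_url($parent_url);
    my $c = add_url($child_url);
    return 0 if has_child($p, $c);

    my ($url, $exists, $n) = split ' ', $db{$p};
    $n++;
    $db{"$p:$n"} = $c;                        # new pointer entry
    $db{$p} = join(' ', $url, $exists, $n);   # parent record with bumped counter
    return 1;
}
```

Adding a child touches only the small parent record and one new pointer entry, but the duplicate check still costs one lookup per existing child.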
So, it all comes down to this: My biggest quandary is how to handle the parent->child relationship efficiently....