Re^2: Trees in SQL, adjacency lists and sorting.

Replies are listed 'Best First'.

Re^3: Trees in SQL, adjacency lists and sorting.
by dragonchild (Archbishop) on Jan 06, 2006 at 14:18 UTC

poor

DBM::Deep

How does this relate to trees? You have a perfectly fine way to store a tree in a database . . . depending on what you're trying to do with it. You're using a variant of the self-referential method which stores every single node-to-node relationship, not just the immediate ones. It looks like it would be extremely fast at finding all descendents and all ancestors of a given node, doing each one in one query. However, it will be no better than the standard self-referential method in finding all siblings of a given node.

It sounds like you really want to use the nested-sets method. It's where you don't store a reference to your parent node, but instead store a left and right bound which indicates where in the nested sets you stand. To find all your ancestors, you find all nodes where "my_id BETWEEN left AND right". To find all your descendents, you find all nodes where "id BETWEEN my_left AND my_right". It's no better than the standard self-referential in finding all siblings.

Both methods (yours and self-referential), it sounds, have near-equal benefits when it comes to speeding up searches. Yours is slightly better when doing inserts and updates at the cost of significantly more disk, but worse when doing deletes. That may be an acceptable tradeoff depending on your intended usage. For example, if your spec says you will never delete a node except in rare situations, this is probably a good direction to look at.

As for the every single depth bit, if I delete a node I can update all its children in a single query.

I don't think so.

Read more... (1150 Bytes)

Personally, I'm not sold on the depth thing. I think that storing every node-to-node relationship is definitely a solid strategy. I don't know what having the depth gets you. Personally, I'd use this as an adjunct table to the standard self-referential. Yes, you'd have duplicated information between the main table and the relationships table (the direct parent-child relationship would be duplicated), but if that's the optimization I need, then that's fine because the performance cost of the loss of normalization is made up by the speed gains I get with the caching.

My criteria for good software:

Does it work?
Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

[reply]
[d/l]
[select]


Keep It Simple, Stupid
	PerlMonks