Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re^2: Trees in SQL, adjacency lists and sorting.

by BUU (Prior)
on Jan 06, 2006 at 04:17 UTC ( [id://521407]=note: print w/replies, xml ) Need Help??


in reply to Re: Trees in SQL, adjacency lists and sorting.
in thread Trees in SQL, adjacency lists and sorting.

So what's the Right Way to store a tree? You seem to have fairly strong opinions on the matter and I'm always willing to learn. As for the every single depth bit, if I delete a node I can update all its children in a single query.
  • Comment on Re^2: Trees in SQL, adjacency lists and sorting.

Replies are listed 'Best First'.
Re^3: Trees in SQL, adjacency lists and sorting.
by dragonchild (Archbishop) on Jan 06, 2006 at 14:18 UTC
    Data structures exist to organize data in ways that are useful for what we're going to do with it. For example, if all you're doing is storing a HoHoHoHoH that you want to walk but never search, then a database is an extremely poor solution and something like DBM::Deep is preferable. If you have a bunch of items that are on the same level and you want to do random-access searches based on arbitrary criteria, then a relational database is much better than DBM::Deep.

    How does this relate to trees? You have a perfectly fine way to store a tree in a database . . . depending on what you're trying to do with it. You're using a variant of the self-referential method which stores every single node-to-node relationship, not just the immediate ones. It looks like it would be extremely fast at finding all descendents and all ancestors of a given node, doing each one in one query. However, it will be no better than the standard self-referential method in finding all siblings of a given node.

    It sounds like you really want to use the nested-sets method. It's where you don't store a reference to your parent node, but instead store a left and right bound which indicates where in the nested sets you stand. To find all your ancestors, you find all nodes where "my_id BETWEEN left AND right". To find all your descendents, you find all nodes where "id BETWEEN my_left AND my_right". It's no better than the standard self-referential in finding all siblings.

    Both methods (yours and self-referential), it sounds, have near-equal benefits when it comes to speeding up searches. Yours is slightly better when doing inserts and updates at the cost of significantly more disk, but worse when doing deletes. That may be an acceptable tradeoff depending on your intended usage. For example, if your spec says you will never delete a node except in rare situations, this is probably a good direction to look at.

    As for the every single depth bit, if I delete a node I can update all its children in a single query.

    I don't think so.

    Personally, I'm not sold on the depth thing. I think that storing every node-to-node relationship is definitely a solid strategy. I don't know what having the depth gets you. Personally, I'd use this as an adjunct table to the standard self-referential. Yes, you'd have duplicated information between the main table and the relationships table (the direct parent-child relationship would be duplicated), but if that's the optimization I need, then that's fine because the performance cost of the loss of normalization is made up by the speed gains I get with the caching.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://521407]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (4)
As of 2024-04-19 23:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found