Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Tree Structure and Db

by InfiniteLoop (Hermit)
on Jul 05, 2005 at 21:13 UTC ( #472604=perlquestion: print w/ replies, xml ) Need Help??
InfiniteLoop has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks,

Recently I have been asked to design a perl module (backed by a database) to provide access to a data, which is essentially of a tree structure type. For example, the data looks like:

Parent Node1/ Child Node 1/ Sub Child Node 1/ Leaf1
My database table has the following columns & data:
id | node_name | parent_id 1 | Parent Node1 | 0 2 | Child Node1 | 1 3 | Sub Child Node1 | 2 4 | Leaf1 | 3
where:

id is autogenerated primary key
parent_id is the "id" of the parent row
node_name is the name of the node.

Apart from the regular new(), node_name(), save() and delete() functions, I plan to provide a parent(), which would query the database and return the parent of the given node (as an object.)

The class will be used to list leaf/node(s), for a given node and also display some sort of heirarchial path (or bread crumb) to the given leaf (by appending the parent's node_name() till parent() returns null)

I seek your advice for:
1. is there an optimal way to design a database table to represent an tree structure ?
2. can I provide a better design to access the heirarchial path without incurring a performance penalty ?

Comment on Tree Structure and Db
Select or Download Code
Re: Tree Structure and Db
by ikegami (Pope) on Jul 05, 2005 at 21:31 UTC

    If you're primarily going to read the tree, it might be advantageous to store pointers to every ancestor instead of just the direct parent. You can query for all the ancestors in one query, and then build the path in Perl. For example,

    Nodes ===== id | node_name | parent ----+------------------+-------- 1 | Parent Node1 | 0 2 | Child Node 1 | 1 3 | Sub Child Node 1 | 2 4 | Leaf1 | 3 HasAncestor =========== child | ancestor -------+---------- 2 | 1 3 | 2 3 | 1 4 | 3 4 | 2 4 | 1 SELECT Nodes.* FROM Nodes LEFT JOIN HasAncestor ON Nodes.id = HasAncestor.child WHERE Nodes.id = ? Returns for args (3) ==================== id | node_name | parent ----+------------------+-------- 1 | Parent Node1 | 0 2 | Child Node 1 | 1 3 | Sub Child Node 1 | 2

    This way, you can easily fetch an entire subtree. For example,

    SELECT Nodes.* FROM Nodes LEFT JOIN HasAncestor ON Nodes.id = HasAncestor.child WHERE HasAncestor.ancestor = ? Returns for args (2) ==================== id | node_name | parent ----+------------------+-------- 3 | Sub Child Node 1 | 2 4 | Leaf1 | 3

    You can still fetch only the immediate children. For example,

    SELECT Nodes.* FROM Nodes WHERE Nodes.parent = ? Returns for args (2) ==================== id | node_name | parent ----+------------------+-------- 3 | Sub Child Node 1 | 2

    Similar for the immediate parent.

      Thnx ikegami. I did think of a lookup table, but was the opposite of your design. This is a great help, thnx again.

      As the author stated, this structure is good for trees that are mostly static.

      In case the tree can be updated, performance becomes an issue. Even a simple operation as to insert a leaf node, could cause you to insert multiple rows (average number of rows inserted equals average depth of all pathes, and in the worst case, the number of rows get inserted is the length of the deepest path.)

      Look at one more operation: to move a subtree to be under a new parent. if you only store the immediate parent-child relationship, you only need to modify 1 row, but if you store all ancestor, you will need to modify multiple rows for each node in the subtree (basically to modify all relationships towards a node that is above the root of the subtree.)

      Data integraty could also be an issue, a simple coding mistake can spread dirty data all over the place, not just an isolated spot.

        Thanks for expanding on this. I intended to do so, but didn't get get a chance 'til now. That is exactly what I meant by my comment.
Re: Tree Structure and Db
by dbwiz (Curate) on Jul 05, 2005 at 21:32 UTC
Re: Tree Structure and Db
by sharkey (Scribe) on Jul 06, 2005 at 02:08 UTC
    This probably won't help you, but...

    If you happen to be using Oracle or DB2, those databases have SQL extensions to do tree structured queries, based on a table like you describe.

    The syntaxes are radically different, so your best bet is to look it up in your database documentation. For Oracle the keyword is "CONNECT BY", and for DB2 the keyword is "WITH".

    Here's a nice overview: http://www.oreilly.com/catalog/sqlpr/chapter/ch01.pdf

      CONNECT BY is not working very well in oracle 9.2 - in oracle 8.X its OK. This is a bug in oracle 9.2 (also admited by oracle ;)) -
      What kind of DB are you using?
      Best regards
      Hartwig
        I use MySQL 4.x
Re: Tree Structure and Db
by devnul (Monk) on Jul 06, 2005 at 04:19 UTC
    ... and if you are using Postgres there is ltree which works quite well

    - dEvNuL
Re: Tree Structure and Db
by vyach (Novice) on Jul 06, 2005 at 07:21 UTC
    If your mind is open and you don't fear to change the usual way, you can try something different then relational database. I suppose that XML is more appropriate for hierarcic tree-like structures. You can use XPath for querying them - it was designed exactly for that. Look at the Berkeley DB XML. But, frankly, I had no time to test it with Perl.
Re: Tree Structure and Db
by DrHyde (Prior) on Jul 06, 2005 at 09:39 UTC
    Take a look at DBM::Deep.

    The source code for that module is a good read too.

Re: Tree Structure and Db
by SimonClinch (Chaplain) on Jul 06, 2005 at 13:35 UTC
    1) The standard solution is to use a link table from the main table to itself. The link table contains two foreign keys from the same main table, i.e. the main table has TWO one-to-many relationships to the link table, one for parent relation and the other for child relation.

    2) But to get performance out of this, it is best to write some access stored procedures (and in some cases perhaps views) which include a GetChild and a GetParent along with any procedures or triggers for insert, update and delete that may be required to simplify and unify access, but which otherwise hide (or rather make it unnecessary to expose) the link-table implementation to the database user ("user" includes any perl code that transacts with it).

    One world, one people

      In a tree-wise data structure, with only one parent per child, what is the perceived advantage of using this linking table rather than just adding the parent ID to the main table?
        A link table is the normal way to enforce referential integrity in many to many relationships (0 or 1 counts as many for these purposes) and in this case prevents orphans; it also enables you to define different types of relationship without putting more illegal or awkwardly-implemented constraints (and adding maybe-null foreign keys for them) on the master table.

        One world, one people

Re: Tree Structure and Db
by bart (Canon) on Jul 06, 2005 at 16:46 UTC
    1. is there an optimal way to design a database table to represent an tree structure ?
    I think there's a good candidate. I have to thank demerphq for drawing my attention to it, a few months ago. Look up Joe Celko's article trees in SQL. There's several copies of this (and a follow-up) article floating around on the internet — some even with user comments; he also wrote an entire book about it.
Re: Tree Structure and Db
by Argel (Prior) on Jul 06, 2005 at 23:00 UTC
    . . . is there an optimal way to design a database table to represent an tree structure ?

    You are storing a tree -- which is a hierarchical data structure -- so why not use a hierarchical database such as OpenLDAP or SUN's Directory Server 5.2? Both seem like a good fit for what you are trying to do (though you would have to learn all the jargon, etc.).

    I've played with SUN's Directory Server 5.2 a bit but you would probably want to go with OpenLDAP since it is free. There is Net::LDAP on CPAN but if performance is a serious concern you should look into something tailored specificaly to the LDAP server you are using. SUN has perldap (which you can get as part of the directory server resource kit). I have not used OpenLDAP so I am not sure what is availalbe for it.

    It might be easier to go with a relational database that you are familiar with instead. But since no one else had mentioned it I thought I would toss the LDAP idea out there for consideration.

    -- Argel

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://472604]
Approved by marto
Front-paged by Caron
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (4)
As of 2014-09-23 00:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (208 votes), past polls