Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

storing tree-like structures in tables

by schweini (Friar)
on Sep 11, 2002 at 02:09 UTC ( #196850=perlquestion: print w/ replies, xml ) Need Help??
schweini has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, i'm seeking enlightment regarding the following:

i have items ("leaves") that can belong to one or more groups ("branches"), and every group can be in 0 or more groups.

my question is, how can i store this kind of data in a (mysql) table? what i'm doing right now, is simply using a "relationships" table, where every row either has a "TOP" or the corresponding row-id in its "son_of" column.
this is of course not quite optimal, if i have to re-contruct the whole branch leading to a given row, since i'd have to do e.g. 5 SELECTS for the re-construction of the branch of a leaf that has a depth of 5 (according to my very limited SQL-skills).

i could maybe store serialized objects or xml in my database, but really rather wouldn't because i need speed and portability.

thanks,

schweini

Comment on storing tree-like structures in tables
Re: storing tree-like structures in tables
by thraxil (Prior) on Sep 11, 2002 at 02:51 UTC

    the approach you seem to already be using of having a column for the parent is pretty standard. obviously it's not optimal if you have very deep trees. the other typical solution is to use an object-relational database like postgresql, but once you start using those features, there goes portability.

    if you're really careful, you might be able to hack something clever. if your rows each have a unique key, you could maybe add a 'branch', or 'path' column that you populate with a string containing the keys for each layer going down to the row seperated by commas or something. eg, if row 123 is a child of 56 which is a child of 34 which is a child of 12, you could store '12/34/56' in the branch column for row 123. then, after you select, you just split it in perl and off you go. of course, this could turn into a nightmare to maintain and you should avoid doing it unless the performance hit of multiple selects is really the bottleneck.

    this might also be a good time to think about alternatives to trees.

    anders pearson

Re: storing tree-like structures in tables
by dws (Chancellor) on Sep 11, 2002 at 06:14 UTC
    i could maybe store serialized objects or xml in my database, but really rather wouldn't because i need speed and portability.

    Serializing a tree and storing it in a BLOB (Binary Large OBject) field is a lot faster than decomposing the tree into separate nodes and storing each (and then having to issue a bunch of SELECTs to reconstitute the tree).

    If you're using MySQL and Perl, portability shouldn't be an issue. The only issue is which method you use to serialize objects. See storing serialized object in a database for an earlier discussion of alternatives.

Re: storing tree-like structures in tables
by abell (Chaplain) on Sep 11, 2002 at 08:08 UTC
    First I'd like to point out that as I understand your structure it is not a real tree, because (unless you forgot to mention this restriction) two different groups can belong to a same parent group.
    I had some thoughts on similar problems in an access-control setting, so let me use this language and call an item "user" and a group "group" (not much of a change, is it?).
    A user can belong to one or more groups and each group can belong to (and have the access rights of) one or more groups. Say antonio is in group <students> and in group <football players>. Both <students> and <football players> belong to <people>, and <people> belongs to the <universal> group. If the problem is getting all the groups antonio belongs to, you could first look for all the groups it directly belongs to and then climb on through the hierarchy. Thus, you get
    antonio -> <students>, <football players>
    
    at the next step
    antonio -> <students>, <football players>, <people>
    
    (you only consider <people> once), and finally
    antonio -> <students>, <football players>, <people>, <universal>
    

    Steps can proliferate if you have deep levels of nesting.
    To improve performance during the lookup, I would create the db this way:
    users:
    ------------------------------
    | ID | name      | fields... |
    ------------------------------
    |  1 | antonio   |           |
         ...
    |________________|___________|
    
    groups:
    -------------------------------------
    | ID | name             | fields... |
    -------------------------------------
    |  1 | students         |           |
    |  2 | football players |           |
    |  3 | people           |           |
    |  4 | universal        |           |
         ...
    |____|__________________|___________|
    
    user_group_rel (user belongs to group)
    ----------------------
    | ID_user | ID_group |
    ----------------------
    | 1       | 1        |
    | 1       | 2        |
        ...
    |_________|__________|
    
    group_group_rel (group inherits from group)
    -------------------------------
    | ID1  | ID2 ( parent group ) |
    -------------------------------
    | 1    | 1                    |
    | 1    | 3                    |
    | 1    | 4                    |
    | 2    | 2                    |
    | 2    | 3                    |
    | 2    | 4                    |
    | 3    | 3                    |
    | 3    | 4                    |
          ...
    |______|______________________|
    
    Note that the group-group relationship contains all rows (n,n) and all possible group inclusions, including derived ones (e.g. <students>-><universal>).
    This requires more rows to be generated, but then you can retrieve all groups a user belongs to by the one query
    SELECT UNIQUE group_group_rel.ID2
      FROM group_group_rel, user_group_rel
      WHERE user_group_rel.ID_user=?
        AND user_group_rel.ID_group=group_group_rel.ID1
    
    If you want to keep track of what group inclusions are given and what are derived, you could add a boolean column in the group_group_rel table.

    Best regards

    Antonio Bellezza

    Update: Added explanatory remark in italics.
Re: storing tree-like structures in tables
by blssu (Pilgrim) on Sep 11, 2002 at 11:38 UTC

    Good comments above. I agree that your data model is the common approach. I disagree that serializing a nested hash is a good general solution. Here are my personal experiences with a very large database holding file-system-like and user-group data.

    • Some databases (Oracle, others?) directly support tree structures. Oracle uses a "connect by" clause, so you can query an entire tree with one query:

      SELECT * FROM item CONNECT BY parent = PRIOR id START WITH id = 0
      I don't know if MySQL supports this -- but you might consider requesting/contributing it because it is a very useful feature.

    • I frequently use additional columns to identify sub-trees. Think of a military or government organization with a deep hierarchy. Certain levels are explicit and others are ad hoc. You can model the entire structure with the "parent node" pointers. Then annotate this with "branch", "division", "company", etc. pointers. Every row in your table will have a column for each of those pointers.

    • Those explicit level pointers can be stored in another table. Then you can have one table for any number of explicit levels -- each row has "level name", "member row id". This is useful for handling webs (graphs instead of trees) of relationships. It can also be used to modify an existing database if you need to solve a tree expansion performance problem, but don't want to touch the core data model.

    • For speed, I sometimes cache a flattened version of a tree. This is especially useful for user group membership tests when the groups can be nested. When a person logs on, I do an expensive tree expansion of the group table (filtered for that person) and insert the results into a temporary session table. The session table can now be used to join against other tables for security checks.

Re: storing tree-like structures in tables
by Anonymous Monk on Sep 11, 2002 at 14:30 UTC
    Joe Celko routinely recommends a nested set construction for this problem. Here is an explanation. Or search for more articles on it.

      ++ on this. The nested set construction is a really useful technique for representing trees in SQL that I've used several times in the past.

      There is now a nice perl introduction to this way of representing trees over on oreillynet - just in case anybody's still interested :-)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://196850]
Approved by cacharbe
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (8)
As of 2014-12-28 04:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (178 votes), past polls