Raad has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I've been struggling with this problem for over a week now and I am about to admit failure at getting a working algorithm! I have a thesaurus of terms consisting of hierarchical tree structures which we use to index our library collection. My objective is to create a database like SQLite to store all existing entries (over 3,000). The thesaurus consists of a subject term and a unique identifier, a description and some other information. Example:

ID 211
DESCRIPTION Anatomical areas of the body.

ID 2
TREE A01.047
DESCRIPTION Portion of the body that lies between the
thorax and the pelvis.
Since I experienced some problems creating foreign keys in SQLite, I decided to store my data in a Hash of Array like:
%HoA =( ##id =>["term","parent_id","left_id", "right_id"] 1 => ["Body Regions","","",""], 2 => ["Extremities","1","",""], 3 => ["Arm","2","",""], 4 => ["Elbow","2","",""], 5 => ["Hand","2","",""], 6 => ["Fingers","5","",""], 7 => ["Thumbs","6","",""] );
The algorith for nested sets calls for a recursive technique to populate children nodes.
use strict;
my %HoA; my $k; my $root = 1; my $ctr =1; for $k (sort keys %HoA){ walktree($root) unless $k eq""; } sub walktree{ my $id = shift; $HoA{$k}[2]=$ctr++ if $id = $k; ##The walktree recursive sub below enters an eternal loop ##and doesn't execute properly. When I tried to push the ##children elements to a temporary array, it worked just ##fine. Would it be the $id??? walktree($id) if $id =$HoA{$k}[1] ; $HoA{$k}[3]=$ctr++; local $" = "|"; print "$k = @{$HoA{$k}}\n"; }

Could anybody direct me in the right direction? Or could you recommend an alternative algorithm using regex for instance? Any help would be appreciated.
I've looked at the ISO Thesaurus modules on CPAN but our data does not conform to this standard. I've also looked at some XML approaches but it was mind boggling, to say the least.
Thanks everybody...


Edited by Chady -- removed <br> tags from code.

Replies are listed 'Best First'.
Re: Tree Traversal Script/Nested Sets
by jZed (Prior) on Jul 13, 2004 at 20:26 UTC
Re: Tree Traversal Script/Nested Sets
by Hero Zzyzzx (Curate) on Jul 14, 2004 at 14:18 UTC

    I'm the author of DBIx::Tree:NestedSet. Out of the box it supports SQLite and MySQL, I'm looking for more database champions to beef up support.

    Below is one way you could set up your tree. I did it manually for the sake of verbosity.

    Note: I'm about to release a new version of the module (it's currently at 0.12), the most glaring bug it fixes is the broken return value of the various add_* methods.

    This module is pretty young, caveat coder. I'm very open to patches and bug fixes, (this module is important to a few apps I've got out there) I won't be abandoning it anytime soon.

    #!/usr/bin/perl use DBI; use DBIx::Tree::NestedSet; use Data::Dumper; use strict; my $dbh=DBI->connect('dbi:SQLite:dbname=nested_set') or die($DBI::errs +tr); my $tree=DBIx::Tree::NestedSet->new(dbh=>$dbh,db_type=>'SQLite'); $tree->create_default_table(); $tree->add_child_to_right(term=>'Body Regions'); $tree->add_child_to_right( term=>'Extremities', id=>$tree->get_root() ); my $extremities_id=$tree->get_id_by_key( key_name=>'term', key_value=>'Extremities' ); foreach('Arm','Elbow','Hand'){ $tree->add_child_to_right( term=>$_, id=>$extremities_id ); } my $hand_id=$tree->get_id_by_key( key_name=>'term', key_value=>'Hand' ); $tree->add_child_to_right( id=>$hand_id, term=>'Fingers' ); my $fingers_id=$tree->get_id_by_key( key_name=>'term', key_value=>'Fingers' ); $tree->add_child_to_right( id=>$fingers_id, term=>'Thumbs' ); my $structure=$tree->get_self_and_children_flat(); foreach(@$structure){ print " " x $_->{level}. $_->{term}."\n"; } $dbh->do('drop table nested_set');

    -Any sufficiently advanced technology is
    indistinguishable from doubletalk.

    My Biz

Re: Tree Traversal Script/Nested Sets
by bsb (Priest) on Jul 14, 2004 at 02:56 UTC
    You might want to convert you source data to YAML, it looks pretty close already. YAML is usually less clunky than XML.

    As for the database loading, it might be easier to use the database's load tool (I don't know about sqllite)

    3000 records is pretty small you should consider just using the YAML (depending on your querying needs).

Re: Tree Traversal Script/Nested Sets
by dragonchild (Archbishop) on Jul 14, 2004 at 12:40 UTC
    Yah - SQLite doesn't support foreign keys. Adding foreign keys, though, is very simple. You only need them for modification operations. So, just have all your database access go through four functions - select(), insert(), update(), and delete(). (If you even want to support delete. We don't, on one of our apps.) Then, every call to insert(), update(), or delete() will have a call to select() to make sure everything will remain OK.

    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested