http://www.perlmonks.org?node_id=501935

shlomif has asked for the wisdom of the Perl Monks concerning the following question:

As part of my day job, I'm working on an application. In the application there are items, which have a canonical hierarchy, and a browsing hierarcy. I'm using DBD::SQLite2 with Class::DBI.

I'm using the DBIx::Tree::NestedSet module for the canonical hierarcy, specified in the items. What happens is that every item only has one parent. This uses the nested set tree mechanism (references for what it does are in the docs for it.

I'm using a different table (items_browse) for the browsing "hieararchy". This is a simple parent_id many-to-many child_id relationship. The browsing hierarchy may also contain circles.

I'm using the following indexes:
CREATE INDEX index_items_lft ON items ( lft ),
CREATE INDEX index_items_rgt ON items ( rgt ),
CREATE INDEX index_item_browse_parent_id ON item_browse ( parent_id , child_id ),
CREATE INDEX index_item_browse_child_id ON item_browse ( child_id ),

Now, I populated the items from a flat file, into both the canonical and browsing hieararchies. It has about 10,000 records. However, then the web-interface I built for it is incredibly slow. (100 queries take 42 seconds). What indexes do I need to use to improve its performance?

I'm not using DBD::SQLite (with SQLite 3) because its DBD driver has some bugs that I found and was unable to fix. MySQL seems to have some limitations on the query as well, and I'm leaning towards converting everything to PostgreSQL, but did not do it yet, and didn't get the approval of my boss.

  • Comment on Optimizing Tree Hierarchies with DBD::SQLite2

Replies are listed 'Best First'.
Re: Optimizing Tree Hierarchies with DBD::SQLite2
by Corion (Patriarch) on Oct 21, 2005 at 10:01 UTC

    Without seeing your queries, I'm not sure what slows you down. Bear in mind that SQLite will use at most one index for any given query - so you should look at the generated query plan for your queries to determine if the indices you gave are actually used.

    From how I interpret the documentation, SQLite will also benefit from an early reduction of the JOIN size, that is, you need to order the JOINs of tables yourself so that the initial resultset is as small as possible.

    Of course, you can likely pre-cache the browsing hierarchy and optimize it for fast reads (and slow writes) by employing some triggers on the main table and the normalized browsing relationship table, and create the flattened tree with the triggers. SQLite supports triggers, but MySQL does not, so if you're planning on porting this to other systems, beware.

    What problems did you find with SQLite3 that SQLite2 doesn't have ?

      <<<
      Bear in mind that SQLite will use at most one index for any given query - so you should look at the generated query plan for your queries to determine if the indices you gave are actually used.
      >>>

      Hmmm... that could be a big problem.

      My queries are the following: from the ID fetch the children in the canonical categorization and the browsing categorization. I'm doing it using the primitives given by DBD::SQLite2 and Class::DBI. From the IDs of the different categories I also refer to their names.

      <<<

      What problems did you find with SQLite3 that SQLite2 doesn't have ?
      >>>

      Check this bug report.

        I once wrote my on categorization thing in a normalized fashion and I found that once I moved queries away from Class::DBI and into SQL JOINs, the results became much faster. This of course meant adding code like the following to the classes:

        __PACKAGE__->set_sql(related_items => <<SQL) SELECT __PRIMARY__ from __TABLE__ INNER JOIN relations ON child = __TABLE__.__PRIMARY__ WHERE relations.parent = ? SQL sub related_items { my ($self) = @_; return $self->search_related_items($self->id); };

        and much uglier JOINs if you are constructing nested sets or junctive queries. I have, for the time being, moved to a denormalized database and found writing the queries much nicer. Performance is acceptable, that is, interactive, for the single user of the system (me).

        If you'll be going the route of JOINing the relation tables, consider writing an extension to SQL::Abstract - I wrote me an (yet unreleased) extension to handle GROUP BY and HAVING clauses, and it worked well for me.

Re: Optimizing Tree Hierarchies with DBD::SQLite2
by Moron (Curate) on Oct 21, 2005 at 13:42 UTC
    I am confused by the expression 'simple parent_id many-to-many child_id', because parent/child is a one-to-many relationship and many-to-many is a peer-to-peer relationship, normally implemented as parent-to-parent via a mutual child table known as a link table..

    -M

    Free your mind