in reply to Architectural question...

I think you're comparing apples and oranges. The SQL program reads data from disk and groups that, the perl programs groups data that is already in memory. When you put the perl program in production it should also read the data from the database.

Do I understand correctly that all that you do is something like:

select a,b,c,d,count(*) from x group by a,b,c,d
Or do you also sum subgroups?

Can you please show the SQL query?

Raafschild

Replies are listed 'Best First'.
Re^2: Architectural question...
by devnul (Monk) on Jul 05, 2005 at 18:02 UTC
    Yes... The problem is my orange is not performing well, so I'm switching to an apple. The question is: which apple should I use?

    The code, in this case, does (or rather will, if I go forward with it) indeed read from the database as that is how it will populate the data structure I am using. The point is, this "server" would be persistent. So each request would only need to execute the code I am timing (and to fetch any data not yet in its cache or which is "dirty".

    I wanted to avoid getting into the SQL specifics, because I use some Postgres-specific stuff which is not immediately apparent. My query is something like:
    SELECT count(*) as count,category.category,nlevel(category.category) A +S level, subpath(category.category,0,nlevel(category.category)-1) as parent, category.head_title, category.cat_title, category.subcat_title FROM da +ta, category WHERE data.category <@ category.category GROUP BY category.category, category.head_title, category.cat_title, c +ategory.subcat_title
    The category table looks like:
    Table "category" Column | Type --------------+------------------- head_title | character varying cat_title | character varying subcat_title | character varying category | ltree
    The data table looks like:
    Table "data" Column | Type + +---------------------------------------------- id | integer category | ltree[]
    ... there are other columns here but these are the ones relevant to the query.

    - dEvNuL
      This certainly is a fruit of a different kind ;-).

      You might want to store the results of the query in a summary table and have that table maintained by a set of triggers on the data table. That way the persistence and the caching is handled by the database, so you don't have to write it yourself.

      Queries on this summary table should by quick, and all changes in the data table are immediately available in the summary table.

      Sorry for a not very perlish suggestion in this forum.

      Raafschild

        Its starting to drive me bananas... (sorry, I could not resist)

        The problem with storing it using triggers/materialized views is that the "where" clause against the data table is not constant. Sometimes I am searching the data table for records in a state, zip code radius, full-text search (using tsearch2), and about 3 other factors. So the "grouping" has to happen *AFTER* the result set is limited by the WHERE clause against the data table.

        .. I hope I explained that well enough...

        - dEvNuL