|Problems? Is your data what you think it is?|
Architectural question...by devnul (Monk)
|on Jul 05, 2005 at 08:20 UTC||Need Help??|
devnul has asked for the
wisdom of the Perl Monks concerning the following question:
I've started to hit a wall with SQL and sort-of had this idea of moving a certain piece of functionality out of the SQL query and into something which operates more quickly. Speed is vital here, as I will explain.
First of all, what I am doing involves running a query against a data table and joining/grouping that result by what is stored in a category table. Thus giving me a count of all the items in the data table which fall into a given category.
Something like:While doing this sort of query works well for smaller tables, the data has grown dramatically over the past several months and I now have close to 100,000 rows in this table. On my fast production systems, performance is "adequate" (2 or 3 seconds). On my incredibly slow development system it is unacceptable (30+ seconds). What I am trying to achieve is "adequate" performance on my development system.
.. I just typed this, it is not real data.. Hopefully I got all the math right and things add up correctly.
One way I thought about doing this was to have a separate process/server running that when fed a list of IDs would look up each of them in its cache, requerying the database for any which have grown stale/etc.
The first order of business was to determine if this mountain can be moved. My test script is:
Most of this script is just being used to setup the data/cache for later use. The real work I am trying to measure begins with the "Returning mapped categories for values..." and ends at the "Completed in $total_time" message.
As you can probably tell from the code I am setting up 2812 categories and testing how quickly I can map 100,001 of them.
The last sleep is just so I can look at the process and guage how much memory it is using.
So with all this being said, I get a run time of about 2.1 seconds on my development system. Do you see any obvious ways that this section of the code could be improved, considering the importance of raw performance here?
My second question is... The recipient of what this server would output would actually be a PHP script. I wrote a PHP version of this same test script, but the performance was much worse (3.6 seconds). What might be a quick and efficient means to connect this up to PHP/Apache?
.... Of course, I am very open to any other possible solutions to this problem, this is just one I thought of tonight, but I'm sure there must be a better way...
Thank you for your help and consideration!