|Perl: the Markov chain saw|
Re: Need to Improve Performanceby roboticus (Canon)
|on Jan 03, 2014 at 12:40 UTC||Need Help??|
I'd suggest learning SQL a little better so you can make the database do the part it's good at. In particular, start thinking about how to use join and think about data retrieval as a set operation rather than a procedural operation. If you do so, you could execute a single SQL statement and process it something like this (untested):
You can get fancier and get the SQL server to give you counts, averages, max/min values, subtotals, etc. If you spend a bit of time learning SQL, you'll find a lot of these sorts of operations easier and faster.
Why is your code inefficient? Primarily because you're executing more queries than you need to, leading to:
Doing it as a single query gives you one round-trip to parse/execute the query, and since the query returns all your results, it can bundle multiple rows per network packet, greatly reducing the amount of network traffic. (That may be a benefit for your other systems, as well!)
Update: Just in case you want to see how much network traffic could potentially be saved, here are some fake numbers, loosely based in reality.
For this calculation, I'm assuming it takes one round-trip for executing the query, and one round-trip for fetching a single result. Since you're fetching back only a single value from the database each time, I'm going to further assume that each result set could hold values at a time.
So with these assumptions, it looks like you'll have 10 times the network traffic with your code. The dominant factor is the number of data values returned per network transmission, which I conservatively put at 10. I expect that it would be greater than that, with a corresponding increase in efficiency between the two methods.
Someone may want to check the calculations and/or method. For the "My I/Os" column, I have 1 round trip for the query, and (# elements per row)*(# dtl IDs)*(# PAN IDs)/(# values per packet). For the "Your I/Os" column, I come up with 2 round trips round trips when # Dtl IDs is 10, and 11 when # Dtl IDs is 100 and I add 2*(# Dtl IDs)*(# PAN IDs) to it (where 2 is 1 round trip for the query, and one for getting the result).
When your only tool is a hammer, all problems look like your thumb.