PerlMonks
Having tackled problems like this before, I don't think that there will be a solution as simple as "prepare your SQL statement this way and this will run 200x faster".
We have to look at the resources used when running your code against the database: CPU, MEMORY, DISK and NETWORK. A good model to keep in your head is that NETWORK is roughly 1000x slower than DISK, which is roughly 1000x slower than MEMORY, which is roughly 10x slower than the CPU. Your performance will be dictated by how much of each resource your solution consumes, so the first question to ask is which of these your approach leans on most heavily.
In your situation, where is the database relative to your Perl code? If the database and your code run on the same machine and the entire database and data file fit into MEMORY, everything will run fast. If your data file and database are large (2M to 10M rows in your worst-case scenario), a lot of DISK activity may be needed to get the data up to your CPU. Having your data file and Perl code on a machine different from your database server (separated by a network) will be problematic if you need to transfer data many times (within loops).
Assuming that you have enough MEMORY where your Perl code is running, you can load the list of database records (names) into a hash and use it to check the lines of the file. This is the best approach when the database is small (less than 10K rows) and the data file is large (greater than 1M rows).
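A minimal sketch of that hash-lookup approach (the names and file lines here are made up; in practice @db_names would come from the database, e.g. via DBI's selectcol_arrayref):

```perl
use strict;
use warnings;

# Hypothetical data: in a real script the names would come from the
# database, e.g.
#   my $names = $dbh->selectcol_arrayref('SELECT name FROM people');
my @db_names   = qw(alice bob carol);
my @file_lines = qw(bob dave);        # stands in for the large data file

# Build the lookup hash once; each later check is then O(1).
my %in_db = map { $_ => 1 } @db_names;

# Scan the file lines and keep the ones that exist in the database.
my @matches = grep { $in_db{$_} } @file_lines;
print "matched: @matches\n";          # prints "matched: bob"
```

The key point is that the hash is built once, so a 1M-line file costs 1M cheap MEMORY lookups instead of 1M round trips to the database.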
If your database is large (greater than 1M rows) and your data file is REALLY small (10 rows), then your looping approach above will work fine. If your data file is slightly larger, but not too big (less than 50K rows), then you may want to simply load your data file into a temporary database table and use a single SQL statement to join your records against the main lookup table.

The reason this might be faster is that executing an SQL statement requires the database to parse the statement and then perhaps load statistics and indexes to find the data you want. You can save on SQL parsing time by doing prepare/bind/execute, but the database will still have to perform the actual lookup of indexes and data, which may require moving around a lot of MEMORY or, in the worst case, loading a lot of blocks from DISK. When running 50K separate SQL statements, the database engine has to perform a complete lookup operation for each row. If you instead load 50K rows of data and ask the database (via one SQL statement) to match up the two name lists, the engine can optimize how it performs the lookups and streamline the entire process.

All of this advice is fairly general; which approach you should use depends on which end of the spectrum your scenario falls on, and your starting criteria (10K to 10M vs 1K to 1M variability) are very broad.

In reply to Re: PSQL and many queries
by fzellinger
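The temporary-table idea above can be sketched as follows. This is a self-contained example using an in-memory SQLite database via DBD::SQLite purely so it runs anywhere; the original question concerns PostgreSQL, where the same pattern applies (CREATE TEMP TABLE plus COPY or batched INSERTs). All table and column names are invented for illustration:

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database standing in for the real server.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# The "main lookup" table, standing in for the existing database table.
$dbh->do('CREATE TABLE people (name TEXT PRIMARY KEY)');
$dbh->do(q{INSERT INTO people VALUES ('alice'), ('bob'), ('carol')});

# Load the data file's rows into a temporary table with one
# prepared statement executed per row.
$dbh->do('CREATE TEMP TABLE incoming (name TEXT)');
my $ins = $dbh->prepare('INSERT INTO incoming VALUES (?)');
$ins->execute($_) for qw(bob dave);   # stands in for the file's rows

# One join lets the engine plan the whole lookup at once instead of
# doing a full parse-and-probe cycle per row.
my $matches = $dbh->selectcol_arrayref(
    'SELECT i.name FROM incoming i JOIN people p ON p.name = i.name'
);
print "matched: @$matches\n";
```

The design point is that the join moves the per-row work inside the engine, where it can pick an access path once for all 50K rows.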