PerlMonks
Having tackled problems like this before, I don't think that there will be a solution as simple as "prepare your SQL statement this way and this will run 200x faster".
We have to look at the resources used when running your code against the database: CPU, MEMORY, DISK and NETWORK. A good model to keep in your head is that NETWORK is roughly 1000x slower than DISK, which is roughly 1000x slower than MEMORY, which is roughly 10x slower than the CPU. Your performance will be dictated by how much of each resource your solution consumes, so the first question to ask is which of these your approach leans on most heavily.
In your situation, where is the database relative to your Perl code? If the database and your code run on the same machine and the entire database and data file fit into MEMORY, everything will run fast. If your data file and database are large (2M to 10M rows in your worst-case scenario), a lot of DISK activity may be needed to get the data up to your CPU. Having your data file and Perl code on a machine different from your database server (separated by a network) will be problematic if you need to transfer data many times (within loops).
Assuming that you have enough MEMORY where your Perl code is running, you can load the list of database records (names) into a hash and use it to check the lines of the file. This is the best approach when the database is small (less than 10K rows) and the data file is large (greater than 1M rows).
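A minimal sketch of that hash-lookup approach (the names and file lines here are made up; in practice @db_names would come from the database, e.g. via DBI's selectcol_arrayref):

```perl
use strict;
use warnings;

# Hypothetical data: in a real script the names would come from the
# database, e.g.
#   my $names = $dbh->selectcol_arrayref('SELECT name FROM people');
my @db_names   = qw(alice bob carol);
my @file_lines = qw(bob dave);        # stands in for the large data file

# Build the lookup hash once; each later check is then O(1).
my %in_db = map { $_ => 1 } @db_names;

# Scan the file lines and keep the ones that exist in the database.
my @matches = grep { $in_db{$_} } @file_lines;
print "matched: @matches\n";          # prints "matched: bob"
```

The key point is that the hash is built once, so a 1M-line file costs 1M cheap MEMORY lookups instead of 1M round trips to the database.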
If your database is large (greater than 1M rows) and your data file is REALLY small (10 rows), then your looping approach above will work fine. If your data file is slightly larger, but not too big (less than 50K rows), then you may want to simply load your data file into a temporary database table and use a single SQL statement to join your records against the main lookup table.

The reason this might be faster is that executing an SQL statement requires the database to parse the statement and then perhaps load statistics and indexes to find the data you want. You can save on SQL parsing time by doing prepare/bind/execute, but the database will still have to perform the actual lookup of indexes and data, which may require moving around a lot of MEMORY or, in the worst case, loading a lot of blocks from DISK. When running 50K separate SQL statements, the database engine has to perform a complete lookup operation for each row. If you instead load 50K rows of data and ask the database (via one SQL statement) to match up the two name lists, the engine can optimize how it performs the lookups and streamline the entire process.

All of this advice is fairly general; which approach you should use depends on which end of the spectrum your scenario falls on, and your starting criteria (10K to 10M vs 1K to 1M variability) are very broad.

In reply to Re: PSQL and many queries
by fzellinger
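The temporary-table idea above can be sketched as follows. This is a self-contained example using an in-memory SQLite database via DBD::SQLite purely so it runs anywhere; the original question concerns PostgreSQL, where the same pattern applies (CREATE TEMP TABLE plus COPY or batched INSERTs). All table and column names are invented for illustration:

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database standing in for the real server.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# The "main lookup" table, standing in for the existing database table.
$dbh->do('CREATE TABLE people (name TEXT PRIMARY KEY)');
$dbh->do(q{INSERT INTO people VALUES ('alice'), ('bob'), ('carol')});

# Load the data file's rows into a temporary table with one
# prepared statement executed per row.
$dbh->do('CREATE TEMP TABLE incoming (name TEXT)');
my $ins = $dbh->prepare('INSERT INTO incoming VALUES (?)');
$ins->execute($_) for qw(bob dave);   # stands in for the file's rows

# One join lets the engine plan the whole lookup at once instead of
# doing a full parse-and-probe cycle per row.
my $matches = $dbh->selectcol_arrayref(
    'SELECT i.name FROM incoming i JOIN people p ON p.name = i.name'
);
print "matched: @$matches\n";
```

The design point is that the join moves the per-row work inside the engine, where it can pick an access path once for all 50K rows.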