Having tackled problems like this before, I don't think that there will be a solution as simple as "prepare your SQL statement this way and this will run 200x faster".

We have to look at the resources being used when your code runs against the database: CPU, MEMORY, DISK, and NETWORK. A good rough model to keep in your head is that NETWORK is 1000x slower than DISK, which is 1000x slower than MEMORY, which is 10x slower than CPU. Your performance will be dictated by how much of each resource you use to solve this problem. If your solution involves:
1. ...crunching a bunch of data in the registers of the CPU only, it will be very fast.
2. ...moving blocks of MEMORY around and doing some CPU crunching, it will be fast.
3. ...loading lots of data from the DISK to MEMORY repeatedly to be crunched by the CPU, it won't be fast.
4. ...moving large blocks of data across the NETWORK so that it can be loaded to MEMORY (and perhaps temporarily stored on DISK) before CPU crunching, it will really suck.
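To put rough numbers on that model, here is a back-of-the-envelope sketch. It is in Python only so that it runs with nothing but the standard library. The relative costs come straight from the ratios above. The operation counts (50,000 round trips versus 2) are made-up figures for illustration, and the model ignores payload size entirely.

```python
# Relative cost of one operation on each resource, derived from the
# rule-of-thumb ratios in the post (CPU = 1 unit).
RELATIVE_COST = {
    "CPU": 1,
    "MEMORY": 10,                 # 10x slower than CPU
    "DISK": 10 * 1000,            # 1000x slower than MEMORY
    "NETWORK": 10 * 1000 * 1000,  # 1000x slower than DISK
}


def estimated_cost(ops):
    """Sum the relative cost of a workload given as {resource: operation count}."""
    return sum(RELATIVE_COST[resource] * count for resource, count in ops.items())


# Hypothetical workload A: one query per input row, each a network round trip.
per_row_cost = estimated_cost({"NETWORK": 50_000, "MEMORY": 50_000})

# Hypothetical workload B: one bulk load plus one join query, so two round trips.
bulk_cost = estimated_cost({"NETWORK": 2, "MEMORY": 50_000})

print(f"per-row: {per_row_cost:,}  bulk: {bulk_cost:,}")
```

The absolute numbers mean nothing. The point is only that the NETWORK term swamps everything else as soon as it sits inside a loop.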

In your situation, where is the database relative to your Perl code? If the database and your code run on the same machine and the entire database and data file fit in MEMORY, then everything will run fast. If your data file and database are large (2M to 10M rows in your worst-case scenario), then a lot of DISK activity may be needed to get the data to your CPU. Running your data file and Perl code on a different machine from the database (separated by a network) will be problematic if you need to transfer data many times (within loops).

Assuming that you have enough MEMORY where your Perl code is running, you can load the list of database records (names) into a hash and use it to check the lines of the file. This would be the best approach when the database is small (fewer than 10K rows) and the data file is large (more than 1M rows).
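A minimal sketch of that idea follows, in Python with the standard-library sqlite3 module standing in for your real database so it runs without a server. A Python set plays the role of the Perl hash. The table name `names`, the column `name`, and the sample rows are all assumptions, not taken from your schema.

```python
import sqlite3


def load_known_names(conn):
    """Fetch every name from the (small) lookup table with a single query."""
    return {row[0] for row in conn.execute("SELECT name FROM names")}


def match_lines(lines, known):
    """Yield each name from the data file that is present in the lookup set."""
    for line in lines:
        name = line.strip()
        if name in known:  # constant-time membership test, no database round trip
            yield name


# Hypothetical setup: an in-memory database with a tiny lookup table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO names VALUES (?)", [("alice",), ("bob",)])

known = load_known_names(conn)

# Stand-in for iterating over the lines of the large data file.
data_file = ["alice\n", "carol\n", "bob\n"]
matches = list(match_lines(data_file, known))
print(matches)
```

The database is queried once, up front. After that, every line of the file is checked against MEMORY only.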

If your database is large (more than 1M rows) and your data file is REALLY small (10 rows), then your looping approach above will work fine. If your data file is somewhat larger, but not too big (fewer than 50K rows), then you may want to simply load your data file into a temporary database table and use one SQL statement to join your records against the main lookup table. The reason this might be faster is that executing an SQL statement requires the database to parse the statement and then perhaps load statistics and indexes to find the data you want. You can save on SQL parsing time by doing prepare/bind/execute, but the database will still have to perform the actual lookup of indexes and data, which may require moving around a lot of MEMORY or, in the worst case, loading a lot of blocks from DISK. When you run 50K separate SQL statements, the database engine has to perform a complete lookup operation for each row. If you instead load the 50K rows and ask the database (via one SQL statement) to match up the two name lists, the database engine can optimize how it performs the lookups and streamline the entire process.
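The two approaches can be sketched side by side. Again this is Python with the standard-library sqlite3 module standing in for the real database, so it shows the shape of the technique and says nothing about PostgreSQL timings. The table names `lookup` and `incoming` and the sample data are hypothetical. With PostgreSQL you would typically bulk-load the temporary table with COPY instead of row-by-row inserts.

```python
import sqlite3


def match_per_row(conn, names):
    """Looping approach: one parameterized SELECT executed per input name."""
    found = []
    for name in names:
        row = conn.execute(
            "SELECT name FROM lookup WHERE name = ?", (name,)
        ).fetchone()
        if row:
            found.append(row[0])
    return found


def match_with_join(conn, names):
    """Set-based approach: load the names into a temp table, then join once."""
    conn.execute("CREATE TEMP TABLE incoming (name TEXT)")
    conn.executemany("INSERT INTO incoming VALUES (?)", [(n,) for n in names])
    rows = conn.execute(
        "SELECT i.name FROM incoming AS i "
        "JOIN lookup AS l ON l.name = i.name "
        "ORDER BY i.name"
    ).fetchall()
    conn.execute("DROP TABLE incoming")
    return [r[0] for r in rows]


# Hypothetical setup: a small in-memory lookup table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookup (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO lookup VALUES (?)", [("alice",), ("bob",), ("dave",)]
)

incoming_names = ["bob", "carol", "alice"]
loop_result = match_per_row(conn, incoming_names)
join_result = match_with_join(conn, incoming_names)
print(sorted(loop_result), join_result)
```

Both functions find the same matches. The difference is that the second hands the database the whole problem at once, so the planner gets to choose how to do the lookups.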

All of this advice is fairly general; which approach you use depends on which end of the spectrum your scenario falls on, and your starting criteria (10K to 10M vs 1K to 1M variability) are very broad.

In reply to Re: PSQL and many queries by fzellinger
in thread PSQL and many queries by citycrew
