http://www.perlmonks.org?node_id=1081421


in reply to Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

It clearly appears to me that each thread, without using transactions, is opening a database file at random, doing two inserts into it, and then closing it again. This, emphatically, is not the right way to do this with SQLite. There are two key changes that you should make here:

  1. You should have one thread for each database that you wish to insert into, and you should queue the work that is destined for a given database onto a queue that corresponds to that thread and to that database.
  2. The work must be placed inside a transaction, lest SQLite physically verify every single disk write that occurs.
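A minimal sketch of that arrangement, assuming DBD::SQLite and a threaded perl; the database names, table `t`, and the hundred sample rows are placeholders for your real data. Each worker owns exactly one database file, opens it once, drains its own queue inside a single transaction, and closes it once:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use DBI;

# Hypothetical database files; substitute your own.
my @db_names = ('a.db', 'b.db');

# One queue per database; rows destined for a database go on its queue.
my %queue = map { $_ => Thread::Queue->new } @db_names;

# One worker thread per database: open the file once, drain the queue
# inside a single transaction, commit, close.
my @workers;
for my $db (@db_names) {
    my $q = $queue{$db};
    push @workers, threads->create(sub {
        my $dbh = DBI->connect("dbi:SQLite:dbname=$db", '', '',
                               { RaiseError => 1, AutoCommit => 1 });
        $dbh->do('DROP TABLE IF EXISTS t');
        $dbh->do('CREATE TABLE t (k INTEGER, v TEXT)');
        my $sth = $dbh->prepare('INSERT INTO t (k, v) VALUES (?, ?)');
        $dbh->begin_work;                 # one transaction for the batch
        while (defined(my $row = $q->dequeue)) {
            $sth->execute(@$row);
        }
        $dbh->commit;                     # one physical flush, not one per row
        $dbh->disconnect;
    });
}

# Producer: route each row to the queue of its target database.
for my $i (1 .. 100) {
    my $db = $db_names[$i % @db_names];
    $queue{$db}->enqueue([$i, "row$i"]);
}
$queue{$_}->enqueue(undef) for @db_names;   # undef = end-of-work marker
$_->join for @workers;
```

Because only one thread ever touches a given database file, there is no lock contention between the workers, and each file sees exactly one open, one transaction, and one close.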

But first, step back and seriously reconsider your whole design: you have very seriously over-engineered it. Why not have a single process that takes a database name as a command-line parameter, greps through the input file considering only the lines of interest, and inserts those lines into the one database it is working with, doing it all in reasonably sized transactions? Run this one process in parallel with other copies of itself, or one at a time.
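A sketch of that single-process design, again assuming DBD::SQLite. The `@input` array stands in for reading your real input file line by line, and the `/^keep:/` pattern stands in for your real lines-of-interest test; neither is from the original thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Database name from the command line (with a default so the sketch
# runs standalone).
my $db_name = shift(@ARGV) // 'filtered.db';

# Placeholder for the real input file, read line by line.
my @input = ("keep: alpha", "skip: noise", "keep: beta");

my $dbh = DBI->connect("dbi:SQLite:dbname=$db_name", '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('DROP TABLE IF EXISTS lines');
$dbh->do('CREATE TABLE lines (line TEXT)');
my $sth = $dbh->prepare('INSERT INTO lines (line) VALUES (?)');

$dbh->begin_work;                      # one transaction for the whole run
for my $line (@input) {
    next unless $line =~ /^keep:/;     # consider only lines of interest
    $sth->execute($line);
}
$dbh->commit;                          # a single flush instead of one per row
$dbh->disconnect;
```

Since each copy of this process writes to a different database file, you can launch several of them at once without any locking between them, and there are no threads to coordinate at all.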

One bit of overhead that you want to eliminate, however you choose to do it, is the constant opening and closing of files. The other piece of overhead that you must eliminate, through the use of transactions, is the physical verification of every disk write, which is SQLite's by-design default behavior.
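For long runs, you need not hold one giant transaction open either; the usual pattern is to keep the handle open for the whole run and commit in batches. A sketch, where the batch size of 1,000 and the 2,500 illustrative rows are arbitrary:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Open the database once and keep the handle for the whole run.
my $dbh = DBI->connect('dbi:SQLite:dbname=batch.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('DROP TABLE IF EXISTS t');
$dbh->do('CREATE TABLE t (n INTEGER)');
my $sth = $dbh->prepare('INSERT INTO t (n) VALUES (?)');

my $batch = 1000;    # rows per transaction (illustrative)
my $count = 0;
$dbh->begin_work;
for my $n (1 .. 2500) {
    $sth->execute($n);
    if (++$count % $batch == 0) {
        $dbh->commit;        # one physical write for the last $batch rows
        $dbh->begin_work;    # start the next batch
    }
}
$dbh->commit;                # flush the final partial batch
$dbh->disconnect;
```

Without the `begin_work`/`commit` pairs, DBI's AutoCommit mode would make each `execute` its own verified write, which is exactly the overhead described above.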

Re^2: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by ssc37 (Acolyte) on Apr 07, 2014 at 17:08 UTC

    I have some processing to do on a particular MySQL table containing several million records.

    I need to handle the data in blocks, defined by the composite index (v_ma_id, v_mo_id), in one shot.

    I have enough memory to hold the whole table in memory, and I don't know in advance which blocks will be present in the table (that's why I fetch all the data in one shot).

    Now, with the full code for the example, you will see that I use one thread per block, and each thread creates its own database and inserts the data in a transaction every 50,000 records.

    I've attached a sample of the data so you can see what kind of data I need to insert.

    The full sample is 4 MB (~87,700 lines / 38 blocks); it is retrieved from MySQL and inserted in 14 s with 1 thread. With 10 threads: 33 seconds, etc.

    Thanks for your help, because this problem is really driving me crazy :/
    Any comments/advice about the quality of the code would be really appreciated, to help me produce more robust code.
    update: Full sample: http://www.speedyshare.com/APpVh/sample-data-monks.txt
    update2: removed a part of the code not necessary for this test
    With 330,642 lines over 76 blocks, the time is 55 seconds with 1 thread, 2m30s with 32 threads.