Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

by sundialsvc4 (Abbot)
on Apr 07, 2014 at 16:32 UTC ( #1081421=note: print w/replies, xml ) Need Help??

in reply to Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

It clearly appears to me that each thread, and without using transactions, is opening a database-file at random, doing two inserts into it, and then closing it again.   This, emphatically, is not the right way to do this with the SQLite system.   There are two key changes that you should make here:

  1. You should have one thread for each database that you wish to insert into, and you should queue the work that is destinated for a given database to a queue that corresponds to that thread and to that database.
  2. The work must be placed inside of a transaction, lest SQLite physically-verify every single disk write that occurs.

But, first, step back and seriously reconsider your whole design.   You have very seriously over-engineered the whole thing.   Why not have a single process that takes a database-name as a command-line parameter, then greps through the input file considering only lines-of-interest, and inserting those lines into the one database that it is then working with, doing it all in reasonably-sized transactions.   Run this one process in parallel with other copies of itself, or one at a time.

One bit of overhead that you want to eliminate, however you choose to do it, is the constant opening and closing of files.   The other piece of overhead that you must eliminate, through the use of transactions, is the physical verification of every disk-write that occurs, which is SQLite’s by-design default behavior.

  • Comment on Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

Replies are listed 'Best First'.
Re^2: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by ssc37 (Acolyte) on Apr 07, 2014 at 17:08 UTC

    i have some treatments to make on a particular table (Mysql) with some millions of records.

    i have the need to treat the data by bloc defined by the composed index -> (v_ma_id , v_mo_id) in one shot.

    i have enough memory to have all the table in memory and don't know in advance which blocs will be present in the table (that's why i fetch all the data in one shot).

    Now with the full code for the exemple, you will see than i use one thread by bloc, and each thread create his own database and insert the data every 50000 records with a transaction

    I've join a sample of data to let you see which kind of data that i need to insert.

    The full sample has a size of 4mb (~87700 lines / 38 Blocs) and is retrieve from Mysql and insert in 14s with 1 thread. With 10 threads: 33seconds etc..

    Thanks for your help because this problem really make my crazy :/
    Any comments / advices about the "quality" of the code will be really appreciated to help me to produce more robust code.
    update: Full sample:
    update2: remove a part of the code not necessary for this test
    With 330642 lines for 76 blocs the time is 55 seconds with 1 thread, 2m30 with 32 threads

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1081421]
[Corion]: LanX: Yes, that's the main problem - you have lots (and lots) of workarounds in various places and stages of the processing, and to clean that mess up requires action across the complete codebase. And it's almost impossible to do it piece-by-piece
Corion is in a similar situation.#

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (12)
As of 2017-01-16 14:10 GMT
Find Nodes?
    Voting Booth?
    Do you watch meteor showers?

    Results (150 votes). Check out past polls.