Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

by ssc37 (Acolyte)
on Apr 07, 2014 at 09:52 UTC

ssc37 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have a problem with concurrent inserts into a different SQLite database for each thread, and I don't understand how to solve it or whether it is a limitation.

If I use only one thread, the process takes 22 seconds (559512 rows for 1324 databases, ~36 MB).
If I use 10 threads, the process takes 1 minute 20 seconds ...
If I use 32 threads, the process takes 3 minutes 43 seconds ...

From some debugging, it seems that the time is spent when the pool is created (~3 minutes for 32 threads).
What am I doing wrong?

The script runs on a server with 128 GB of RAM and 32 cores; there are no CPU or memory peaks.

Thanks for your help.
Best regards,
UPDATE: I just forgot to mention that I work with in-memory tables; the problem is the same with file:/dev/shm/xxx.sqlite or file:/dev/shm/xxx.sqlite?mode=memory.
This is the part of the code that does the inserts:
our $THREADS = 32;

sub worker {
    my $tid = threads->tid;
    my ( $Qwork, $Qresults ) = @_;

    while ( my $item = $Qwork->dequeue ) {
        my $result;
        my ( $M, $m ) = @{$item};
        my $dbfile_connect = qq{${dbfile}_${M}_${m}};

        my $sqlite_dbh = sqlite_connect( $dbfile_connect, '' );
        create_table_dpt( \$sqlite_dbh, \$table_dpt_sqlite, \@{$data_dpt} );
        create_tmp_table( \$sqlite_dbh, \$table_a_sqlite, \$table_d_sqlite, \%{$uniq_values} );
        insert_a_tmp_table( \$sqlite_dbh, \$table_a_sqlite, \$table_d_sqlite,
                            \@{ $data_ref->{$M}->{$m} }, \%{$data_g_c}, \%{$data_c} );
        $sqlite_dbh->disconnect();

        $Qresults->enqueue( \%{$result} );
        print "$M $m finit \n";
    }

    $Qresults->enqueue( undef );    ## Signal this thread is finished
}

my $Qwork    = new Thread::Queue;
my $Qresults = new Thread::Queue;

my @pool = map { threads->create( \&worker, $Qwork, $Qresults ) } 1 .. $THREADS;

foreach my $M_m (
    sort { $uniq_values->{M_id_top}{$b} <=> $uniq_values->{M_id_top}{$a} }
    keys( %{ $uniq_values->{M_id_top} } ) )
{
    my ( $M, $m ) = split( '_', $M_m );
    $Qwork->enqueue( [ $M, $m ] );
}

$Qwork->enqueue( (undef) x $THREADS );
$_->join for @pool;

Replies are listed 'Best First'.
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by erix (Prior) on Apr 07, 2014 at 10:10 UTC

    SQLite is not strong in concurrency; for practical (writing) purposes it is best regarded as a 1-person database.

    See the SQLite FAQ and the advice on when (not) to use it.

      Hi,
      Indeed, but I don't do concurrent writes on the same database.
      When the threads start it's really fast; my problem seems to be when the pool is created with
      my @pool = map{ threads->create( \&worker, $Qwork, $Qresults ) } 1 .. $THREADS;

      The money quote here might be "SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time."

Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by RichardK (Parson) on Apr 07, 2014 at 11:21 UTC

    You didn't say which OS you are running on, and since Perl's thread implementations vary, that may be important.

    perlthrtut has this to say, which may be relevant to your problem:

    Performance considerations
    The main thing to bear in mind when comparing Perl's ithreads to other threading models is the fact that for each new thread created, a complete copy of all the variables and data of the parent thread has to be taken. Thus, thread creation can be quite expensive, both in terms of memory usage and time spent in creation.
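
    To make that concrete, here is a minimal sketch (not code from this thread; worker, @big_data and the thread count are invented): if the pool is spawned before any large data structure exists in the parent, each clone stays small and thread creation stays cheap.

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my $THREADS = 4;

        sub worker {
            my ($q) = @_;
            while ( defined( my $item = $q->dequeue ) ) {
                # ... per-item work would go here ...
            }
        }

        # Spawn the pool FIRST, while the parent holds almost no data:
        # every threads->create clones whatever the parent owns at that moment.
        my $Qwork = Thread::Queue->new;
        my @pool  = map { threads->create( \&worker, $Qwork ) } 1 .. $THREADS;

        # Only now build (or fetch) the large data set and feed the queue;
        # the workers were cloned before it existed, so none of them carries a copy.
        my @big_data = map { "record $_" } 1 .. 100_000;
        $Qwork->enqueue($_) for @big_data;
        $Qwork->enqueue( (undef) x $THREADS );    # one end-of-work marker per worker

        $_->join for @pool;

    If, in the full program, the pool is created after all of the MySQL data has been fetched into the parent, each of the 32 threads would start by copying all of it, which would match the ~3 minutes reported for pool creation.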
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by BrowserUk (Patriarch) on Apr 07, 2014 at 11:01 UTC

    Sounds about right. Well done. You produced pretty much exactly the results I would expect from looking at the code you've posted.

    (Had that code been complete, I might have tried running it, but....)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow: BY YOUR DESIGN!)
by BrowserUk (Patriarch) on Apr 07, 2014 at 15:02 UTC

    I suspect that the slowness has little or nothing to do with your use of SQLite, and almost everything to do with poor coding choices with respect to shared data and queues.

    I further suspect that -- subject to verifying my guesswork regarding the chunks of your code that you haven't posted -- a few relatively minor changes could effect a considerable speedup in the run time of this code.

    Once again, may I suggest that you fill in the blanks so we stand some chance of actually helping you. Assuming you actually want help?


      Sure, sure,
      I am trying to cut down the part of the code that I haven't provided and to construct a sample dataset (but I am having difficulty building one from scratch).

      I'm pretty sure you're right about the "poor coding choices", because I'm a beginner and I have been scratching my head for days trying to figure out where I've made mistakes.

      May I attach some tabulated files of the data that I retrieve from MySQL? It would help me a lot.

      OS: Linux (Debian). Perl: v5.14.2. Best regards,
        May I attach some tabulated files of the data that I retrieve from MySQL? It would help me a lot.

        Sorry, but I do not understand that question -- it seems to have lost a lot in translation. Would you try re-phrasing it for us?


Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by CountZero (Bishop) on Apr 07, 2014 at 13:50 UTC
    Just one question: is 22 seconds for this operation too slow for any reason?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by sundialsvc4 (Abbot) on Apr 07, 2014 at 16:32 UTC

    It clearly appears to me that each thread, without using transactions, is opening a database file at random, doing two inserts into it, and then closing it again. This, emphatically, is not the right way to do this with SQLite. There are two key changes that you should make here:

    1. You should have one thread for each database that you wish to insert into, and you should queue the work destined for a given database to a queue that corresponds to that thread and that database.
    2. The work must be placed inside a transaction, lest SQLite physically verify every single disk write that occurs (a sketch of this follows below).

    But, first, step back and seriously reconsider your whole design. You have very seriously over-engineered the whole thing. Why not have a single process that takes a database name as a command-line parameter, greps through the input file considering only the lines of interest, and inserts those lines into the one database it is then working with, doing it all in reasonably sized transactions? Run this one process in parallel with other copies of itself, or one at a time.

    One bit of overhead that you want to eliminate, however you choose to do it, is the constant opening and closing of files. The other piece of overhead that you must eliminate, through the use of transactions, is the physical verification of every disk write that occurs, which is SQLite's by-design default behavior.
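
    To illustrate point 2, here is a minimal DBI/DBD::SQLite sketch; the file name, table and data are placeholders, not the OP's schema. Without the begin_work/commit pair, every INSERT is its own implicit transaction and its own sync.

        use strict;
        use warnings;
        use DBI;

        # Placeholder database file and table; the point is the single
        # transaction wrapped around the whole batch of inserts.
        my $dbh = DBI->connect( 'dbi:SQLite:dbname=/dev/shm/example.sqlite', '', '',
                                { RaiseError => 1, AutoCommit => 1 } );

        $dbh->do('CREATE TABLE IF NOT EXISTS t ( a INTEGER, b TEXT )');

        my @rows = map { [ $_, "value $_" ] } 1 .. 50_000;
        my $sth  = $dbh->prepare('INSERT INTO t ( a, b ) VALUES ( ?, ? )');

        $dbh->begin_work;                    # one transaction for the whole batch
        $sth->execute( @{$_} ) for @rows;    # no per-row sync while it is open
        $dbh->commit;                        # a single commit, a single sync

        $dbh->disconnect;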

      I have some processing to do on a particular (MySQL) table with several million records.

      I need to process the data in blocks defined by the composite index (v_ma_id, v_mo_id), in one shot.

      I have enough memory to hold the whole table in memory, and I don't know in advance which blocks will be present in the table (that's why I fetch all the data in one shot).

      Now, with the full code for the example, you will see that I use one thread per block, and that each thread creates its own database and inserts the data every 50000 records within a transaction.

      I've attached a sample of the data so you can see what kind of data I need to insert.

      The full sample is about 4 MB (~87700 lines / 38 blocks); it is retrieved from MySQL and inserted in 14 seconds with 1 thread. With 10 threads: 33 seconds, etc.

      Thanks for your help, because this problem is really driving me crazy :/
      Any comments/advice about the "quality" of the code would be really appreciated and would help me produce more robust code.
      Update: full sample: http://www.speedyshare.com/APpVh/sample-data-monks.txt
      Update 2: removed a part of the code that is not necessary for this test.
      With 330642 lines for 76 blocks, the time is 55 seconds with 1 thread and 2m30s with 32 threads.
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by sundialsvc4 (Abbot) on Apr 08, 2014 at 02:33 UTC

    In my humble opinion, your problem is still that the various threads will be competing with one another. Very likely, you do not need threads at all ... very likely, you do not need any of this complexity ... and this just might be what these results are trying to tell you!

    I suggest that you would be far better served either by sticking with the one-process approach, which (still) seems to be giving you fairly decent performance, or by modifying the (non-threaded) program so that it scans the input file for the records that belong in a particular database (identified, say, by a command-line parameter ...), and then, if you wish, running multiple instances of this program concurrently, say by using the "&" feature of the Unix/Linux shell. A sketch of that idea follows below.

    In the most friendly way possible, I suggest that you've built up all this concurrency stuff on the expectation that it would save you time, when most clearly it is not doing so. And, if an approach is not panning out, the time comes to just let it go. Personally, in this case, I think that time has come.
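
    A rough sketch of that suggestion; the script name load_one_db.pl, the database names and the job limit are invented for illustration, not taken from the thread. The parent simply forks one plain, non-threaded loader per database and keeps a few of them running at a time.

        use strict;
        use warnings;

        my @databases = ( 'db_1_1', 'db_1_2', 'db_2_1' );    # placeholder names
        my $max_jobs  = 4;                                    # loaders running at once

        my %running;

        sub reap_one {
            my $pid = wait();                  # block until one child exits
            delete $running{$pid} if $pid > 0;
        }

        for my $db (@databases) {
            reap_one() while scalar( keys %running ) >= $max_jobs;
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ( $pid == 0 ) {
                # Child: run the single-database, non-threaded loader.
                exec( 'perl', 'load_one_db.pl', $db ) or die "exec failed: $!";
            }
            $running{$pid} = $db;              # parent: remember the child
        }

        reap_one() while keys %running;        # wait for the remaining children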
