Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

by ssc37 (Acolyte)
on Apr 07, 2014 at 09:52 UTC

ssc37 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have a problem with concurrent inserts into a different SQLite database for each thread, and I don't understand how to solve it or whether it is a limitation.

If I use only one thread, the process takes 22 seconds (559512 rows for 1324 databases, ~36 MB).
If I use 10 threads, the process takes 1 minute 20 seconds ...
If I use 32 threads, the process takes 3 minutes 43 seconds ...

From some debugging, it seems that the time is spent when the pool is created (~3 minutes for 32 threads).
What am I doing wrong?

The script runs on a server with 128 GB of RAM and 32 cores; there are no CPU or memory peaks.

Thanks for your help.
Best regards,
UPDATE: I just forgot to mention that I work with in-memory tables; the problem is the same with file:/dev/shm/xxx.sqlite or file:/dev/shm/xxx.sqlite?mode=memory.
This is the part of the code that does the inserts:
our $THREADS = 32;

sub worker {
    my $tid = threads->tid;
    my ( $Qwork, $Qresults ) = @_;

    while ( my $item = $Qwork->dequeue ) {
        my $result;
        my ( $M, $m ) = @{$item};
        my $dbfile_connect = qq{${dbfile}_${M}_${m}};

        my $sqlite_dbh = sqlite_connect( $dbfile_connect, '' );
        create_table_dpt( \$sqlite_dbh, \$table_dpt_sqlite, \@{$data_dpt} );
        create_tmp_table( \$sqlite_dbh, \$table_a_sqlite, \$table_d_sqlite, \%{$uniq_values} );
        insert_a_tmp_table( \$sqlite_dbh, \$table_a_sqlite, \$table_d_sqlite,
                            \@{ $data_ref->{$M}->{$m} }, \%{$data_g_c}, \%{$data_c} );
        $sqlite_dbh->disconnect();

        $Qresults->enqueue( \%{$result} );
        print "$M $m finit \n";
    }

    $Qresults->enqueue( undef );    ## Signal this thread is finished
}

my $Qwork    = new Thread::Queue;
my $Qresults = new Thread::Queue;

my @pool = map { threads->create( \&worker, $Qwork, $Qresults ) } 1 .. $THREADS;

foreach my $M_m (
    sort { $uniq_values->{M_id_top}{$b} <=> $uniq_values->{M_id_top}{$a} }
    keys( %{ $uniq_values->{M_id_top} } ) )
{
    my ( $M, $m ) = split( '_', $M_m );
    $Qwork->enqueue( [ $M, $m ] );
}

$Qwork->enqueue( (undef) x $THREADS );
$_->join for @pool;

Replies are listed 'Best First'.
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by erix (Prior) on Apr 07, 2014 at 10:10 UTC

    SQLite is not strong in concurrency; for practical (writing) purposes it is best regarded as a 1-person database.

    See the SQLite FAQ and the advice on when (not) to use it.

      Hi,
      Indeed, but I don't do concurrent writes on the same database.
      When the threads start it's really fast; my problem seems to be when the pool is created with
      my @pool = map{ threads->create( \&worker, $Qwork, $Qresults ) } 1 .. $THREADS;

      The money quote here might be "SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time."

Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by RichardK (Parson) on Apr 07, 2014 at 11:21 UTC

    You didn't say which OS you are running on, and since Perl's thread implementations vary, that may be important.

    perlthrtut has this to say, which may be relevant to your problem:

    Performance considerations
    The main thing to bear in mind when comparing Perl's ithreads to other threading models is the fact that for each new thread created, a complete copy of all the variables and data of the parent thread has to be taken. Thus, thread creation can be quite expensive, both in terms of memory usage and time spent in creation.
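
    To make that concrete, here is a minimal sketch (not code from this thread; worker, @big_data and the thread count are invented): if the pool is spawned before any large data structure exists in the parent, each clone stays small and thread creation stays cheap.

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        my $THREADS = 4;

        sub worker {
            my ($q) = @_;
            while ( defined( my $item = $q->dequeue ) ) {
                # ... per-item work would go here ...
            }
        }

        # Spawn the pool FIRST, while the parent holds almost no data:
        # every threads->create clones whatever the parent owns at that moment.
        my $Qwork = Thread::Queue->new;
        my @pool  = map { threads->create( \&worker, $Qwork ) } 1 .. $THREADS;

        # Only now build (or fetch) the large data set and feed the queue;
        # the workers were cloned before it existed, so none of them carries a copy.
        my @big_data = map { "record $_" } 1 .. 100_000;
        $Qwork->enqueue($_) for @big_data;
        $Qwork->enqueue( (undef) x $THREADS );    # one end-of-work marker per worker

        $_->join for @pool;

    If, in the full program, the pool is created after all of the MySQL data has been fetched into the parent, each of the 32 threads would start by copying all of it, which would match the ~3 minutes reported for pool creation.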
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by BrowserUk (Patriarch) on Apr 07, 2014 at 11:01 UTC

    Sounds about right. Well done. You produced pretty much exactly the results I would expect from looking at the code you've posted.

    (Had that code been complete, I might have tried running it, but....)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow: BY YOUR DESIGN!)
by BrowserUk (Patriarch) on Apr 07, 2014 at 15:02 UTC

    I suspect that the slowness has little or nothing to do with your use of SQLite, and almost everything to do with poor coding choices with respect to shared data and queues.

    I further suspect that -- subject to verifying my guesswork regarding the chunks of your code that you haven't posted -- a few relatively minor changes could effect a considerable speedup in the run time of this code.

    Once again, may I suggest that you fill in the blanks so we stand some chance of actually helping you. Assuming you actually want help?


      Sure, sure,
      I am trying to cut down the part of the code that I haven't provided and to construct a sample dataset (but I am having difficulty building one from scratch).

      I'm pretty sure you're right about the "poor coding choices", because I'm a beginner and I have been scratching my head for days trying to figure out where I've made mistakes.

      May I attach some tabulated files of the data that I retrieve from MySQL? It would help me a lot.

      OS: Linux (Debian). Perl: v5.14.2. Best regards,
        May I attach some tabulated files of the data that I retrieve from MySQL? It would help me a lot.

        Sorry, but I do not understand that question -- it seems to have lost a lot in translation. Would you try re-phrasing it for us?


Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by CountZero (Bishop) on Apr 07, 2014 at 13:50 UTC
    Just one question: is 22 seconds for this operation too slow for any reason?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by sundialsvc4 (Abbot) on Apr 07, 2014 at 16:32 UTC

    It clearly appears to me that each thread, without using transactions, is opening a database file at random, doing two inserts into it, and then closing it again. This, emphatically, is not the right way to do this with SQLite. There are two key changes that you should make here:

    1. You should have one thread for each database that you wish to insert into, and you should queue the work destined for a given database to a queue that corresponds to that thread and that database.
    2. The work must be placed inside a transaction, lest SQLite physically verify every single disk write that occurs (a sketch of this follows below).

    But, first, step back and seriously reconsider your whole design. You have very seriously over-engineered the whole thing. Why not have a single process that takes a database name as a command-line parameter, greps through the input file considering only the lines of interest, and inserts those lines into the one database it is then working with, doing it all in reasonably sized transactions? Run this one process in parallel with other copies of itself, or one at a time.

    One bit of overhead that you want to eliminate, however you choose to do it, is the constant opening and closing of files. The other piece of overhead that you must eliminate, through the use of transactions, is the physical verification of every disk write that occurs, which is SQLite's by-design default behavior.
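
    To illustrate point 2, here is a minimal DBI/DBD::SQLite sketch; the file name, table and data are placeholders, not the OP's schema. Without the begin_work/commit pair, every INSERT is its own implicit transaction and its own sync.

        use strict;
        use warnings;
        use DBI;

        # Placeholder database file and table; the point is the single
        # transaction wrapped around the whole batch of inserts.
        my $dbh = DBI->connect( 'dbi:SQLite:dbname=/dev/shm/example.sqlite', '', '',
                                { RaiseError => 1, AutoCommit => 1 } );

        $dbh->do('CREATE TABLE IF NOT EXISTS t ( a INTEGER, b TEXT )');

        my @rows = map { [ $_, "value $_" ] } 1 .. 50_000;
        my $sth  = $dbh->prepare('INSERT INTO t ( a, b ) VALUES ( ?, ? )');

        $dbh->begin_work;                    # one transaction for the whole batch
        $sth->execute( @{$_} ) for @rows;    # no per-row sync while it is open
        $dbh->commit;                        # a single commit, a single sync

        $dbh->disconnect;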

      I have some processing to do on a particular (MySQL) table with several million records.

      I need to process the data in blocks defined by the composite index (v_ma_id, v_mo_id), in one shot.

      I have enough memory to hold the whole table in memory, and I don't know in advance which blocks will be present in the table (that's why I fetch all the data in one shot).

      Now, with the full code for the example, you will see that I use one thread per block, and that each thread creates its own database and inserts the data every 50000 records within a transaction.

      I've attached a sample of the data so you can see what kind of data I need to insert.

      The full sample is about 4 MB (~87700 lines / 38 blocks); it is retrieved from MySQL and inserted in 14 seconds with 1 thread. With 10 threads: 33 seconds, etc.

      Thanks for your help, because this problem is really driving me crazy :/
      Any comments/advice about the "quality" of the code would be really appreciated and would help me produce more robust code.
      Update: full sample: http://www.speedyshare.com/APpVh/sample-data-monks.txt
      Update 2: removed a part of the code that is not necessary for this test.
      With 330642 lines for 76 blocks, the time is 55 seconds with 1 thread and 2m30s with 32 threads.
Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)
by sundialsvc4 (Abbot) on Apr 08, 2014 at 02:33 UTC

    In my humble opinion, your problem is still that the various threads will be competing with one another. Very likely, you do not need threads at all ... very likely, you do not need any of this complexity ... and this just might be what these results are trying to tell you!

    I suggest that you would be far better served either by sticking with the one-process approach, which (still) seems to be giving you fairly decent performance, or by modifying the (non-threaded) program so that it scans the input file for the records that belong in a particular database (identified, say, by a command-line parameter ...), and then, if you wish, running multiple instances of this program concurrently, say by using the "&" feature of the Unix/Linux shell. A sketch of that idea follows below.

    In the most friendly way possible, I suggest that you've built up all this concurrency stuff on the expectation that it would save you time, when most clearly it is not doing so. And, if an approach is not panning out, the time comes to just let it go. Personally, in this case, I think that time has come.
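
    A rough sketch of that suggestion; the script name load_one_db.pl, the database names and the job limit are invented for illustration, not taken from the thread. The parent simply forks one plain, non-threaded loader per database and keeps a few of them running at a time.

        use strict;
        use warnings;

        my @databases = ( 'db_1_1', 'db_1_2', 'db_2_1' );    # placeholder names
        my $max_jobs  = 4;                                    # loaders running at once

        my %running;

        sub reap_one {
            my $pid = wait();                  # block until one child exits
            delete $running{$pid} if $pid > 0;
        }

        for my $db (@databases) {
            reap_one() while scalar( keys %running ) >= $max_jobs;
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ( $pid == 0 ) {
                # Child: run the single-database, non-threaded loader.
                exec( 'perl', 'load_one_db.pl', $db ) or die "exec failed: $!";
            }
            $running{$pid} = $db;              # parent: remember the child
        }

        reap_one() while keys %running;        # wait for the remaining children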
