Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

by sundialsvc4 (Monsignor)
on Apr 08, 2014 at 22:59 UTC ( #1081581=note: print w/ replies, xml ) Need Help??


in reply to Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

I would like to postscript this thread with my opinion as to exactly why you are experiencing such unexpectedly poor performance as you increase the number of threads:   because you are sending requests to different threads concerning the same database file.

Generally speaking, SQLite really isn’t designed for multiple write access to the same database file.   While it can be done, subject to the numerous caveats that are set forth in the documentation on the SQLite web-site, it is not (IMHO ...) “the designer’s intent.”   So, you can very easily wind up with a situation where, perhaps, one process is blocked waiting for its turn to access the file.   This is certainly not what you want, nor intended.

What you need to do, then, is to have one writer-process per database.   Each process is therefore able to proceed at full-speed because it will never be contending for a shared resource (the database ...) with any of its colleagues.   Then, I think, you want to find the simplest way to accomplish this.   Which, in my opinion, is to dispense with all of the threading and queueing and such ... and simply run multiple instances of a simple Perl script in parallel with itself from the shell.   Let this script take two command-line parameters:   the name of a file to be processed, and the name of the database or the key within the input-file that this script-instance is supposed to look for.   Each instance opens “its” database, greps through the file looking for “its” keys-of-interest, and does its job.

You could carry this idea one step further on a Unix/Linux system by using xargs with the -p numprocs parameter.   Build a file with the names of all the databases in it.   Then, use xargs to run through that file, feeding those names as command-line parameters to numprocs children.   In this way, you are now able to accomplish your goal of processing multiple streams in-parallel, but without writing any code to do that, over and above the one script, each instance of which never concerns itself with anyone else.

I have had great “traction” from this approach in an awful lot of applications.


Comment on Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1081581]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (4)
As of 2014-08-02 03:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Who would be the most fun to work for?















    Results (54 votes), past polls