Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

by sundialsvc4 (Abbot)
on Apr 08, 2014 at 22:59 UTC ( #1081581=note: print w/replies, xml ) Need Help??

in reply to Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

I would like to postscript this thread with my opinion as to exactly why you are experiencing such unexpectedly poor performance as you increase the number of threads:   because you are sending requests to different threads concerning the same database file.

Generally speaking, SQLite really isn’t designed for multiple write access to the same database file.   While it can be done, subject to the numerous caveats that are set forth in the documentation on the SQLite web-site, it is not (IMHO ...) “the designer’s intent.”   So, you can very easily wind up with a situation where, perhaps, one process is blocked waiting for its turn to access the file.   This is certainly not what you want, nor intended.

What you need to do, then, is to have one writer-process per database.   Each process is therefore able to proceed at full-speed because it will never be contending for a shared resource (the database ...) with any of its colleagues.   Then, I think, you want to find the simplest way to accomplish this.   Which, in my opinion, is to dispense with all of the threading and queueing and such ... and simply run multiple instances of a simple Perl script in parallel with itself from the shell.   Let this script take two command-line parameters:   the name of a file to be processed, and the name of the database or the key within the input-file that this script-instance is supposed to look for.   Each instance opens “its” database, greps through the file looking for “its” keys-of-interest, and does its job.

You could carry this idea one step further on a Unix/Linux system by using xargs with the -p numprocs parameter.   Build a file with the names of all the databases in it.   Then, use xargs to run through that file, feeding those names as command-line parameters to numprocs children.   In this way, you are now able to accomplish your goal of processing multiple streams in-parallel, but without writing any code to do that, over and above the one script, each instance of which never concerns itself with anyone else.

I have had great “traction” from this approach in an awful lot of applications.

  • Comment on Re: Sqlite: Threads and inserts into a different database for each one. (1 thread fast, >1 slow)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1081581]
[thezip]: Our cat is single-handedly (pawed-ly?) solving our field mouse problem. About four critters in the last week alone.
[thezip]: Amazing speed and stealth in that one.
[GotToBTru]: You can't tunafish, but some seafood companies can tunafish. More fun with English.
[virtualsue]: a veterinarian told me to cut down on my cat's food. i said, when i do that she just catches more mice
[thezip]: My dogs are also excellent mousers, at least that's what they tell me.
[GotToBTru]: our beagle has dispatched 3 rabbits to the Great Hutch in the Sky since March.
[thezip]: I can always tell when the dogs have been mousing, as there is a concentric ring of mud around the end of their snouts when they do.
[thezip]: As for more physical evidence, that always seems to be missing somehow
[Rabbi Bob]: We have a Jack Russell: off the leash she is a torpedo against mice, rats, chipmunks and squirrels

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (15)
As of 2017-11-17 19:55 GMT
Find Nodes?
    Voting Booth?
    In order to be able to say "I know Perl", you must have:

    Results (272 votes). Check out past polls.