Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine

Re: multi threading DBI

by sundialsvc4 (Abbot)
on Oct 29, 2013 at 20:35 UTC ( #1060220=note: print w/replies, xml ) Need Help??

in reply to multi threading DBI

Ladies and gentlemen, please forgive my “brashness” in advancing what may at first well appear to be “an irrelevant, sideways, Sundialsvc4-typically-off-the-topic guess ... and ... well ... hear me out on this one.

SQLite is not slow. ... ... except ... ... except ... ... in one very counter-intuitive case that just happens to be tremendously well-known to me, because for a time it absolutely show-stopped an equally important project of mine that also used this database.   I was shocked at how hideous SQLite’s performance was!   Then, one day, as I was amusing myself with the “activity monitor” while waiting for my script to grind on ... I noticed something.   My script was doing a truly-extraordinary number of disk operations.   The entirety of what was bringing this script to its most-improbable knees was ... disk I/O.

It was only-then that I looked for, and found (somewhere ...) a discussion of what SQLite actually does and why it does it.   This now-forgotten page discussed how SQLite re-reads what is in memory and how it re-reads after it writes, and how “transactions” are a Magickal Cure for all of that.   I had never encountered a file-based application before that did that.   I tried it for myself and ... well ... “holy sh*t!”   All of the sudden, this thing took-off like the proverbial bat.

Hence ... my guess.   My guess that “too slow” is in fact due to a cause that I had not suspected, either, when I encountered it and was show-stopped by it for the first time.   My guess that “threading, or the lack/presence thereof,” most likely has nothing to do with it.   I therefore injected this admittedly out-of-band comment, to suggest a completely-different stone that should be turned-over first.

Before getting started with multi-threading this thing ... try this, first.   Wrap every activity that writes or that reads to this database in a transaction.   (If you are issuing a large number of statements, periodically commit.)   But, do not do anything meaningful outside of one.   Reserve judgment about “what to do with this program” until you have tried this first.   (And if you already have done this, just send me to bed once again without any supper.)

Replies are listed 'Best First'.
Re^2: multi threading DBI
by marinersk (Priest) on Oct 30, 2013 at 00:19 UTC
    Frankly, absent profiling tools, I think testing both solutions has merit.

    With profiling tools, the answer will likely become obviously with little to no core code change.

    And were I a betting man, I'd say both solutions are likely to produce improvements, with a smaller bet that multithreading would likely yield a greater gain in the scrapes/minute category. Improving the back end (DB updates/minute) can only be a good thing, so I am inclined to favor doing both in any regard.

    But the multi-threading scraper into a queue with a single-threaded DBI operation sounds like the more fun engineering exercise (and I admit this should be an irrelevant attribute in the decision making process but I had to throw it out there).

    Encapsulating DB updates in transactions is almost no development effort at all (and highlights that it should probably be done anyway).

    How's that for sitting on the fence?  :-)

Re^2: multi threading DBI
by Anonymous Monk on Oct 30, 2013 at 12:45 UTC
    Wrap every activity that writes or that reads to this database in a transaction.

    Reading from the database does not require disk-sync calls and wrapping them into a transaction won't help. Besides, transactions themselves don't help one bit unless you wrap multiple writes into a single transaction. (I guess you understand that but failed to say so.)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1060220]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (4)
As of 2018-05-27 18:40 GMT
Find Nodes?
    Voting Booth?