Beefy Boxes and Bandwidth Generously Provided by pair Networks DiBona
"be consistent"
 
PerlMonks  

Threads and DBD::SQLite?

by BerntB (Deacon)
on Dec 15, 2013 at 03:45 UTC ( #1067204=perlquestion: print w/ replies, xml ) Need Help??
BerntB has asked for the wisdom of the Perl Monks concerning the following question:

Edit: Could just using (very large) MEMORY tables in MySQL solve this trivially? :-)

There is an unpleasant old hack at work, which I'd like to replace with a nice Perl server.

The basic need is very quick read access to a bunch of GB of data over sockets (i.e. no disk). The data is too big to duplicate in multiple instances. I am thinking of in-memory SQLite tables (read access only), initialized at startup. SQLite supports opening and sharing the same in-memory table ("shared cache"), but only for the same process.

The problem is the "in the same process", since that implies threads! That is scary, as is using DBD features not commented on in the DBD::SQLite documentation.

That is the background. There are three problems which would take me time to find out and I hope people here can answer directly:

  1. How can I turn on the SQLite "Shared-Cache Mode" from DBI or DBD? (The SQLite in the DBD::SQLite on CPAN is new enough to support this.)
  2. Does it work as expected, if I fork off a thread, which opens SQLite shared memory? (Have anyone done this craziness with Perl before?)
  3. What is the recommended tcp/ip server module on CPAN for threaded programs? Net::Server doesn't have a threading personality afaik?

Notes added afterwards: Yes, I do need multi process/thread support, a previous query might take too much time to wait for (and there will be processor cores around). The process startup will be slow, since there is a lot of data. The data is already in a serious DB, queries there are too slow. The present query format would be easy to translate to (SQLite's) SQL. To run the queries on tie:d data stored in shared memory would need unpacking of all the data to check success; just too slow.

I do this as a hobby instead of work partly so I don't have to try to motivate it, partly because I don't know if this will work and partly so I can put it on CPAN (or just as a code example on Perlmonks, if it is short enough).

Comment on Threads and DBD::SQLite?
Re: Threads and DBD::SQLite?
by BrowserUk (Pope) on Dec 15, 2013 at 09:32 UTC

    1. How can I turn on the SQLite "Shared-Cache Mode" from DBI or DBD?

      Use a connect string of the form 'dbi:SQLite:dbname=file:memdb2?mode=memory&cache=shared'.

    2. Does it work as expected, if I fork off a thread, which opens SQLite shared memory? (Have anyone done this craziness with Perl before?)

      What are you expecting?

      (Kinda, but there are some issues. At least on windows.)

      For a start, retrieving a given number of records from a single thread&connection is nearly 2 orders of magnitude quicker than retrieving those same records via a connection made via a single connection in a separate thread.

      Running multiple clients in 2 separate threads cuts the time almost in half; and with 4 almost by 4; but it is still much faster to use a single thread/connection.

      This is true despite having deployed every speed-up pragma and option I can find.

    3. What is the recommended tcp/ip server module on CPAN for threaded programs?

      Personally, I don't use a module for tcp servers.

      Writing a threaded server that does what I need is pretty simple. Making someone else's implementation (if you can find one) do what I need it to do is always harder.

    A few questions: What OS are you targeting? What does your data (schema) look like? What do your queries look like?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Ah, thank you! The exact string "cache=shared" gave some Google hits. This was fixed for DBD::SQLite this summer, it seems.

      For a start, retrieving a given number of records from a single thread&connection is nearly 2 orders of magnitude quicker than retrieving those same records via a connection made via a single connection in a separate thread.

      Two orders of magnitude slower using threads?! Sigh. Does it help to set "read uncommitted" on? (You do that in the connect string too?)

      A few questions: What OS are you targeting? What does your data (schema) look like? What do your queries look like?

      Honking big, multiprocessor Linux servers with Oracle and more memory than you can shake a stick at. :-) I'd like to replace an old C hack using a few GBytes of RAM for querying some subsets of a few tables really quick (quite a lot of work is required for some queries). The queries are data structures easily converted to SQL.

        Does it help to set "read uncommitted" on? (You do that in the connect string too?)

        Not in my experiments on windows. (No, it is a pragma. See below.)

        I do not have any great faith that my tests on windows will be reflected on *nix; so here is my test harness:

        I'd love to see the results from a few runs done' on other OSs.

        You can get threads::Q (my own, efficient, self-limiting queue implementation) from here.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Threads and DBD::SQLite?
by erix (Priest) on Dec 15, 2013 at 10:59 UTC
    The data is already in a serious DB, queries there are too slow

    If a $SeriousDB -> SQLite migration succeeds could you let us know how you did it?

    (BTW what system is $SeriousDB? It rather sounds like a config problem)

      The data is already exported to an external program, which I want to replace. The exported stuff is done by a specific coding tool, selecting a subset of columns. (Nothing I can share and nothing anyone else would have any use for. :-) )

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1067204]
Approved by davido
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (5)
As of 2014-04-21 05:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (491 votes), past polls