Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re^6: threads::shared seems to kill performance (Workaround).

by Jacobs (Novice)
on Jul 24, 2013 at 18:55 UTC ( #1046185=note: print w/replies, xml ) Need Help??


in reply to Re^5: threads::shared seems to kill performance (Workaround).
in thread threads::shared seems to kill performance

So I finally got some time to play with this and I still can't wrap my head around it.

If I understand it correctly, the trick is in the fact that you:

  1. load the whole DB as a memdb with shared cache
  2. then use the
    shared_clone( $dbh )
    to get this memory where the DB is loaded shared with all the threads
  3. and then inside the thread create a new DB handler, but thanks to the fact you're using the same memdb with shared cache, it reuses the same memory as the DB handler in the main thread

Is that even remotely correct? :-) And if so:

  1. Why do the shared_clone( $dbh )? Isn't the shared memdb enough for the threads to have access to the same memory? I've been trying to somehow see the difference between the two scenarios (with and without passing the shared_clone( $dbh )) and I'm probably not measuring this right, but the memory usage (vm_stat) seems to be similar in both cases.
  2. What's the purpose of the
    $Q->nq( ('Wakey wakey') x $T ); ... $Q->dq; ## Wait for DB
    ? Why pass the string to the thread and then just dequeue it? And what's the relation to waiting for DB?

Replies are listed 'Best First'.
Re^7: threads::shared seems to kill performance (Workaround).
by BrowserUk (Pope) on Jul 24, 2013 at 19:46 UTC
    If I understand it correctly ... then use the shared_clone( $dbh ) ... Is that even remotely correct?

    No. Look again and you'll see that I don't use the share_clone( $dbh ) at all within the threads. I never actually access the second parameter I pass:

    my @threads = map threads->create( \&thread, $Q, shared_clone( $dbh ) +), 1 .. $T; #............................................^1, ^^^^^^^^^ 2 ^^^^^^^^

    within the thread procedures;

    sub thread { my $tid = threads->tid; my( $Q ) = @_;

    I'm afraid that this is a 'left-over' -- ie. I should have deleted it before posting; sorry -- from earlier experiments that lead to my discovering the need for the workaround.

    The reason we need a workaround is because sqlite :memory: dbs are only visible to the connection that creates them.

    But if you share a $dbi, the contents of the dbi (the C/XS bits that the sqlite libraries use) gets mangled internally by DBI such that those libraries see the shared connection from different threads as being different connections; thus the threads see different :memory: dbs.

    The workaround is to use the uri-form of the connect string + the modifiers mode=memory&cache=shared to cause sqlite to connect all the (new; unshared) connects made in each of the threads to the same db.

    Why do the shared_clone( $dbh )? Isn't the shared memdb enough for the threads to have access to the same memory?

    Not applicable :)

    What's the purpose of the $Q->nq( ('Wakey wakey') x $T ); ... $Q->dq; ## Wait for DB ?

    Why pass the string to the thread and then just dequeue it? And what's the relation to waiting for DB?

    Again, this is a left-over from earlier experiments. Its purpose was to block the threads whilst I created the db in the main code as originally I was creating the threads before I created the db. The idea was to prevent the memory used by the DB being cloned when the threads are created.

    However, as I'm no longer using a shared connection handle, that precaution is no longer relevant. So, the nqing of the strings and the associated dqing of them can be safely removed.

    Here's the latest for of the code with a few tweaks and some timing code added:

    Finally, I've still to resolve the undeleted file problem. The fact that a file is being created at all is somewhat disconcerting as it mean that I'm not getting the (purely) memory db I was hoping for; and could mean that performance isn't what it should be as a result. I did try contacting the mailing list about this, but got no useful response. I haven't had time/motivation to pursue it further.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Ah, perfect, now I get it! Thank you very much :-)

      That file thing is a bit strange indeed, I hope there's some solution to that.

        That file thing is a bit strange indeed, I hope there's some solution to that.

        The problem is finding someone with some expertise of the Windows port of sqlite.

        BTW: As your data is already in an sqllite DB, rather than copying it all from the into this mode=memory DB(*), why not just open connections to the existing db file from each of the threads using the uri-form connect string + the cache=shared modifier and see how the performance pans out.

        (*especially as it seems these involve IO anyway.)


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1046185]
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (3)
As of 2018-04-21 21:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?