Sharing XS object?

by menth0l (Monk)
on Mar 09, 2011 at 13:15 UTC (#892186)
menth0l has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I wrote a tree implementation in XS which I fill with a lot of (key, value) pair data. Now I'd like to share a single instance of it across many threads (due to memory consumption and tree-filling time) and use it like this:
    use Tree::BK_XS;
    use threads;
    use threads::shared;

    my $o : shared = new Tree::BK_XS;

    # filling tree
    $o->insert('foo', 1);

    # now new thread sees filled tree
    threads->create(sub { print $o->search('foo') });
How can I do that?
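The error the replies below run into can be seen with plain Perl objects alone. A minimal sketch (the class name `Some::Class` is made up for illustration): a shared scalar may only hold a plain value or a reference to another shared variable, so assigning it a reference into one thread's private memory, which is all a blessed XS handle is, fails at runtime.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $plain : shared = 42;                # fine: plain value
my %h : shared;                         # fine: shared hash
$h{list} = shared_clone([1, 2, 3]);     # fine: the clone is rebuilt as shared data

# A reference to non-shared memory cannot go into a shared scalar.
my $obj = bless {}, 'Some::Class';      # stand-in for any object
eval { my $s : shared = $obj; };
print "error: $@" if $@;                # "Invalid value for shared scalar ..."
```

`shared_clone` works for hash- or array-based objects because it can walk and rebuild them; it has no way to rebuild an opaque C structure behind an XS pointer, which is why it crashes in the post above.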

Re: Sharing XS object?
by Anonymous Monk on Mar 09, 2011 at 13:24 UTC
      I don't know if I understand this correctly:
      # this produces error: "invalid value for shared scalar"
      my $o :shared = new Tree::BK_XS;

      # and this causes perl to crash
      my $o :shared = shared_clone(new Tree::BK_XS);
      I always thought that the link you've posted refers to hash-based, pure-Perl objects?
        # this produces error: "invalid value for shared scalar"
        my $o :shared = new Tree::BK_XS;

        Which versions of perl/threads/threads::shared are you using?

        Show us the new() method of your module.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Sharing XS object?
by vkon (Deacon) on Mar 09, 2011 at 15:14 UTC
    Now I'd like to share a single instance of it across many threads (due to memory consumption and tree filling time)

    You will not gain memory benefits by using threads: they are inefficient in Perl, and, according to the 'threads' documentation, any ':shared' scalar is not only duplicated in each thread, but is also 1) tied and 2) carries additional book-keeping cost.

    ++ for trying the hard area though.
    :)

Re: Sharing XS object?
by ELISHEVA (Prior) on Mar 09, 2011 at 21:32 UTC

    I'm by no means a thread expert, but it is my understanding that any data shared by means of threads::shared simply gives each thread a deep copy of the array, hash, or object. The sharing is by way of automated copying between threads rather than all threads sharing some common address space. If memory consumption is a concern, this is probably not what you want.

    You have a few options, though none of them simple.

    C-library managed object. One option would be to have your XS module manage the memory for the object. When your Perl program queried the library for the object, it would get back a pointer to an object held in a memory space managed by your library (i.e. your .so/.dll). As a pointer is merely a number, the only Perl data structure involved is a simple scalar storing the memory address. Even if threads::shared is copying data, it will only be copying a small scalar, not a whole huge object. Note: I'm told by Fletch in the CB that mod_perl manages some of its data that way, so you could look there for examples.

    The downside of this approach is that you will have to manually track reference counts and explicitly notify the C library when it has no more need for the object.

    If that happens naturally when all the threads die, it should be easy to do. If you need something more fine-grained, like knowing when a thread is done using the object somewhere in the middle of its run life, you'll essentially be writing a homegrown memory manager. If there is a Perl module on CPAN that does what you need, it is well worth spending the time looking for it. If not, either abandon this approach or expect to spend a lot of time testing and debugging. If you aren't already familiar with the tools for diagnosing memory leaks, you'll also need to budget time to learn them so you can test properly.
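    The pointer-passing idea above can be seen in miniature without writing any XS. In this sketch a plain integer stands in for the address the C library would return (a real module would produce it with `PTR2IV` and turn it back into a pointer with `INT2PTR` on the XS side); `lookup_via_handle` is a hypothetical accessor. Only the small scalar handle crosses the thread boundary; the heavy object would stay wherever the library allocated it.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Stand-in for the address the XS library would hand back.
my $handle : shared = 0xDEADBEEF;

# Hypothetical accessor: a real XS module would convert the IV back
# into a pointer and operate on the C structure it addresses.
sub lookup_via_handle {
    my ($addr, $key) = @_;
    return sprintf "thread %d searching tree at 0x%X for '%s'",
        threads->tid, $addr, $key;
}

# Each thread receives only the integer handle, not a copy of the tree.
my @out = map { $_->join }
          map { threads->create( \&lookup_via_handle, $handle, 'foo' ) }
          1 .. 2;
print "$_\n" for @out;
```

    The manual reference counting described above is the price of this trick: Perl's garbage collector sees only an integer and will never free the C object for you.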

    Server-client threads. A second possibility is to use one thread to manage and store the data and have other threads access it through accessor methods. It might look something like the code below. Please note, though: even though the code works, it probably needs to be cleaned up a bit. I've taken care of the worst of the deadlock situations, but I'm sure I've missed a few.

    Communication between the client and server threads is handled using Thread::Queue objects. As might be expected, only plain scalars (not references) can be placed in the queue.

    To make it possible to pass more complex data, the server converts any data it returns to the client into string form using YAML. The client converts it back to an actual object, reference or scalar. I've only done this with return values. However, in a real implementation, you would likely need to do this conversion for all parameters as well.

    If all this feels like a lot of work and extra processing just to save memory, the simple answer is: it is. Optimizations for the sake of memory conservation almost always increase CPU consumption and vice versa.

    Without further ado, the code:
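    The original listing was not captured in this archive. As a rough sketch of the pattern described above (one server thread owning the data, clients sending requests and receiving serialized answers over Thread::Queue), assuming a tiny in-memory hash in place of the XS tree, and using the core JSON::PP module in place of YAML so the sketch runs without non-core dependencies:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use JSON::PP qw(encode_json decode_json);  # stands in for YAML serialization

my $requests = Thread::Queue->new;   # client -> server
my $replies  = Thread::Queue->new;   # server -> client

# Server thread: sole owner of the big structure; answers queries.
my $server = threads->create(sub {
    my %tree = (foo => 1, bar => [2, 3]);   # stand-in for the XS tree
    while (defined(my $req = $requests->dequeue)) {
        my $key = decode_json($req)->{key};
        # Only plain scalars cross the queue, so serialize the answer.
        $replies->enqueue(encode_json({ value => $tree{$key} }));
    }
});

# Client side: ask for 'bar' and rebuild the returned structure.
$requests->enqueue(encode_json({ key => 'bar' }));
my $answer = decode_json($replies->dequeue);
print "bar => [@{ $answer->{value} }]\n";

$requests->enqueue(undef);   # conventional "shut down" signal
$server->join;
```

    A real implementation would serialize the request parameters the same way, pair each reply with a request id once several clients share the queues, and guard the shutdown signal more carefully.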

      If all this feels like a lot of work and extra processing just to save memory, the simple answer is: it is. Optimizations for the sake of memory conservation almost always increase CPU consumption and vice versa.

      That's all very well, but who is going to be happy with the trade-off of your code running 2 orders of magnitude slower than the code you are replacing?

      • A 1 second web request takes 100 seconds.
      • A 10 second utility run takes 15 minutes.
      • A 4-hour, cpu-bound, data-intensive batch process takes over two weeks.

      Especially as for an application using 10 threads and sharing 1000 data items, your code requires 64MB compared with threads::shared's 21MB.

      Optimisation?



        I'm not at all clear about the source of your numbers. Did you benchmark the code? Do a big-O analysis? I find it hard to believe that my code, even unoptimized, has a 4-fold increase in total memory consumption over shared variables when only one thread has a copy of $oData and all the other threads request individual bits of data on an as-needed basis.

        The OP did not specify the usage pattern of data in his application, or at least I did not read the post that way. There is nothing there saying that he has a very large number of individual items that have to be simultaneously shared between M threads.

        Based on his concern about a tree with an unspecified large number of nodes and a sample client appearing to do a search for a particular node named "foo", I made the assumption that he had in mind a quite different scenario. He has a very large data structure, perhaps 1G of data (before Perl overhead). He has threads that need to select bits and pieces from that data structure, e.g. query for a particular node in his tree. At any one time, in any one scope, each thread maybe needs no more than a handful of items out of that huge data structure, let's say 10. Assuming that those 10 items consume 100 bytes each, we are talking about no more than a KB of data required by each client thread. Even without optimizations, I can't possibly see how deep copying 1G of data to each thread (10G total) would be better than 1G held by a server thread and 1K held by each of 10 client threads (1G+10K total). Even if you argued that all that marshalling meant 4x the amount of memory per data item, you still would only have 1G+40K total. That isn't anywhere near 10G, let alone 40G. What am I missing?

        Usually, when you have actually done benchmarking, you post your results in some detail. Here you did not. Or did you mean me to read your numbers in a rhetorical light: if code is 100-fold slower, if code has 4x the memory...? It is unclear to me.

        If your actual point was "Don't be so cavalier about memory-processing time trade-offs because some just aren't worth it.", I agree entirely. It is totally silly to take two weeks to do something, when memory constraints could be solved by buying a few more GB of RAM at $10-120 a GB depending on quality. However, in many applications, even a 100-fold increase in per-op time is of minimal concern if that op is only a small part of the larger code. Neither of us knows what percentage of time the OP is spending querying his tree object relative to other processing he does with whatever data he retrieves.

        My point about the marshalling was not to say that you should tolerate it because that is the price you pay. Rather I meant just the opposite: be really sure memory is a real problem because the software solutions to memory constraints are going to cost you.

        Update: fixed some typos in numbers.

        Update: removed first paragraph. Reread my post and BrowserUK's and realized he wasn't complaining that my code failed to optimize itself, but rather that the whole idea of marshalling was not a trade-off of memory consumption at the price of CPU.
