PerlMonks

Re: Sharing XS object?

by ELISHEVA (Prior)
on Mar 09, 2011 at 21:32 UTC ( #892282 )


in reply to Sharing XS object?

I'm by no means a thread expert, but it is my understanding that sharing data by means of threads::shared effectively gives each thread its own deep copy of the array, hash, or object. The sharing works by automated copying between threads rather than by all threads using a common address space. If memory consumption is a concern, this is probably not what you want.
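The copy-versus-share distinction can be illustrated with a minimal sketch (not from the original post; assumes a threads-enabled perl). A plain variable is copied into each new thread, while writes to a :shared variable are marshalled back:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %private = (answer => 42);           # plain hash: each new thread gets its own copy
my %visible :shared = (answer => 42);   # :shared hash: updates are marshalled between threads

threads->create(sub {
    $private{answer} = 0;   # changes only this thread's private copy
    $visible{answer} = 0;   # propagated back to the parent
})->join();

print "private: $private{answer}\n";    # still 42 in the parent thread
print "visible: $visible{answer}\n";    # now 0: the child's write was propagated
```

Note that even the :shared case involves per-thread copying under the hood, which is why it does not by itself reduce memory use.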

You have a few options, though none of them simple.

C-library managed object. One option would be to have your XS module manage the memory for the object. When your Perl program queried the library for the object, it would get back a pointer to an object held in a memory space managed by your library (i.e. your .so/.dll ). As a pointer is merely a number, the only Perl data structure involved is a simple scalar storing the memory address. Even if threads::shared is copying data, it will only be copying a small scalar not a whole huge object. Note: I'm told by Fletch in the CB that mod_perl manages some of its data that way - so you could look there for examples.
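As a sketch of the idea (MyXS and all of its functions are hypothetical names invented for illustration, not a real CPAN module), the Perl side only ever holds the address:

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    # Hypothetical XS interface -- assumptions, not a real module:
    #   MyXS::acquire()        returns the C object's address as an integer
    #   MyXS::query($addr, $k) dereferences the pointer on the C side
    #   MyXS::release($addr)   tells the C library to drop its reference

    my $addr :shared = MyXS::acquire();   # a plain number: cheap for threads to copy

    threads->create(sub {
        # each thread copies only the integer, never the object behind it
        my $v = MyXS::query($addr, 'A');
    })->join();

    MyXS::release($addr);   # manual lifetime management (the downside below)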

The downside of this approach is that you will have to manually track reference counts and explicitly notify the C library when it has no more need for the object.

If that happens naturally when all the threads die, it should be easy to do. If you need something more fine-grained, like knowing when a thread is done using the object somewhere in the middle of its run life, you'll essentially be writing a homegrown memory manager. If there is a Perl module on CPAN that does what you need, it is well worth spending the time looking for it. If not, either abandon this approach or expect to spend a lot of time testing and debugging. If you aren't already familiar with the tools for diagnosing memory leaks, you'll also need to budget time to learn those so you can test properly.

Server-client threads. A second possibility is to use one thread to manage and store the data and have the other threads access it through accessor methods. It might look something like the code below. Please note, though: even though the code works, it probably needs to be cleaned up a bit. I've taken care of the worst of the deadlock situations, but I'm sure I've missed a few.

Communication between the client and server threads is handled using Thread::Queue objects. As might be expected, only plain scalars (not references) can be placed in the queue.

To make it possible to pass more complex data, the server converts any data it returns to the client into string form using YAML. The client converts it back to an actual object, reference or scalar. I've only done this with return values. However, in a real implementation, you would likely need to do this conversion for all parameters as well.
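The flattening step can be sketched on its own (a minimal example, separate from the full listing below; it assumes the non-core YAML module is installed):

```perl
use strict;
use warnings;
use YAML ();

# A reference can't travel through the queue as a plain scalar,
# but its YAML dump can.
my $answer = { name => 'foo', children => [ 1, 2, 3 ] };

my $sFlat = YAML::Dump($answer);   # server side: flatten to a string
my $oBack = YAML::Load($sFlat);    # client side: rebuild a copy

print $oBack->{name}, "\n";                    # foo
print scalar @{ $oBack->{children} }, "\n";    # 3
```

The round-trip yields a copy, not the original reference, which is exactly what keeps the threads from stepping on each other's data.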

If all this feels like a lot of work and extra processing just to save memory, the simple answer is: it is. Optimizations for the sake of memory conservation almost always increase CPU consumption and vice versa.

Without further ado, the code:

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use YAML ();

#========================================================
# Skeleton class - pretend this is your data
#========================================================
{
    package MyData;
    sub new { my $sClass = shift; bless({@_}, $sClass); }
    sub getProperty {
        my ($self, $sProperty) = @_;
        return $self->{$sProperty};
    }
    sub setProperty {
        my ($self, $sProperty, $sValue) = @_;
        return $self->{$sProperty} = $sValue;
    }
}

#========================================================
# Server definition
#========================================================
# Note: this data will be copied to each thread.
# However, this does NOT mean that the $oData object will
# be copied. As the output statements show when this
# script is run, only the server thread assigns a value
# to $oData. All other threads have the value undef.
# This is because the server thread creates $oData _after_
# it is launched and has its own separate copy of $oData.

my @aClients;
my $oRequest = Thread::Queue->new();
my $oAnswer  = Thread::Queue->new();
my $oData;

# This variable is shared so that the server can see changes
# made to this value
my $bServerAlive :shared;

#---------------------------------------------------------
sub addClient {
    my $crRun = shift;
    my $aArgs = \@_;
    my $t = async {
        $crRun->(@$aArgs);
        removeClient(threads->self);
    };
    push @aClients, $t;
}

#---------------------------------------------------------
sub removeClient {
    my $tid = $_[0]->tid();
    for my $i (0..$#aClients) {
        my $t = $aClients[$i];
        next unless $t->tid() == $tid;
        $t->detach();
        splice @aClients, $i, 1;
        return 1;
    }
    return 0;
}

#---------------------------------------------------------
sub startServer {
    $bServerAlive = 1;
    threads->create(\&serveData)->detach();
}

#---------------------------------------------------------
sub shutdownServer {
    $_->join() foreach @aClients;
    $bServerAlive = 0;
}

#---------------------------------------------------------
sub serveData {
    my $tid = 'server';

    # create the object if it isn't created already
    if (!defined($oData)) {
        print STDERR "$tid: Creating server data\n";
        $oData = MyData->new(A => 1, B => 2, C => 3);
    }

    # poll for requests until the server is shut down
    POLL_STATUS: while ($bServerAlive) {

        # Note: Perl does not have an unlock command. Instead there is
        # an implicit unlock when we leave this block due to return
        # or die.

        # grab method and parameters
        my ($sMethod, @aArgs);
        {
            lock($oRequest);
            if (!$oRequest->pending()) {
                #print STDERR "$tid: No requests... yielding\n";
                threads->yield();
                next POLL_STATUS;
            }
            print STDERR "$tid: retrieving request\n";
            $sMethod = $oRequest->dequeue();
            while ($oRequest->pending()) {
                push @aArgs, $oRequest->dequeue();
            }
        }
        print STDERR "$tid: Executing call $oData->$sMethod("
            , join(',', map { $_ ? $_ : 'undef' } @aArgs), ")\n";

        # call the method and convert the result into a string
        my $crMethod = $oData->can($sMethod);
        if (!defined($crMethod)) {
            warn "No such method: $sMethod";
            next;
        }
        my $xResult = YAML::Dump($crMethod->($oData, @aArgs));

        print STDERR "$tid: returning result\n";
        lock($oAnswer);
        $oAnswer->enqueue($xResult);
        cond_signal($oAnswer);
    }
}

#========================================================
# Client definition
#========================================================
my $bRequestInProgress :shared;

sub makeServerRequest {
    # my ($sMethod, $arg1, $arg2, ...) = @_;

    # Note: Perl does not have an unlock command. Instead there is
    # an implicit unlock when we leave this block due to return
    # or die.

    # Locking $bRequestInProgress ensures that only one request
    # may be made at a time. This thread passes control to the
    # server thread.
    my $tid = threads->tid;
    lock($bRequestInProgress);

    print STDERR "$tid: Asking a question\n";
    {
        lock($oRequest);
        $oRequest->enqueue($_) foreach @_;
    }

    print STDERR "$tid: Waiting for an answer\n";
    lock($oAnswer);
    cond_wait($oAnswer);

    # when we get an answer, convert it from string form back
    # into a Perl data structure
    return YAML::Load($oAnswer->dequeue());
}

sub demoClient {
    my ($sName) = @_;
    my $tid = threads->tid();
    print STDERR "$tid: Hello... I'm $sName\n";
    printf STDERR "$tid: I'm a %s(data=%s)\n"
        , (defined($oData) ? ('server', $oData) : ('client', 'undef'));

    makeServerRequest('setProperty', 'name', $sName);
    print STDERR "\n";
    foreach my $sPropName (qw(name A B C)) {
        my $v = makeServerRequest('getProperty', $sPropName);
        printf STDERR "$tid: $sPropName=$v\n\n";
    }
}

#---------------------------------------------------------
# Demo
#---------------------------------------------------------
startServer();
for my $sName (qw(Mama Papa Baby)) {
    addClient(\&demoClient, "$sName Bear");
}
shutdownServer();


Re^2: Sharing XS object?
by BrowserUk (Pope) on Mar 09, 2011 at 22:31 UTC
    If all this feels like a lot of work and extra processing just to save memory, the simple answer is: it is. Optimizations for the sake of memory conservation almost always increase CPU consumption and vice versa.

    That's all very well, but who is going to be happy with the trade-off of your code running 2 orders of magnitude slower than the code you are replacing?

    • A 1 second web request takes 100 seconds.
    • A 10 second utility run takes 15 minutes.
    • A 4-hour, cpu-bound, data-intensive batch process takes over two weeks.

    Especially as for an application using 10 threads and sharing 1000 data items, your code requires 64MB compared with threads::shared's 21MB.

    Optimisation?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I'm not at all clear about the source of your numbers. Did you benchmark the code? Do a big-O analysis? I find it hard to believe that my code, even unoptimized, has a 4-fold increase in total memory consumption over shared variables when only one thread has a copy of $oData and all the other threads request individual bits of data on an as-needed basis.

      The OP did not specify the usage pattern of data in his application, or at least I did not read the post that way. There is nothing there saying that he has a very large number of individual items that have to be simultaneously shared between M threads.

      Based on his concern about a tree with an unspecified large number of nodes and a sample client appearing to do a search for a particular node named "foo", I made the assumption that he had in mind a quite different scenario. He has a very large data structure, perhaps 1G of data (before Perl overhead). He has threads that need to select bits and pieces from that data structure, e.g. query for a particular node in his tree. At any one time, in any one scope, each thread maybe needs no more than a handful of items out of that huge data structure, let's say 10. Assuming that those 10 items consume 100 bytes each, we are talking about no more than a KB of data required by each client thread. Even without optimizations, I can't possibly see how deep copying 1G of data to each thread (10G total) would be better than 1G held by a server thread and 1K held by 10 client threads (1G+10K total). Even if you argued that all that marshalling meant 4x the amount of memory per data item, you still would only have 1G+40K total. That isn't anywhere near 10G, let alone 40G. What am I missing?

      Usually, when you have actually done benchmarking, you post your results in some detail. Here you did not. Or did you mean me to read your numbers in a rhetorical light: if the code were 100-fold slower, if the code had 4x the memory... It is unclear to me.

      If your actual point was "Don't be so cavalier about memory-processing time trade-offs because some just aren't worth it.", I agree entirely. It is totally silly to take two weeks to do something when memory constraints could be solved by buying a few more GB of RAM at $10-120 a GB depending on quality. However, in many applications, even a 100-fold increase in per-op time is of minimal concern if that op is only a small part of the larger code. Neither of us knows what percentage of time the OP is spending querying his tree object relative to other processing he does with whatever data he retrieves.

      My point about the marshalling was not to say that you should tolerate it because that is the price you pay. Rather I meant just the opposite: be really sure memory is a real problem because the software solutions to memory constraints are going to cost you.

      Update: fixed some typos in numbers.

      Update: removed first paragraph - reread my post and BrowserUk's and realized he wasn't complaining that my code failed to optimize itself, but rather that the whole idea of marshalling didn't even trade increased CPU for reduced memory consumption.

        I'm not at all clear about the source of your numbers.

        I benchmarked.

        Your code:

        Result: Memory usage: 64.3MB; CPU: 4:54 minutes.

        threads::shared:

        Results: Memory usage: 21.1MB; CPU usage: 7.1 seconds.

        I find it hard to believe that my code even unoptimized has a 4-fold increase in total memory consumption over shared variables when only one thread has a copy of $oData and all the other threads request individual bits of data on an as-needed basis.

        Don't believe; measure.

        ... I can't possibly see how deep copying 1G of data to each thread (10G total) would be better than 1G held by a server thread and 1K held by 10 client threads (1G+10K total). ...

        You are just guessing. I benchmarked.


