Threads and fork and CLONE, oh my!

So lately, I've had to learn a bit about threads in Perl. (Re: Outside-in objects...) In particular, I've learned that the "inside-out" object technique (cf. Yet Another Perl Object Model, Class::InsideOut, etc.) -- which typically uses stringified $self or else refaddr($self) as the key to storing object properties in a package-scoped lexical hash -- can be fatally flawed when used with threads. Because Perl ithreads clone all data into the new thread, the memory address of the blessed reference changes, dissociating it from the values stored in the property hash, which is still keyed by the old address. Ugh.
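
A minimal sketch of the failure just described, assuming a perl built with ithreads (the script skips the thread otherwise). The class name Demo and the variable names are placeholders for illustration, not part of any module mentioned here. The thread gets a clone of both the object and the property hash, but the hash is still keyed by the parent's address:

```perl
use strict;
use warnings;
use Config;
use Scalar::Util qw( refaddr );

my %NAME;                        # inside-out storage, keyed by address
my $obj = bless {}, 'Demo';      # 'Demo' is a placeholder class name
$NAME{ refaddr $obj } = 'Charlie';

my $lookup = 'not tried (perl built without ithreads)';

if ( $Config{useithreads} ) {
    require threads;

    # the thread body runs against clones of $obj and %NAME
    my $thr = threads->create(
        sub { return exists $NAME{ refaddr $obj } ? 'found' : 'lost' }
    );
    $lookup = $thr->join;
}

print "lookup in new thread: $lookup\n";
```

On a threaded perl the lookup should come back as lost: the cloned object lives at a new address while the original still occupies the old one.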

After fooling around with ideas for using a UUID for each object that could be tracked across a thread boundary, I stumbled into rereading perlmod and its description of the CLONE method, which is called once per package right after a new thread is created (and from the context of the new thread). Using this method and a global registry of objects, I was able to migrate object data to be keyed off the new memory location in the new thread. While this doesn't allow sharing objects across threads, it at least preserves existing objects into newly created threads.

Once I got that working, I began to wonder about fork-safety. In my next bit of exploration, I discovered that forking is platform-specific. (Hey, it was news to me, at least.) On a Unix-derived OS, fork is done using the system fork call, which creates a new process with memory allocated as "copy-on-write" (at least it works this way on Linux, with which I'm most familiar). (While I'm not deep on the internals of it, from what I understand, that means that parent and child use the same physical memory for a variable until one of them changes its value -- experts please correct or expound if I'm off target.) That seems to work just fine for inside-out objects, as the child sees the same address for the reference (and changing the reference is tantamount to changing the object, anyway).
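
A small sketch to check this on a given platform, assuming fork and pipe are available. The class name Demo is a placeholder. The child reports the address it sees for the object and the parent compares it with its own:

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

my $obj         = bless {}, 'Demo';    # placeholder class name
my $parent_addr = refaddr $obj;

pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # child: send back the address of our copy of the object
    close $reader;
    print {$writer} refaddr($obj), "\n";
    close $writer;
    exit 0;
}

# parent: read the child's answer and compare
close $writer;
chomp( my $child_addr = <$reader> );
close $reader;
waitpid $pid, 0;

print $child_addr == $parent_addr
    ? "child sees the same address\n"
    : "child sees a different address\n";
```

With a native fork the two addresses should match, since the child starts with an identical virtual address space. Under an emulated fork they should differ.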

On Win32, however, forking is faked using threads! (cf. perlfork) So fork-safety on Win32 means getting thread-safety as well, which means that thread-safety for inside-out objects winds up being rather important, as unsuspecting users might wind up forking their way into threads without even realizing it and finding all their objects have lost their data. (Unfortunately, this detail is completely glossed over in Conway's recent Perl Best Practices, as he only mentions the need for declaring the lexical hashes as shared and ensuring locking occurs on access for thread-safety for inside-out objects.)
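
One way to tell which kind of fork a given perl has is to ask Config. This is only a sketch, and it assumes perl 5.10 or later, where the d_pseudofork entry was added; on older perls that key is simply absent and the emulated case will not be detected.

```perl
use strict;
use warnings;
use Config;

# d_pseudofork is set when fork() is emulated inside the interpreter;
# d_fork is set when the operating system provides a real fork()
my $fork_kind = $Config{d_pseudofork} ? 'emulated'
              : $Config{d_fork}       ? 'native'
              :                         'unavailable';

print "fork on $^O: $fork_kind\n";
```

A class that keys its data on refaddr could use a check like this to decide whether it needs a CLONE method even in programs that never load threads themselves.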

I've included below some code samples that show how to use a global registry of objects with CLONE, along with some test files that demonstrate how it works -- albeit only in a very simple case. I've tested it on WinXP (ActiveState) and Linux and it worked as expected. (Code is a bit pedantic for clarity.)

SafeObject.pm:

# A thread-safe inside-out object class
package SafeObject;
use strict;
use warnings;
use Scalar::Util qw( refaddr weaken );

our $VERSION = 0.001;

# Global object registry (weak references, keyed by refaddr)

my %REGISTRY;

# Object property storage and accessor

my %NAME;

sub name {
    my ($self, $value) = @_;
    
    # store a value if one is provided
    my $id = refaddr $self;
    if ( defined $value ) { $NAME{ $id } = $value; }
    
    return $NAME{ $id };
}

# Constructor and destructor 

sub new { 
    my $class = shift;
    my $self = bless {}, $class;

    # store a weak reference in the registry
    my $id = refaddr $self;
    weaken ( $REGISTRY{ $id } = $self );
    
    return $self;
}

sub DESTROY {
    my $self = shift;
    my $id = refaddr $self;
    
    # clean up memory used for the object
    delete $NAME{ $id };
    delete $REGISTRY{ $id };

    return;
}

# Cloning routine called for new threads

sub CLONE {
    # So we can see this called in a Windows fork()
    warn "# Notice: Cloning data in new thread\n";
    
    # fix-up all object ids in the new thread
    # (note: %REGISTRY changes during the loop, so don't use "each")
    for my $old_id ( keys %REGISTRY ) {  
        
        # look under old_id to find the new, cloned reference
        my $object = $REGISTRY{ $old_id };
        my $new_id = refaddr $object;

        # relocate data
        $NAME{ $new_id } = $NAME{ $old_id };
        delete $NAME{ $old_id };

        # update the weak reference to the new, cloned object
        weaken ( $REGISTRY{ $new_id } = $REGISTRY{ $old_id } );
        delete $REGISTRY{ $old_id };
    }
    
    return;
}

1; # package must return true
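
As a sketch of how the same idea might scale to more than one property, the variant below keeps a list of property hashes and has CLONE rebuild each of them (and the registry) from scratch rather than moving keys one at a time. The class name MultiProp, the color property and the @PROPERTIES list are all hypothetical, made up for this illustration. Rebuilding into fresh hashes is a design choice made here, so that a new address that happens to equal some other object's old address cannot overwrite data that has not been moved yet.

```perl
package MultiProp;    # hypothetical class name, for illustration only
use strict;
use warnings;
use Scalar::Util qw( refaddr weaken );

my %REGISTRY;                            # refaddr => weak reference
my ( %NAME, %COLOR );                    # one hash per property
my @PROPERTIES = ( \%NAME, \%COLOR );    # everything CLONE must re-key

sub new {
    my $class = shift;
    my $self  = bless {}, $class;
    weaken( $REGISTRY{ refaddr $self } = $self );
    return $self;
}

sub name {
    my ( $self, $value ) = @_;
    $NAME{ refaddr $self } = $value if defined $value;
    return $NAME{ refaddr $self };
}

sub color {
    my ( $self, $value ) = @_;
    $COLOR{ refaddr $self } = $value if defined $value;
    return $COLOR{ refaddr $self };
}

sub DESTROY {
    my $self = shift;
    my $id   = refaddr $self;
    delete $_->{$id} for @PROPERTIES;
    delete $REGISTRY{$id};
    return;
}

sub CLONE {
    # map each old id to the current address of its object
    my %new_id_for =
        map  { ( $_ => refaddr $REGISTRY{$_} ) }
        grep { defined $REGISTRY{$_} } keys %REGISTRY;

    # rebuild every property hash under the new ids
    for my $store (@PROPERTIES) {
        my %rekeyed;
        for my $old_id ( keys %new_id_for ) {
            next if !exists $store->{$old_id};
            $rekeyed{ $new_id_for{$old_id} } = $store->{$old_id};
        }
        %{$store} = %rekeyed;
    }

    # rebuild the registry; copied references are strong, so weaken again
    my @objects = grep { defined } values %REGISTRY;
    %REGISTRY = ();
    weaken( $REGISTRY{ refaddr $_ } = $_ ) for @objects;
    return;
}

package main;

my $obj = MultiProp->new;
$obj->name('Charlie');
$obj->color('red');

# calling CLONE by hand leaves addresses unchanged; the data must survive
MultiProp->CLONE;
print join( ' ', $obj->name, $obj->color ), "\n";
```

The direct call at the end only checks that re-keying is harmless when nothing has moved. The real test is still a new thread, as in 01-thread.t.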

01-thread.t:

#!/usr/bin/perl
use strict;
use warnings;
use 5.008; # CLONE only supported in Perl >= 5.8

use threads;
use Test::More tests => 7;

require_ok( "SafeObject" );

my $safe_obj = SafeObject->new;
isa_ok( $safe_obj, "SafeObject" );

is( $safe_obj->name( "Charlie" ), "Charlie", "mutator returns value"  );
is( $safe_obj->name()           , "Charlie", "accessor returns value" );

my $thr = threads->new( 
    sub { 
        is( $safe_obj->name(        ), "Charlie", "got right name in thread");
        is( $safe_obj->name( "Fred" ), "Fred"   , "changed name in thread"  );
    } 
);
$thr->join;
is( $safe_obj->name(), "Charlie", "main thread still has original name" );

02-fork.t:

#!/usr/bin/perl
use strict;
use warnings;
use 5.008; # CLONE only supported in Perl >= 5.8

use Test::More tests => 7;

require_ok( "SafeObject" );

my $obj = SafeObject->new;
isa_ok( $obj, "SafeObject" );

is( $obj->name( "Charlie" ), "Charlie", "mutator returns value"  );
is( $obj->name()           , "Charlie", "accessor returns value" );

my $child_pid = fork;
if ( !$child_pid ) { # we're in the child
        is( $obj->name(        ), "Charlie", "got right name in child");
        is( $obj->name( "Fred" ), "Fred"   , "changed name in child"  );
        exit;
}

# wait for child to finish
waitpid $child_pid, 0;

# Test counter is off due to the fork
Test::More->builder->current_test( 6 );

is( $obj->name(), "Charlie", "parent still has original name" );

As expected, while the 02-fork.t tests pass on both Linux and Windows, on Windows we get the "# Notice: Cloning data..." warning, showing that the fork() is actually creating a new thread.

I still think that an alternative approach, storing a UUID within a blessed scalar, would be a reasonable approach, and might even facilitate sharing inside-out objects across threads (again, storing them in a registry and locking the UUID within the registry to control access). However, one of the nice features of inside-out objects keyed off of a memory address is that it's possible to transparently subclass other objects that use traditional blessed data structures to store their data. That capability would be lost using a blessed scalar to store a UUID. I'm not sure whether sharing objects across threads or flexible subclassing is a more-important feature.
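
A rough sketch of that alternative, for comparison. The class name IdObject is hypothetical, and a simple counter stands in for a real UUID generator (such as Data::UUID) to keep the example free of dependencies. Because the key is the value stored inside the blessed scalar rather than its address, a cloned object carries its key with it and no CLONE fix-up is needed:

```perl
package IdObject;    # hypothetical class name, for illustration only
use strict;
use warnings;

my %NAME;            # property storage, keyed by the stored id
my $next_id = 0;     # stand-in for a real UUID generator

sub new {
    my $class = shift;
    my $id    = ++$next_id;
    return bless \$id, $class;    # the object is a blessed scalar holding its id
}

sub name {
    my ( $self, $value ) = @_;
    $NAME{ ${$self} } = $value if defined $value;
    return $NAME{ ${$self} };
}

sub DESTROY {
    my $self = shift;
    delete $NAME{ ${$self} };
    return;
}

package main;

my $obj = IdObject->new;
$obj->name('Charlie');
print $obj->name, "\n";
```

A per-process counter is only unique within one interpreter's copy of the data, so sharing objects across threads would still need a real unique id and locking.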

Fellow monks, as I'm only starting down this path of inside-out objects and threads and forking, I'd appreciate your perspectives on this problem, the solution I've laid out above and, in particular, any other details that should be considered as this scales up beyond a simple test case.

Thanks,

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re: Threads and fork and CLONE, oh my! by Zaxo (Archbishop) on Aug 12, 2005 at 03:33 UTC
Your description of Linux fork's copy-on-write of the child's environment is correct. That is one of the tricks Linux uses to make its fork very fast and economical. It's good enough that there is rarely a reason to use threads on Linux, though they are supported by the kernel. SYSV shared memory support is available, but most people prefer pipes or sockets for IPC on Linux.
After Compline, Zaxo
Re^2: Threads and fork and CLONE, oh my! by Anonymous Monk on Nov 15, 2005 at 14:03 UTC
What was wrong with the UUID proposal? It sounds like you abandoned that idea. I interpret it to mean: create a scalar ref whose referent is a unique number. Then the dereferenced value, not the volatile address, becomes the object key. The UUID proposal would have much less overhead than having a CLONE method repair all the dangling links in each hash.
Re^3: Threads and fork and CLONE, oh my! by Zaxo (Archbishop) on Nov 15, 2005 at 14:26 UTC
I didn't spot anything wrong with it; I just didn't address that part of the problem.
After Compline, Zaxo
Re^4: Threads and fork and CLONE, oh my! by esharris (Monk) on Nov 15, 2005 at 16:54 UTC
Re^3: Threads and fork and CLONE, oh my! by esharris (Monk) on Nov 15, 2005 at 14:08 UTC
I'm the anonymous monk.
Re: Threads and fork and CLONE, oh my! by perrin (Chancellor) on Aug 12, 2005 at 04:26 UTC
Frankly, this is exactly the sort of trouble I would expect when using a crazy scheme involving refaddr. It's hard to see how the possible benefits of this unusual data storage approach could be worth the risk.
Re^2: Threads and fork and CLONE, oh my! by xdg (Monsignor) on Aug 12, 2005 at 11:09 UTC
The benefits of inside-out/flyweight objects have been beaten to death on this board and center primarily on the stronger encapsulation of data and the orthogonality to potential property name clashes with super/subclasses. However, the approach needs a unique ID -- and, for reasons lost to history, someone used "$self" (hey, it's unique, right?) as cheaper than generating a unique ID and from there to just the memory address part, and the cargo cult followed.

I think the fundamental storage technique is sound and using a UUID would fix up the refaddr problem -- though as I said, at the cost of coupling superclasses/subclasses more tightly. It's fine if everything in the class hierarchy is built the same way (e.g. on a blessed scalar with a UUID inside), but one loses the ability to subclass someone else's class (e.g. on CPAN) without caring what kind of blessed reference they used (hash, array, etc.) or whether it changes in some future version. For some, that may be a bigger benefit.

I'd still like to hear people's views on that topic -- whether that is important enough to justify the extra complexity of CLONE. I'd also like to get people's views on whether adding external dependencies on Data::UUID and/or Win32API::GUID is worthwhile or whether some other inline, pure-Perl "unique id" algorithm is preferable (with, say, Time::HiRes, process ID, hostname/IP, etc.).

-xdg
Re^3: Threads and fork and CLONE, oh my! by adrianh (Chancellor) on Aug 17, 2005 at 15:08 UTC
for reasons lost to history, someone used "$self" (hey, it's unique, right?) as cheaper than generating a unique ID

Unfortunately it turns out that it isn't very cheap at all (speed-wise). In fact it's just about the worst possible choice :-) On my perl 5.8.7, a basic benchmark (code not shown here) gives me:

BlessedHash x 10000 = 2265688 bytes
ClassStd x 10000 = 2222948 bytes
NumSelfAsIndex x 10000 = 2219534 bytes
RefaddrCached x 10000 = 2300888 bytes
RefaddrCall x 10000 = 2226816 bytes
SelfAsIndex x 10000 = 2436816 bytes

                 Rate SelfAsIndex ClassStd NumSelfAsIndex RefaddrCall RefaddrCached BlessedHash
SelfAsIndex    1000/s          --      -9%           -12%        -44%          -57%        -59%
ClassStd       1100/s         10%       --            -3%        -38%          -53%        -55%
NumSelfAsIndex 1131/s         13%       3%             --        -36%          -52%        -54%
RefaddrCall    1778/s         78%      62%            57%          --          -24%        -27%
RefaddrCached  2349/s        135%     114%           108%         32%            --         -4%
BlessedHash    2443/s        144%     122%           116%         37%            4%          --

with a plain $self index coming in a lot worse than the faster alternatives.
Re^4: Threads and fork and CLONE, oh my! by xdg (Monsignor) on Aug 17, 2005 at 15:40 UTC
Re^4: Threads and fork and CLONE, oh my! by jdhedden (Deacon) on Sep 16, 2005 at 17:57 UTC
Re^5: Threads and fork and CLONE, oh my! by xdg (Monsignor) on Sep 16, 2005 at 18:26 UTC
Re^5: Threads and fork and CLONE, oh my! by adrianh (Chancellor) on Sep 18, 2005 at 08:28 UTC
Re^5: Threads and fork and CLONE, oh my! by demerphq (Chancellor) on Nov 15, 2005 at 14:41 UTC
Re^3: Threads and fork and CLONE, oh my! by dragonchild (Archbishop) on Oct 03, 2005 at 02:42 UTC
It's fine if everything in the class hierarchy is built the same way (e.g. on a blessed scalar with a UUID inside), but one loses the ability to subclass someone else's class (e.g. on CPAN) without caring what kind of blessed reference they used (hash, array, etc.) or whether it changes in some future version. For some, that may be a bigger benefit.

That's going to be a problem no matter how you represent your object in memory. The only way around that is if you give over generating new attributes (and accessors for said attributes) to some other entity that will then do it in the same manner for all classes in the hierarchy.

This fact, btw, is the biggest win for P6 OO. The method of implementation is less important than the fact of implementation. There is now some arbiter of attribute/accessor generation that will do it the same way every time. It will also resolve clashes in some sane and user-definable manner. Beyond that, it's all gravy.

My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re^2: Threads and fork and CLONE, oh my! by Joost (Canon) on Aug 12, 2005 at 11:21 UTC
There are definitely advantages to using inside-out / flyweight objects: they're light-weight, relatively simple to implement and give you private, per-package properties (no more worries about inheriting classes messing up your properties, yay!).

Now basing the flyweight "key" on the object's ref-address also makes it much harder to accidentally modify the key, you don't need any separate algorithm to generate unique keys, and since all objects have a ref-address, you can inherit from any "normal" class with an inside-out subclass and vice-versa (vice-versa provided the top-class propagates DESTROY).

Now, there is only a problem with this approach if you use ithreads. I don't know about you, but I've not used ithreads except as a toy; for me, I'm not sure that the benefits of ithreads are worth the risk :-) It's not like they're completely devoid of bugs and caveats.

Update: I completely glossed over the win32 fork() problems. I didn't even know that a fork() on win32 would modify your ref-address (I use win32 only when I really have to). Still, I can imagine situations where you'd want the above code, but I'd use a special-purpose flyweight key if I had a choice in the matter.

"What should it profit a man, if he should win a flame war, yet lose his cool?"
Re^3: Threads and fork and CLONE, oh my! (SEP) by tye (Sage) on Aug 12, 2005 at 23:32 UTC
IMO, this is just one more way iThreads and Perl's fork emulation on Win32 are fundamentally broken. Given that they are already "the worst of both worlds" and buggy, my reaction to this realization would be just to mark any code I have that uses 0+$ref (or refaddr) as "Not supported with iThreads nor with Win32 fork()".
- tye
Re^3: Threads and fork and CLONE, oh my! by adrianh (Chancellor) on Aug 17, 2005 at 15:04 UTC
they're light-weight

If we're talking memory usage then they're not significantly more lightweight than normal blessed hashes on recent perls.
Re: Threads and fork and CLONE, oh my! by nothingmuch (Priest) on Aug 16, 2005 at 18:15 UTC
Firstly: perl 5's threads are in pretty grim shape - they're slow, they're quirky, and lots of stuff isn't as easy as it should be. I would recommend against using them if you can find another alternative.

Second: since you asked for an explanation of how fork is efficient, here's how it works:

Processors have real and protected addressing modes. Real mode is what people would normally expect - every pointer points to an address in memory. Protected mode is what is actually used 99% of the time in practice.

Under protected mode, fetching a memory address is an indirect operation - the address is translated from the virtual address to a real one by the MMU (the memory management unit). This is sensitive to the current process, and the way the notion of the current process is defined varies from processor to processor.

Memory is handled by the MMU in chunks called pages, which are typically 4 kB long. These are the smallest unit the MMU will take care of in its virtual/real addressing scheme.

When a process forks (under true fork, not vfork, which I will shortly discuss) all of its memory is set to be read-only, and the process itself is duplicated, returning different values to each process.

At this point almost no data is copied - the pages themselves are marked read-only, but this is quick and trivial, and the process handle in the kernel is duplicated, and this is not a large structure in comparison to the amount of memory a process can actually consume.

Whenever the processor tries to store a value in a read-only memory page the MMU raises a fault, which the kernel has to handle. The kernel copies the page to a new location in memory, but makes the mapping appear, to the process, to be at the same memory location. When the memory page has been copied, it is safe to actually write to it.

By keeping a reference count of the number of processes sharing each page you can also unlock read-only pages when they are accessible by only one process.

vfork is a version of fork that is suited for fork and exec sequences... When you call 'vfork' nothing is copied - instead the child process has complete control of the parent's resources (file descriptors, memory, etc), until it calls exec to load another process image into memory. When the child process finally executes the new program the parent process is resumed.

The reason the parent is suspended while the child is running is that since the addressing space is shared, so is the stack. Since the stack (together with the instruction pointer) represents the state of the process, including the return value from 'vfork' (and any called function for that matter), this value cannot be both 0 for the child and the child's process ID for the parent at the same time.

Slightly off topic but nevertheless interesting is the page swap system of an OS with virtual memory - when a page is resolved by the MMU, and it isn't in physical memory, a page fault is sent to the kernel. The kernel is then responsible for loading the memory page from disk, by swapping it with a physical memory page. When the data has been loaded to physical memory the access to the memory can finish.

Some notes:
- Modern, complex MMUs can probably do copy-on-write for the kernel without raising a fault.
- vfork used to be a replacement for fork when copying the whole process was unnecessary but still costly. Since the process is just going to exec a new process image, there is no point in copying everything and then throwing it away.
- Plain old fork() should be almost as efficient, more responsive, and probably safer too.
- vfork might have other semantics on a platform that isn't MacOS X, I'm too lazy to check.

Update: Ven'Tatsu caught a wordo confusing protected mode with real mode.

-nuffin
zz zZ Z Z #!perl
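
The visible effect of copy-on-write can be sketched from Perl, assuming fork is available. The variable name is arbitrary. The child's write lands in its own private copy of the data, so the parent's value is untouched:

```perl
use strict;
use warnings;

my $value = 'original';

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # child: this write goes to the child's own copy of the data
    $value = 'changed in child';
    exit 0;
}

# parent: wait for the child, then look at our own copy
waitpid $pid, 0;
print "parent sees: $value\n";    # prints "parent sees: original"
```

The same result holds under the Win32 fork emulation, though there the isolation comes from cloning the interpreter's data up front rather than from page-level copying.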
Re: Threads and fork and CLONE, oh my! by jdhedden (Deacon) on Sep 09, 2005 at 17:37 UTC
xdg's CLONE'ing mechanism works well. I have used it in the CPAN module Math::Random::MT::Auto. I had considered trying to convert my module to use Class::Std, but I'll have to wait until TheDamian updates it to use xdg's concept.

Remember: There's always one more bug.

