PerlMonks  

Re^3: Passing globs between threads

by BrowserUk (Patriarch)
on Oct 01, 2004 at 00:21 UTC (#395513)


in reply to Re^2: Passing globs between threads
in thread Passing globs between threads

The first thing to realise is that raw filehandles and sockets are process global. So you don't need, and indeed cannot, share them between threads in the threads::shared sense, as they are effectively already shared by all threads in the process.

Note: I'm talking about the OS & C-runtime concept of filehandles and sockets, not anything in the IO::* group of modules. These are peculiar beasts in that they are at some level like ordinary perl objects, which makes them difficult, if not impossible to use safely across threads, but they do not behave entirely like ordinary objects.

So, the problem is how to transfer an open handle between threads (which at a higher level seems like a bad design to me, but I'll get back to that), given that threads::shared won't let you share a GLOB nor even a 5.8.x lexical scalar that is currently being used as a GLOB-like entity.

After a little kicking around, I found a way. I'm not yet sure that it is a good thing to do, but I'll show you how anyway and hope that if someone else out there knows why it should not be done, they'll speak up.

The trick is to pass the fileno of the open GLOB to the thread and then use the special form of open, open FH, "&=$fileno" or die $!, to re-open the handle within the thread before using it.

    #! perl -slw
    use strict;
    use IO::File;
    use threads;
    use Thread::Queue;

    sub thread{
        my $Q = shift;
        my $fno = $Q->dequeue;
        open FH, "<&=$fno" or die $!;
        print for <FH>;
        return;
    }

    my $Q = new Thread::Queue;
    my $thread = threads->create( \&thread, $Q );

    my $io = IO::File->new( 'junk', 'r' ) or die $!;
    $Q->enqueue( fileno $io );

    $thread->join;
    $io->close;

    __END__
    P:\test>395373
    This is junk
    and some more junk
    and yet more junk
    this is line 4 of junk

However, this is not a complete solution. Currently, probably because of a bug I think, but possibly by design, if you actually read from the handle in the thread where you opened it, then pass the fileno to another thread, reopen it and attempt to read some more from it, it fails. No errors, no warnings. Just no input.

I've read and re-read perlfunc:open and perlopentut:Re-Opening-Files-(dups) , and I believe that the "&=$fileno" syntax should allow this...but it doesn't. If you need to be able to do this, you'll need to take it up with p5p guys and get their wisdom.

Now, getting back to design. Whilst moving objects around between threads by freezing and thawing them is possible, it feels awfully hokey to me. Basically, if your design is done correctly, there should be no need to create duplicates of objects in different threads.

Doing so is opening yourself up to a world of grief.

These duplicate objects will be uncoordinated. Once you freeze an object, it is exactly that; frozen. Any subsequent changes you make will not be reflected in the copy that you reconstruct elsewhere. If it was trivial, or even if it was possible with a reasonable degree of difficulty to coordinate and synchronise objects across threads, Perl would do that for you. The fact that the clever guys that got iThreads this far did not step up to the plate and do this already, almost certainly means that you should not be trying to do this either.
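
A minimal sketch of the "frozen is frozen" point, using Storable directly and no threads at all. The Counter class name and its count field are made up for illustration; any Storable-friendly object would behave the same way.

```perl
use strict;
use warnings;
use Storable qw( freeze thaw );

# Hypothetical object: a plain blessed hash standing in for a real class.
my $orig = bless { count => 1 }, 'Counter';

my $frozen = freeze( $orig );    # the snapshot is taken here
$orig->{ count } = 2;            # a later change to the original...

my $copy = thaw( $frozen );      # ...is not seen by the reconstructed copy

print "original: $orig->{ count }, copy: $copy->{ count }\n";
# prints "original: 2, copy: 1"
```

The copy only ever reflects the state at the moment of the freeze; nothing ties it back to the original afterwards.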

The fact is that I appear to have done as much coding of (i)threads as anyone I am aware of, and I have yet to see a usefully iThreadable problem I couldn't (almost trivially) solve. And so far, I have never needed to share either objects (or IO handles) between threads.

But don't take my word for it. My knowledge only extends as far as I have tried to go. It would be good to see someone else pushing the boundaries of what's possible.

If you could describe the problem that you are trying to solve at the application level, I'd be most interested to take a look.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Replies are listed 'Best First'.
Re^4: Passing globs between threads
by conrad (Beadle) on Oct 01, 2004 at 10:36 UTC
    Hi again & thanks for the detailed reply..

    What I'm trying to do.. A service which will do "stuff" (it's a generic architecture to be customised to particular applications, so I really mean stuff) to bundles of streams - essentially, a client will tell it "here's a bunch of input streams, and a corresponding bunch of output streams, process 'em using some-mechanism-or-other as you pump data from the inputs to the outputs". Each group of inputs and outputs are collated in some way for purposes of error correction, and the service will process multiple bundles simultaneously. Little example diagram here, with two bundles: the first (A->B) is unidirectional with three inputs and only two outputs, while the second (C<->D) is bidirectional with one stream at one end and three at the other (the actual stuff that multiplexes/demultiplexes doesn't really matter for this question).

    A0--\            /B0
    A1---+->stuff>--+        Bundle 0
    A2--/            \B1

                   /--D0
    C0---<stuff>--+---D1     Bundle 1
                   \--D2

    ... Bundle N ...

    I'm using Perl 'cos it facilitates rapid prototyping (this is research work), it has good network handling, and is easy to integrate with CGI, web services, and third-party applications (all of which are desirable in this case).

    The architecture I've gone with, due to the relatively heavy weight of Perl's threads, is to set up a single worker thread which processes all of the bundles in a select() loop, and then have the main thread obtain, validate and submit control directives (such as adding new bundles or modifying existing ones). In an ideal world, I'd just turn each control directive into a closure and hand it off to the worker thread to deal with in its own good time. In another language that might be easy while all the other stuff I've mentioned might be hard; in Perl, the other stuff is easy and this, while not hard to hack (I'm already doing it by passing filenos across) seems hard to do nicely.

    The reason for separating the control invocation from the worker thread is that I envisage a variety of different control styles, e.g. web page, web service interface, and straight program API. I'd rather not get that all munged up with the I/O and processing work...

    So I escape your wise warning re: freezing objects because the control thread destroys its copies of them as soon as they're frozen and passed off to the worker. It's clearly not efficient, but my purpose for the time being is clarity and flexibility, which whole-object transmission gives me. Passing instructions through as unblessed shared string/list/hash structures was more error-prone.

    Now on to the file descriptor messaging: at the moment I do one of two things: either I pass across coordinates (such as "hostname:port") and the worker initiates a socket connection itself, or I pass across the file descriptor of a socket/pipe/filehandle and the worker tries to reconstruct the original object using IO::xxx->new_from_fd(). What would be nice would be to be able to just pass across IO::xxx objects; since that's not possible without extending Storable (I am tempted, would make everything tidy again), the necessary thing seems to be to deconstruct the objects into class (necessary since I can usefully half-close socket connections but not files for example) and file descriptor. Your open("&=") trick is analogous to the new_from_fd one, and difficult because it too needs to know the opening mode of the original object (you have to specify "<&=", ">>&=", etc.). So I guess my ultimate questions are:

    1. Is there a better (read: clean and object-oriented) way of telling another Perl thread about an IO::Handle than deconstructing it to its file descriptor, transmitting that, and then reconstructing it in the receiving thread? I think that the answer to this is no.
    2. If not, then given a file descriptor $fd, is there no better way of reconstructing it accurately into an object than examining the entrails of fcntl($fd, F_GETFL) (to obtain R/W/append flags) and passing my conclusions from that back to IO::xxx->new_from_fd? It seems faintly wasteful that new_from_fd (and fdopen) need this information when it seems to be embedded in the file descriptor already..
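
    One way the deconstruct/reconstruct step from the two questions above might look. This is only a sketch under assumptions: it reads the flags from the original handle in the sending thread (rather than from a bare descriptor), it relies on a POSIX-style fcntl(F_GETFL), and the helper names mode_of, deconstruct and reconstruct are invented here, not part of any IO::* module.

```perl
use strict;
use warnings;
use Fcntl qw( F_GETFL O_ACCMODE O_WRONLY O_RDWR O_APPEND );
use IO::Handle;
use IO::File;

# Hypothetical helper: turn an open handle's status flags into a
# mode string of the kind new_from_fd() accepts.
sub mode_of {
    my( $fh ) = @_;
    my $flags = fcntl( $fh, F_GETFL, 0 )
        or die "fcntl(F_GETFL) failed: $!";
    my $access = $flags & O_ACCMODE;
    return $access == O_RDWR   ? ( $flags & O_APPEND ? 'a+' : 'r+' )
         : $access == O_WRONLY ? ( $flags & O_APPEND ? 'a'  : 'w'  )
         :                       'r';
}

# Sender side: reduce the object to plain scalars that can be queued.
sub deconstruct {
    my( $fh ) = @_;
    return ( ref( $fh ), fileno( $fh ), mode_of( $fh ) );
}

# Receiver side: rebuild an object of the same class on the same descriptor.
# The class must already be loaded in the receiving thread.
sub reconstruct {
    my( $class, $fd, $mode ) = @_;
    my $fh = $class->new_from_fd( $fd, $mode )
        or die "new_from_fd( $fd, '$mode' ) failed: $!";
    return $fh;
}
```

    The three scalars from deconstruct() can be sent as they are, and the receiver calls reconstruct() on them. Both objects then sit on the same descriptor, so closing either one closes it for both.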
      So I escape your wise warning re: freezing objects because the control thread destroys its copies of them as soon as they're frozen and passed off to the worker.

      In effect, you are not sharing an object between threads, you are actually just passing an object template that you want to be constructed by the receiving thread. The only benefit you are gaining from doing it this way is the encapsulation of the object's class into the queued (Storable) string. Though I guess it does allow you to build the object instance by calling multiple methods prior to freezing it, which means that the object is validated before you pass it.

      I think that passing a hash or an array containing the parameters to be used in the constructor in the destination thread--along with some convention of passing the class of the object as a named parameter in the hash or as the first element of the array would be just as clean, and probably somewhat more efficient. For one thing it would allow you to avoid loading the code for every class of object into the main thread as well as every receiving thread.
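
      A rough sketch of the "class name first, constructor arguments after" convention described above. It assumes a threaded perl and a Thread::Queue recent enough to accept array references (older versions only queue plain scalars). My::Stream, its fields and the two queue names are hypothetical.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

{
    package My::Stream;    # hypothetical class, stands in for a real one
    sub new      { my( $class, %args ) = @_; return bless { %args }, $class }
    sub describe { my( $self ) = @_; return "$self->{ host }:$self->{ port }" }
}

my $Q   = Thread::Queue->new;    # construction requests: main -> worker
my $out = Thread::Queue->new;    # reports: worker -> main

my $worker = threads->create( sub {
    while( defined( my $spec = $Q->dequeue ) ) {
        my( $class, %args ) = @$spec;      # class first, then constructor args
        my $obj = $class->new( %args );    # built in this thread, never shared
        $out->enqueue( ref( $obj ) . ' ' . $obj->describe );
    }
} );

$Q->enqueue( [ 'My::Stream', host => 'localhost', port => 8080 ] );
$Q->enqueue( undef );                      # sentinel: no more work
$worker->join;

my $report = $out->dequeue;
print "$report\n";
# prints "My::Stream localhost:8080"
```

      Only plain data crosses the queue; the object itself exists in exactly one thread.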

      Then again, I guess it does simplify the re-construction at the receiving end. Storable does all the work for you. I haven't benchmarked it, but as long as you're not trying to use the objects concurrently from multiple threads, it should be fine.

      On the filehandle stuff, using a filemode of 'r+' and/or '+< &=$fileno', appears to let you read and write to the file from either thread, but it's not quite right. I'm convinced that the semantics of using the "&=$fileno" open is screwed up somehow--at least when combined with threads--but I'm not sure that I understand what the semantics should be in non-threaded code, so it's difficult to tell. It could just be another "not quite POSIX behaviour" win32 thing? Sorry, but I can't be much help there.

      It might be worth taking up the problem as a Storable/IO::* limitation with the p5p guys. Their greater understanding may see the reason/cause for it, and they may be able to suggest something?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^4: Passing globs between threads
by Anonymous Monk on Oct 01, 2004 at 20:09 UTC
    If it was trivial, or even if it was possible with a reasonable degree of difficulty to coordinate and synchronise objects across threads, Perl would do that for you. The fact that the clever guys that got iThreads this far did not step up to the plate and do this already, almost certainly means that you should not be trying to do this either.

    Yes it's easy to synchronize objects (at least Storable ones) across threads. RFC677 (yes the one from IETF) tells you how.

      Yes it's easy to synchronize objects (at least Storable ones) across threads. RFC677 (yes the one from IETF) tells you how

      Sorry, but I completely disagree. What RFC 677 says is that "...the problem of maintaining duplicated databases in an ARPA-like network...." is possible; Not easy!

      However, databases store data. In some cases code in the form of stored procedures. There is a world of difference between this and a perl object.

      The main one being that stored procedure code only has access to data stored with it in the DB, and constants. Perl object methods can have access to data that exists outside of the object--through references; lvalue refs; lvalue subroutines; coderefs; closures; global variables; and probably others that I haven't thought of.

      Perl code (methods) can also be created, modified, deleted and overridden through introspection. Stored procedures cannot.

      When you create an object using bless you tie (in the non-Perl sense of the word) the data (attributes) of that instance to a specific vtable of methods that exists within that same thread. When you duplicate this instance through freeze/thaw, the duplicated data is tied to a duplicate vtable.

      Whilst applying appropriate sharing (semaphores) and deploying appropriate locks can allow you to coordinate changes to the data across threads, applying the same coordination to changes to the vtable is fraught with problems, if not impossible.

      Then there is the problem of class data. Re-creating an instance of a class, does not re-create the class environment for that instance. That is to say, any class data that the instance methods might refer to--eg. Inside-out object hashes--just isn't referenceable nor duplicable.
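
      A small sketch of the class-data problem, with no threads involved. InsideOut is a hypothetical class that keeps per-object state in a class-level hash keyed by the object's address, and Storable's dclone stands in for a freeze in one thread followed by a thaw in another.

```perl
use strict;
use warnings;
use Storable qw( dclone );

{
    package InsideOut;    # hypothetical inside-out style class
    my %pos;              # class data, keyed by the stringified object
    sub new {
        my( $class, $string ) = @_;
        my $self = bless \$string, $class;
        $pos{ $self } = 3;
        return $self;
    }
    sub has_pos { my( $self ) = @_; return exists $pos{ $self } ? 1 : 0 }
}

my $obj  = InsideOut->new( 'some string' );
my $copy = dclone( $obj );    # freeze + thaw in one step

printf "original has position: %d, copy has position: %d\n",
    $obj->has_pos, $copy->has_pos;
# prints "original has position: 1, copy has position: 0"
```

      The instance data survives the round trip, but the entry in %pos belongs to the original's address, so the copy arrives without it.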

      How would you coordinate an iterator returned by a method that used closures to track state?

      This is a far from exhaustive rebuttal. Far, far, far...from exhaustive. However, perhaps I am missing the solution, so here's my reply:

      If it is so easy, show me the code.

      Show me the modifications required by this simple code to allow me to share an instance of Chipper such that I can pass a copy of an iterator to two threads and have them concurrently process chars from the string in a coordinated manner?

          #! perl -slw
          use strict;

          package Chipper;

          my %pos;

          sub TIESCALAR {
              my( $class, $string ) = @_;
              my $self = bless \$string, $class;
              $pos{ $self } = 0;
              return $self;
          }

          sub FETCH {
              my( $self ) = @_;
              return $$self;
          }

          sub STORE {
              my( $self, $value ) = @_;
              $pos{ $self } = 0;
              return $$self = $value;
          }

          sub chip {
              my( $self ) = @_;
              $pos{ $self } = 0, return undef
                  if $pos{ $self } >= length $$self;
              return substr $$self, $pos{ $self }++, 1;
          }

          1;

          package main;

          my $inst1 = tie my $str1, 'Chipper', 'The quick brown fox jumps over the lazy dog';
          my $inst2 = tie my $str2, 'Chipper', 'Now is the time for all good men to come to the aid of the party';

          print "($a$b)" while $a = $inst1->chip and $b = $inst2->chip;

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
        Ooops! Have you looked at the RFC itself or only at the title? "Database" in the RFC means roughly the same as "hash" in Perl (see the date of the RFC). And by "easy" I meant straightforward programming: no special cases, no race conditions. The amount of code is not something typed between two messages. If it were, I would have said "trivial", not "easy".

        But I can give an outline:

        • Every thread has a replica
        • The thread queries its own replica by posting a query into its own input-queue, not by direct access
        • The thread writes its own replica by posting a modification request into its own input-queue, not by direct access
        • If a thread waits or wishes to access its replica, it dequeues all requests from its input-queue and works the algorithm in RFC677
        • Any modification requests arising from this Algorithm are queued into the input-queues of the appropriate threads.
        • If any keys so modified are typed as Perl-source, the thread recompiles its own subs
        Because no replica is ever modified or seen by another thread as its own, no locking is necessary.

        Of course all data referenced by data in the "Database" must be replicated themselves and stored in the "Database".

        You see that I can't modify your example, because I would have to write a module for the RFC677-Algorithm.
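
        To give the outline some shape, here is a much-simplified sketch of one ingredient only: applying timestamped modification requests to a replica so that every replica ends up with the same value regardless of delivery order. It assumes that the ordering is by a (time, site) pair with the site id breaking ties. It is not the full RFC 677 algorithm, it leaves out the per-thread input-queues entirely, and all the names (stamp_cmp, apply_change, the field names) are made up.

```perl
use strict;
use warnings;

# Compare two stamps: by time first, then by originating site id.
sub stamp_cmp {
    my( $x, $y ) = @_;
    return $x->{ time } <=> $y->{ time } || $x->{ site } <=> $y->{ site };
}

# Apply one modification request to a replica (a plain hash).
# Returns 1 if the replica changed, 0 if the request was stale.
sub apply_change {
    my( $replica, $change ) = @_;
    my $current = $replica->{ $change->{ key } };
    return 0 if $current and stamp_cmp( $change, $current ) <= 0;
    $replica->{ $change->{ key } } = { %$change };
    return 1;
}

my @replicas = ( {}, {} );    # one replica per (imagined) thread
my @changes  = (
    { key => 'colour', value => 'red',   time => 10, site => 0 },
    { key => 'colour', value => 'blue',  time => 10, site => 1 },
    { key => 'colour', value => 'green', time => 7,  site => 1 },
);

# Deliver the same requests to each replica in a different order.
apply_change( $replicas[0], $_ ) for @changes;
apply_change( $replicas[1], $_ ) for reverse @changes;

print "replica $_: $replicas[$_]{ colour }{ value }\n" for 0 .. 1;
# prints "replica 0: blue" then "replica 1: blue"
```

        Each replica is only ever touched by its owner, which is why the outline needs no locks; the stamps do the coordinating.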
