
Re^5: Passing globs between threads

by BrowserUk (Patriarch)
on Oct 01, 2004 at 22:22 UTC


in reply to Re^4: Passing globs between threads
in thread Passing globs between threads

Yes it's easy to synchronize objects (at least Storable ones) across threads. RFC677 (yes the one from IETF) tells you how

Sorry, but I completely disagree. What RFC 677 says is that solving "...the problem of maintaining duplicated databases in an ARPA-like network...." is possible, not easy!

However, databases store data and, in some cases, code in the form of stored procedures. There is a world of difference between this and a perl object.

The main one being that stored procedure code only has access to data stored with it in the DB, and constants. Perl object methods can have access to data that exists outside of the object--through references; lvalue refs; lvalue subroutines; coderefs; closures; global variables; and probably others that I haven't thought of.

Perl code (methods) can also be created, modified, deleted and overridden through introspection. Stored procedures cannot.
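To make the introspection point concrete, here is a minimal sketch of a method being replaced at run time through the symbol table. The `Counter` class and its `step` method are made up for this illustration; nothing here comes from the thread's own code.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

sub new  { my( $class ) = @_; return bless { n => 0 }, $class }
sub step { my( $self ) = @_; return $self->{ n } += 1 }

package main;

my $c      = Counter->new;
my $before = $c->step;    # original method: n becomes 1

{
    # Override Counter::step at run time with a glob assignment.
    no strict 'refs';
    no warnings 'redefine';
    *{ 'Counter::step' } = sub { my( $self ) = @_; return $self->{ n } += 10 };
}

my $after = $c->step;     # same object, new behaviour: n becomes 11
print "before=$before after=$after\n";
```

As far as I understand ithreads, each thread works on its own copy of the symbol table, so an override like this made in one thread after the others were spawned would not show up in them.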

When you create an object using bless you tie (in the non-Perl sense of the word) the data (attributes) of that instance to a specific vtable of methods that exists within that same thread. When you duplicate this instance through freeze/thaw, the duplicated data is tied to a duplicate vtable.

Whilst applying appropriate sharing (semaphores) and deploying appropriate locks can allow you to coordinate changes to the data across threads, applying the same coordination to changes to the vtable is fraught with problems, if not impossible.

Then there is the problem of class data. Re-creating an instance of a class does not re-create the class environment for that instance. That is to say, any class data that the instance methods might refer to--e.g. inside-out object hashes--just isn't referenceable nor duplicable.
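A small sketch of the class-data problem, assuming a made-up inside-out class `Named` whose only attribute lives in a class-level hash keyed by object address. A freeze/thaw round trip copies the blessed scalar but not the entry in that hash.

```perl
use strict;
use warnings;
use Storable qw( freeze thaw );
use Scalar::Util ();

package Named;      # hypothetical inside-out class

my %name_of;        # class data: the attribute lives here, not in the object

sub new {
    my( $class, $name ) = @_;
    my $anon = 0;                                   # the object is just an identity
    my $self = bless \$anon, $class;
    $name_of{ Scalar::Util::refaddr( $self ) } = $name;
    return $self;
}

sub name {
    my( $self ) = @_;
    return $name_of{ Scalar::Util::refaddr( $self ) };
}

package main;

my $orig  = Named->new( 'fred' );
my $clone = thaw( freeze( $orig ) );    # new address, so no entry in %name_of

printf "original: %s\n", $orig->name;
printf "clone:    %s\n", defined $clone->name ? $clone->name : '(no name)';
```

The clone is still blessed into `Named`, so its methods can be called, but they find nothing to work on.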

How would you coordinate an iterator returned by a method that used closures to track state?
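For what it is worth, a closure-based iterator cannot even be serialised with Storable's defaults, let alone coordinated. The sketch below uses a hypothetical `make_chipper` helper and shows `freeze` refusing the code reference.

```perl
use strict;
use warnings;
use Storable qw( freeze );

# Hypothetical helper: returns an iterator whose position lives in a closure.
sub make_chipper {
    my( $string ) = @_;
    my $pos = 0;
    return sub {
        return undef if $pos >= length $string;
        return substr $string, $pos++, 1;
    };
}

my $iter  = make_chipper( 'abc' );
my $first = $iter->();    # 'a'; $pos is now 1 inside the closure

# Try to serialise the iterator so that it could be handed to another thread.
my $frozen = eval { freeze( { iter => $iter } ) };
my $error  = $@;

print defined $frozen ? "frozen ok\n" : "freeze failed: $error";
```

Storable can be configured to deparse code references, but as far as I know that captures the source text only, not the closed-over `$pos`, so the iterator's state would still not travel with it.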

This is a far from exhaustive rebuttal. Far, far, far...from exhaustive. However, perhaps I am missing the solution, so here's my reply:

If it is so easy, show me the code.

Show me the modifications required by this simple code to allow me to share an instance of Chipper such that I can pass a copy of an iterator to two threads and have them concurrently process chars from the string in a coordinated manner.

    #! perl -slw
    use strict;

    package Chipper;

    my %pos;    # per-instance read position, keyed by the stringified object

    sub TIESCALAR {
        my( $class, $string ) = @_;
        my $self = bless \$string, $class;
        $pos{ $self } = 0;
        return $self;
    }

    sub FETCH {
        my( $self ) = @_;
        return $$self;
    }

    sub STORE {
        my( $self, $value ) = @_;
        $pos{ $self } = 0;
        return $$self = $value;
    }

    # Return the next character, or undef (and reset) at the end of the string.
    sub chip {
        my( $self ) = @_;
        $pos{ $self } = 0, return undef if $pos{ $self } >= length $$self;
        return substr $$self, $pos{ $self }++, 1;
    }

    1;

    package main;

    my $inst1 = tie my $str1, 'Chipper', 'The quick brown fox jumps over the lazy dog';
    my $inst2 = tie my $str2, 'Chipper',
        'Now is the time for all good men to come to the aid of the party';

    print "($a$b)" while $a = $inst1->chip and $b = $inst2->chip;

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Replies are listed 'Best First'.
Re^6: Passing globs between threads
by Anonymous Monk on Oct 02, 2004 at 00:27 UTC
    Ooops! Have you looked at the RFC itself or only at the title? "Database" in the RFC means roughly the same as "Hash" in Perl (see the date of the RFC). And by "easy" I meant straightforward programming: no special cases, no race conditions. It is not an amount of code that can be typed between two messages. If it were, I would have said "trivial", not "easy".

    But I can give an outline:

    • Every thread has a replica
    • The thread queries its own replica by posting a query into its own input-queue, not by direct access
    • The thread writes its own replica by posting a modification request into its own input-queue, not by direct access
    • If a thread waits or wishes to access its replica, it dequeues all requests from its input-queue and works the algorithm in RFC677
    • Any modification requests arising from this Algorithm are queued into the input-queues of the appropriate threads.
    • If any keys so modified are typed as Perl-source, the thread recompiles its own subs
    Because no replica is ever modified or seen by another thread as its own, no locking is necessary.

    Of course all data referenced by data in the "Database" must be replicated themselves and stored in the "Database".
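A rough, single-process sketch of the outline above, only to make its shape concrete. The `Replica` class, its message format and the (time, id) stamps are assumptions made for this illustration, loosely modelled on timestamped assignments; it is not RFC 677's actual algorithm, and queries are simplified to "drain the queue, then read". The two replicas stand in for two threads.

```perl
use strict;
use warnings;

package Replica;    # hypothetical: one per thread in the outline

sub new {
    my( $class, $id ) = @_;
    return bless { id => $id, data => {}, stamp => {}, queue => [], peers => [] }, $class;
}

# Requests are only ever posted to a queue, never applied directly.
sub post { my( $self, $msg ) = @_; push @{ $self->{ queue } }, $msg; return }

# Local write: queue it for ourselves and for every peer replica.
sub assign {
    my( $self, $key, $value, $time ) = @_;
    my $msg = { key => $key, value => $value, stamp => [ $time, $self->{ id } ] };
    $_->post( $msg ) for $self, @{ $self->{ peers } };
    return;
}

# Drain the input queue; an assignment is applied only if its stamp is newer.
sub drain {
    my( $self ) = @_;
    while( my $msg = shift @{ $self->{ queue } } ) {
        my $old = $self->{ stamp }{ $msg->{ key } };
        next if $old and _cmp( $msg->{ stamp }, $old ) <= 0;
        $self->{ data  }{ $msg->{ key } } = $msg->{ value };
        $self->{ stamp }{ $msg->{ key } } = $msg->{ stamp };
    }
    return;
}

# Local read: bring the replica up to date first, then answer from it.
sub query { my( $self, $key ) = @_; $self->drain; return $self->{ data }{ $key } }

# Order stamps by time, then by replica id to break ties.
sub _cmp { my( $s1, $s2 ) = @_; return $s1->[0] <=> $s2->[0] || $s1->[1] <=> $s2->[1] }

package main;

my( $r1, $r2 ) = map { Replica->new( $_ ) } 1, 2;
push @{ $r1->{ peers } }, $r2;
push @{ $r2->{ peers } }, $r1;

$r1->assign( X => 'from r1', 10 );
$r2->assign( X => 'from r2', 11 );                    # later stamp
@{ $r2->{ queue } } = reverse @{ $r2->{ queue } };    # simulate out-of-order delivery

print "r1 sees: ", $r1->query( 'X' ), "\n";
print "r2 sees: ", $r2->query( 'X' ), "\n";
```

Both replicas settle on the value with the later stamp regardless of delivery order, which is the property the outline relies on. Note that this only covers plain assignment of values.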

    You see that I can't modify your example, because I had to write a module for the RFC677-Algorithm.

      Update: I think I know what I (and AnonymousMonk?) were missing. It is this section of the RFC:

      Note that value modification is limited to assignment. Functional modification requests - such as "Change X to be Factorial(X)" - are specifically ruled out. Allowing them would force the use of system wide synchronization interlocks.

      What this means (if I interpret it correctly) is that the database only stores values and retrieves them. Any new value applied is simply a new value; it can have no relationship to the previous value!

      Which makes it totally inapplicable for the purpose of sharing objects between threads.


      You're right, I didn't read it thoroughly, just scanned it a couple of times. Like most RFCs, I find the language chosen--probably for very real reasons of avoiding associations with any particular programming language, OS or other pre-existing system--makes for extremely tedious and difficult reading.

      However, given your precis, I have given it another read and ... well, I'm still not convinced. Certainly not as a realistic mechanism for object sharing in Perl(5?). I'll try to outline why.

      Two threads have access to a single shared scalar X. Each thread needs to increment X by 1. The sequence of operations required by each thread to do this is:

      A. Enqueue (Selector) request:
         Selector: X; (DBMP)ID: n; SequenceNo: s;
      B. Process all messages in the Q, in SequenceNo order, until the just-queued message is top of the Q.
      C. Return the current value of X to the calling code.
      D. Add 1 to the value returned.
      E. Enqueue (Assignment) request:
         Selector: X; ID: n; SequenceNo: s+m; Value: v;
      F. Process all messages in the Q, in SequenceNo order, until the just-queued message is top of the Q.
      G. Enqueue an update (Assignment) message to the other thread.

      Now look at (one of) the different sequences in which these actions can occur on two sequentially run, pre-emptive threads. At the start of the following, assume that both threads' copies of the DB are synchronised and contain one selector X with a value of 2. The action to be performed by both threads is to increment the shared variable X by 1.

          Thread 1                          Thread 2
          X1=2                              X2=2
          =================================================================
          Step A -> "S:X:1:1"
          Step B -> 1 message (ours)
                    to process.
          Step C -> return 2
          ------------------------------------------- Timeslice
                                            Step A -> "S:X:2:4"
                                            Step B -> 1 msg to process
                                                   -> "S:X:2:4"
                                            Step C -> return 2
          ------------------------------------------- Timeslice
          Step D -> 2+1 = 3
          Step E -> "A:X:1:7:3"
          Step F -> X1=3
                 -> Enqueue "A:X:1:9:3" -> T2
          ------------------------------------------- Timeslice
                                            Step D -> 2+1=3
                                            Step E -> "A:X:2:11:3"
                                            Step F -> 2 msgs to process
                                                   -> "A:X:1:9:3"  -> X2=3
                                                   -> "A:X:2:11:3" -> X2=3
                                            !!!Bang!!!
          =================================================================

      At point !!!Bang!!!, both threads think they have incremented X, but both threads have a value of 3 for X, where two increments from 2 should have produced 4.
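The same interleaving can be replayed deterministically without any threads. This is only a simulation under simplifying assumptions: the two "threads" are just the two entries of a hash, the made-up `assign_everywhere` helper stands in for reliable delivery of assignment messages, and sequence numbers are left out because they do not change the outcome.

```perl
use strict;
use warnings;

# Two replicas of the "database", both starting with X = 2.
my %replica = ( 1 => { X => 2 }, 2 => { X => 2 } );

# Assignment-only update: set X on every replica (delivery assumed reliable).
sub assign_everywhere {
    my( $value ) = @_;
    $_->{ X } = $value for values %replica;
    return;
}

# Timeslice 1: thread 1 reads its replica.
my $seen_by_1 = $replica{ 1 }{ X };     # 2
# Timeslice 2: thread 2 reads its replica before thread 1 has written.
my $seen_by_2 = $replica{ 2 }{ X };     # 2
# Timeslice 3: thread 1 computes and assigns.
assign_everywhere( $seen_by_1 + 1 );    # X = 3 everywhere
# Timeslice 4: thread 2 computes from its stale read and assigns.
assign_everywhere( $seen_by_2 + 1 );    # X = 3 again, not 4

print "X1=$replica{1}{X} X2=$replica{2}{X} (two increments from 2)\n";
```

The replicas agree with each other, so replication itself has worked; what is lost is one of the two increments, because each assignment carries a value computed from a read that the other thread has since made stale.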

      Maybe I'm being thick tonight (always?), but I don't read anything explicit or implied in the RFC that deals with this overlap?

      Even if I am missing something, and the RFC does deal with this (which I think it must, but I don't see how), then every simple increment or decrement of a shared value is going to require the (minimum) 7 steps I've outlined above. And that is skipping over the facts that:

      • each transmission to another thread requires positive confirmation;
      • the number of transmission/wait-for-receipt cycles grows with each additional sharing thread;
      • your statement that "Of course all data referenced by data in the "Database" must be replicated themselves and stored in the "Database"." means that every value in every thread must be shared between every thread (including all code) in order to deal with closures, globals, lvalues etc.

      If this were implemented, I think the phrase that comes to mind to describe it is "slow as molasses".

      I know I'm missing something vital here--but what?


        What you miss here is this: so far we are only talking about sharing Storable objects between threads. Closures and lvalues are NOT Storable. And your problem with incrementing shared values isn't new; it's in every threading library. That's why mutexes exist. A mutex in this application could simply be a message "Sequence nrs from A up to B are owned by Thread C". If this message "modifies" key A||B, and keys A...B are "modified" with the value A||B (all at seqnr A), then thread C could, after querying key A...B with seqnr B, be sure it could transact the whole increment using seqnrs A...B. A query with a sequence number should return the value the key had at the time of that sequence number, or an error if that is impossible to say (meaning the lock failed).

        BTW a mutex is not Storable.

        And no: only objects referenced FROM the Database must be stored, not anything that REFERENCES objects in the Database. That restriction applies to Storable too, so it's not outrageous.
