Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Re^7: Passing globs between threads (Updated).

by BrowserUk (Patriarch)
on Oct 02, 2004 at 04:56 UTC ( [id://395824]=note: print w/replies, xml ) Need Help??


in reply to Re^6: Passing globs between threads
in thread Passing globs between threads

Update: I think I know what I (and AnonymousMonk?) were missing. It is this section of the RFC:

Note that value modification is limited to assignment. Functional modification requests - such as "Change X to be Factorial(X)" - are specifically ruled out. Allowing them would force the use of system wide synchronization interlocks.

What this means (if I interpret it correctly) is that the database only stores values and retrieves them. Any new value applied is simple a new value, it can have no relationship to the previous value!

Which makes it totally inapplicable for the purpose of sharing objects between threads.


Your right, I didn't read it thoroughly, just scanned it a couple of times. Like most RFCs, I find the language chosen--probably for very real reasons of avoiding associations with any particular programming language, OS or other pre-existing system--makes for extremely tedious and difficult reading.

However, given your precis, I have given it another read and ... well, I still not convinced. Certainly not as a realistic mechanism for object sharing in Perl(5?). I'll try to outline why.

Two threads have access to a single shared scalar X. Each thread needs to increment X by 1. The sequence of operations required by each thread to do this is:

  1. Enqueue (Selector) request:

    Selector: X; (DBMP)ID: n; SequenceNo: s;

  2. Process all messages in the Q, in sequenceNo order,

    until the just queued message is top of the Q.

  3. Return the current value of X to the calling code.
  4. Add 1 to the value returned.
  5. Enqueue (Assignment) request:

    Selector:X; ID:n; SequenceNo:s+m; Value: v;

  6. Process all messages in the Q, in sequenceNo order,

    until the just queued message is top of the Q.

  7. Enqueue update (Assignment) message other thread.

Now look at (one of) the different sequences in which these action can occur on two sequentially run, pre-emptive threads. At the start of the following assume that both threads copies of the DB are synchronised and contain one selector X with a value of 2. The action to be performed by both threads is to increment the shared variable X by 1.

Thread 1 Thread2 X1=2 X2=2 ================================================================= Step A -> "S:X:1:1" Step B -> 1 message (ours) to process. Step C -> return 2 ------------------------------------------- Timeslice Step A -> "S:X:2:4" Step B -> 1 msg to process -> "S:X:2:4" Step C -> return 2 ------------------------------------------- Timeslice Step D -> 2+1 = 3 Step E -> "A:X:1:7:3" Step F -> X1=3 -> Enqueue "A:X:1:9:3" -> T2 ------------------------------------------- Timeslice Step D -> 2+1=3 Step E -> "A:X:2:11:3" Step F -> 2 msgs to process -> "A:X:1:9:3" -> X2=3 -> "A:X:2:11:3" -> X2=3 !!!Bang!!! =================================================================

At point !!!Bang!!!, both threads think they have incremented X, but both threads have a value for 3 for X?

Maybe I'm being thick tonight (always?), but I don't read anything explicit or implied in the RFC that deals with this overlap?

Even if I am missing something, and the RFC does deal with this (which I think it must but I don't see how?), then if every simple increment or decrement of a shared value is going to require the (minimum) 7 steps I've outlined above (skiping over that:

  • each transmission to another thread requires positive confirmation;
  • the number of transmission/wait for receipt cycles grows with each additional sharing thread;
  • that your statement that "Of course all data referenced by data in the "Database" must be replicated themselves and stored in the "Database"." means that every value in every thread must be shared between every thread (including all code) in order to deal with closures; globals; lvalues etc.
)

if this was implemented, I think that the phrase that comes to mind to describe it is "slow as molasses".

I know I missing something vital here--but what?


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

Replies are listed 'Best First'.
Re^8: Passing globs between threads (Updated).
by Anonymous Monk on Oct 08, 2004 at 17:19 UTC
    What you miss here is: We are talking yet only about sharing Storable objects between threads. Closures and lvalues are NOT Storable. And your problem with incrementing shared values isn't new,it's in every threading library. That's why mutexes exist. A mutex in this application could be simply a message "Sequence nrs from A upto B are owned by Thread C". If this message "modifies" key A||B and keys A...B are "modified" with the value A||B (all at seqnr A) then thread C could after querying with seqnr B key A...B be sure it could transact the whole increment using seqnrs A...B. The query with sequence number should result in the value this key had at the time of this sequence number or an error if impossible to say (meaning the lock failed).

    BTW a mutex is not Storable.

    And no: Only objects referenced FROM the Database must be stored, not anythig that REFERENCES objects in the Database. That restriction applys to Storable too, so it's not outrageous.

      That's why mutexes exist.

      From an earlier AM post:

      Because no replica is ever modified or seen by another thread as its own, no locking is necessary.

      I accept that with the addition of a concept of ownership (implemented through "take ownership" & "release ownership" messages; or several otherways) that the meshanisms described in the RFC could become the basis for a inter-thread, data-sharing mechanism.

      It's not to dissimilar from the message passing semantics of Win32/OS-2/other GUIs, or even those of SmallTalk.

      However, you'll (I hope) accept that this is:

      1. pushing what is described in the RFC way beyond where that document leaves off.
        And your problem with incrementing shared values isn't new,it's in every threading library.

        It's not just incrementing an integer. Any form of modification to an existing value suffers the same problem of requiring synchronisation and locking. Appending to ...; upper/lower-casing ...; truncating a string. Indeed anything that you might do with s/// rather than m// woulld require this. Along with any math; anything that modified the state of a object; and so on.

      2. the overhead involved in the process would render the implementation unusable for anything other than an academic exercise.

      Perhaps my mistake was to take the idea that you (or some Anonymous Monk?*) seriously.

      * With the first quote in my sig, it would be crass to critique you for corresponding anonymously, but it does make life a life a little more tolorable to know you are continuing a conversation with a single individual, and not just another passer-by making comment on the basis of one post, rather than the complete thread. Maybe we should have a set of AM1..N anonymonk ID's that could be "occupied" for the duration of single conversations--but then how would you indicate that you are the same person without some form of verification? Oh well! Continuing, assuming you are the same AM.

      It all comes back to your(?) response to my assertion:

      If it was trivial, or even if it was possible with a reasonable degree of difficulty to coordinate and synchronise objects across threads, Perl would do that for you. The fact that the clever guys that got iThreads this far did not step up to the plate and do this already, almost certainly means that you should not be trying to do this either.

      Namely:

      Yes it's easy to synchronize objects (at least Storable ones) across threads. RFC677 (yes the one from IETF) tells you how.

      Once you accept the premise that to do anything useful with shared objects across threads, you need to be able to share and modify the internal state of that object from all sharing threads, you need to implement locking somehow--as opposed to what the OP of this thread later admitted he was doing which is to use Storable to pass a pre-initialised object to a thread for instantiation within that thread; and for it's exclusive use--you can then look at how you might easily implement that.

      Assuming that you got past that last overcomplicated sentance/paragraph in agreement, then I would suggest that in a threaded environment, where the basis merit is that you have shared access to memory, that duplicating data is entirely the wrong way to go about it. Once you employ the message passing semantics to all accesses to the internals of an object, there presents itself a much cleaner, less memory intensive solution to the problem.

      Rather than duplicating all the data(attributes) and all the methods to every thread sharing access; you maintain a single copy of all shared objects (data and methods) in a single thread's memory space, and an object reference becomes a handle that identifies:

      1. the owning thread;
      2. the object's class;
      3. the instance;

      Any operation (method call) on a shared object then gets translated into a message enqueued to the owning thread identifying the object(instance); the method called; and the parameters. The "calling thread" blocks until the sharing thread processes the message and returns the result.

      The advantages of this mechanism are:

      • Operations not involving shared objects do not require any form of synchronisation, locking or even tests for either condition.

        These non-shared operations continue at full speed.

      • All shared object operations are automatically synchronised through the sharing thread's queue.
      • All shared object operations automatically block when required, for exactly as long as is required.

        The key to this statement is that "sending the message" to the sharing thread, would in fact suspend the sending thread, and co-operatively transfer it's timeslice(s) to the sharing thread. And the key to that is a little known Win32 mechanism called Fibres.

        For some stupid reason, the only MSDN/Fibres link that google will throw up right now is a French language page. I'll try to update with the English version once Google/MSDN is being more cooperative.

        Now I doubt if the concept is unique to win32, but I don't actually know of another implementation, so I can't discuss it in generic terms.

        The basic description is this. You create a pair (or more) threads in the usual way, but then you "convert" them into a fibres. Once this is done, the two fibres share a single thread's timeslices' cooperatively, within the auspices of the overall pre-emptive multi-tasking environment. They, in effect, become timesliced co-routines.

        The full implications and advantages of this arrangement aren't entirely obvious when you first read the descriptions of them, but with thought, the possibilities are quite fascinating.

        Many languages feel the need to implement their own (usually cooperative) threading internally (Java etc.), in order to make sharing objects across threads feasible. The disadvantage of this is that each "thread" within a process shares a single OS timeslice--because the Process is doing the timeslicing. This is especially disadvantageous in a multi-processor SMP-type environment as it precludes making full (any?) use of multiple cpu's within a single process. Fibres do not suffer this restriction as they are first-class OS level objects, not language-based, per-process objects.

      My thoughts on this are not fully formed yet. My progress on implementing the ideas I'm trying to describe here (which first came to me about 9 months ago--and as far as my searches for prior art have gone, are unique?) is slow.

      To implement the ideas requires using a language that allows low-level access to the machine.

      C would work, but then you get

      • a dependancy on the runtime libraries.

        POSIX-complient libraries, then nearest thing to cross platform available simply don't cover the ground that I'm experimenting in;

      • are often non-reentrant (a pre-requisite);
      • forces me to have to re-invent OO technologies;

        Objects, vtables, references, memory management, exception handling, etc.

      C++ is possible, but has

      • horrible syntax;
      • still doesn't handle generics (templates) properly;
      • most of the useful features are in the STL; these vary by compiler/platform, and are to heavy and complicated.
      • Most importantly, gives my brain an a***ache every time I use it.

      Java.

      • Too verbose;
      • libraries too heavy;
      • doesn't give me enough access to the bare metal;
      • has a permission mechanism, that whilst powerful, doesn't fit with the native permissions mechanisms on any platform.

      Perl. Obviously not!

      There are two choices: (macro) assembler. Hard work. cpu if not OS specific.

      My choice: D. Has (or will have) everything I want and nothing I don't. Already cross-platform after a fashion, and getting stronger. The downside is that it's still going through birthing pains.

      One day I'll have something to show for my thoughts, efforts and research that will justify some of my expressed opinions. Till then, all I can do is express them and consider each alternative and counter argument in the light of what I already (think) I know.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
        To set you at ease: All AM-Posts in our thread are from me.

        I'm thinking you are hypocritcal. First you complain about noncoordinated replicas. I show you then that replicas can be so coordinated, that they are EXACTLY like a shared instance, including all problems. Now you complain about these problems because you would like a middle ground. Then I show how such middle ground can be reached with only one extra message. Now it's the complexity. Fear not! This is possible too by distributed transactions. Think about this: A key points to a value, which as key points to the next value, and so on until pointing to the first key. Linked List as Ring. Nothing new. Who closes the Ring (commit)? If one of these values does not participate (points to itself) can the Ring be closed (abort)? No value decided=lock? And that is only with get/set!

        The essential Good Thing is It's tunable!!

        BTW I know about Fibers. The complexity of them is hidden in the OS. For instance: If a Fiber is preempted by another process the rescheduling must not choose its partner.

        Here I'm again: The taking of ownership is for a FEW SEQUENCE NUMBERS only. There should never be contention about this ressource (You can take the next). So it will be faster than locking. No thread will EVER be locked. The only wait can be a Select to another thread (where the answer isn't already replicated).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://395824]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (4)
As of 2024-04-23 07:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found