PerlMonks
Re^6: Thread terminating abnormally COND_SIGNAL(6)

by rmahin (Beadle)
on Jul 17, 2013 at 19:14 UTC ( #1044876 )


in reply to Re^5: Thread terminating abnormally COND_SIGNAL(6)
in thread Thread terminating abnormally COND_SIGNAL(6)

OK, I'll try to explain a bit further, and I apologize for the confusion. The random variable was just to test having multiple values put into the "return value" hash; in this demo, it does indeed do absolutely nothing.

In my main program, clients connect, and threadPool1 handles their requests. They issue commands, and can either have those commands run in their own thread, so they can see the output in the interactive shell, or run in the background (and get pushed to threadPool2).

So, the jobQueue. When users issue commands, their command cannot always be run right away, depending on their criteria and what machine in our farm they want to run it on. The queue maintains the order commands were issued in by using the JobNodes, which contain information from the command they entered. The jobQueue is not REALLY acting as a typical queue, but more as a utility to block the thread that issued the command until it is its turn (using subroutines like enqueueJob() that block until a value is set). The first place it blocks is setting the job number. After the job queue returns the job number, as shown in my demo, the client thread will continue. The jobQueue will pause the command again to tell it what resource it should actually be using.

The purpose of the JobQueue is to preserve the order the commands are issued in, regardless of whether the database is updated before the queue gets back to a job that was deferred. For example:

  1. User1 issues 2 commands for machine M1
  2. User2 issues 2 commands for M1, and M1 can only have 3 commands running
  3. User3 issues 2 commands for M1

Our queue now holds (user2, user3, user3): user2's second command, and both of user3's.

The current approach, using the Thread::Queue as an array, allows us to issue one DB query at the beginning of each iteration for the state of all our resources, so we can reference that snapshot instead of querying the database for every job in the queue every time we check it. So our process is:

  1. Query the database for the current state and build a hash
  2. Check each node in the queue against that hash
  3. Repeat the steps above, starting again at the beginning of the array

This allows us to preserve the order because we only consult the snapshot. If we instead queried per job, then after re-enqueueing user2's deferred command, one of user1's commands might finish and update the database, so that when we dequeued user3's command it would see that M1 has an available resource and run that job ahead of user2's. That is what we want to avoid. Using it as a proper queue would not preserve the order in that fringe case without some more tinkering.
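To make the "queue as array" pass concrete, here is a minimal single-pass sketch under the example above (two commands each from users 1, 2 and 3, with M1 limited to 3 slots). It leans on Thread::Queue's peek() and extract() methods (Thread::Queue 3.01+) to walk the queue in place; the node fields are invented for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Six commands for machine M1: two each from users 1, 2 and 3.
my $jobQueue = Thread::Queue->new(
    map { { user => $_, machine => 'M1' } } 1, 1, 2, 2, 3, 3
);

my @ran;
my %state = ( M1 => 3 );    # snapshot: one DB query per pass

# Walk the queue by index; runnable jobs are extracted,
# deferred jobs keep their position.
for ( my $i = 0; $i < $jobQueue->pending; ) {
    my $node = $jobQueue->peek($i);
    if ( $state{ $node->{machine} } ) {
        $state{ $node->{machine} }--;    # claim a slot in the snapshot
        push @ran, $node->{user};
        $jobQueue->extract($i);          # remove without disturbing order
    }
    else {
        $i++;                            # defer: leave it in place
    }
}

print "ran: @ran\n";    # users 1, 1 and 2 ran; (user2, user3, user3) remain
```

Because the pass only ever decrements the local snapshot, a DB update from a finishing job cannot let user3's command jump ahead of user2's deferred one mid-pass.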


Re^7: Thread terminating abnormally COND_SIGNAL(6)
by BrowserUk (Pope) on Jul 18, 2013 at 00:52 UTC
    The purpose of the JobQueue is we want to preserve the order the commands are issued in ... This allows us to preserve the order because: if we dequeue/enqueue user2's, and one of user1's finishes and updates our database, then when we dequeue user3's command, it will see that M1 has an available resource, and run that job. That is what we want to avoid. Using it as a proper queue would not preserve the order in that fringe case without some more tinkering.

    Hm. But, queues DO preserve order. That's kind of their raison d'être. (And they also 'close up the gaps' automatically!)

    The problem -- I would suggest -- is that you are storing multiple requests as single items.

    If instead (in your scenario above) you queued two items for each of your 3 users, you could process that queue queue-wise: re-queuing anything that isn't yet ready and discarding (moving elsewhere) anything that is complete. The queue will take care of keeping things in their right order and ensuring that the 'holes' get closed up, all without you having to mess around with indices trying to remember what's been removed and what hasn't.
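    A minimal sketch of that queue-wise processing, one queued item per command. is_ready() is a made-up stand-in for the real resource check; here we simply pretend that only u2-c2's resource is busy.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# One queued item per command.
my $Q = Thread::Queue->new(qw( u1-c1 u1-c2 u2-c1 u2-c2 ));

# Made-up readiness check: pretend only u2-c2's resource is busy.
sub is_ready { $_[0] ne 'u2-c2' }

my @done;
for ( 1 .. $Q->pending ) {      # one full pass over the queue
    my $job = $Q->dequeue;
    if ( is_ready($job) ) {
        push @done, $job;       # complete: move it elsewhere
    }
    else {
        $Q->enqueue($job);      # not ready yet: re-queue it
    }
}
# The queue has closed up the holes by itself: only u2-c2 remains,
# still in its position relative to any other re-queued jobs.
print "done: @done, pending: ", $Q->pending, "\n";
```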

    Just a thought.


    That said; I'm still not clear on why you need the shared %nodes hash, when the results from the jobs are returned to the main thread via the $Qresults?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I really apologize; I don't think I'm being completely clear.

      Hm. But, queues DO preserve order. That's kind of their raison d'être. (And they also 'close up the gaps' automatically!)

      Generally speaking, yes, they do. However, in our case I really think there would be potential for things to be executed out of order using a standard queue, though not because of any fault in the queueing mechanism. Using a typical queue would mean we query the database before processing each node to see if resources are available; this would immediately free up any resources and allow jobs to execute even if there were preceding nodes that were refused and requeued. Do you follow?

      If instead, (in your scenario above), you queued two items for each of your 3 users;

      Again I apologize, they are being treated as individual items. So take the example again.

      1. User1 issues 2 commands for machine M1
      2. User2 issues 2 commands for M1, and M1 can only have 3 commands running
      3. User3 issues 2 commands for M1

      Using an actual queue, the process would be the following:

      1. Query DB, see 3 resources available; execute user1's 1st command, decrement sessions in DB
      2. Query DB, see 2 resources available; execute user1's 2nd command, decrement sessions in DB
      3. Query DB, see 1 resource available; execute user2's 1st command, decrement sessions in DB
      4. Query DB, see 0 resources available; requeue user2's 2nd command
      5. Now let's say the 1st command finishes, and updates the database, incrementing its session counter
      6. Query DB, see 1 resource available; execute user3's 1st command, ahead of user2's requeued one

      Now obviously, there would be ways to avoid that issue, if for instance the jobs were not incrementing their session counters off in other threads somewhere, but this would require a rather large change to the existing framework.


      That said; I'm still not clear on why you need the shared %nodes hash, when the results from the jobs are returned to the main thread via the $Qresults?

      In the actual program, results are not returned to the main thread. The main thread just listens for client connections and passes the file descriptor to client threads. Results are sent back to the user, or discarded if the job was executed in the background. The %nodes hash is simply the means of passing information back to the jobNode in the originating thread: my understanding was that when you enqueue something into the Thread::Queue, you get a shared_clone of that object and thus cannot make direct modifications to the original. Hence the

      foreach my $key ( keys %$results ) {
          $node->{$key} = $results->{$key};
      }

      The process is:

      1. Main thread accepts connections, passes them to client thread
      2. Client thread accepts a command, creates a job node, enqueues it into the job queue (in another thread), waits until the job queue sets a job number for the node... and then some more stuff

      If there is another way to get the job number from the job queue, I am all ears. I thought the shared hash was a good implementation for what I needed, but if there's a better way to do it, I'm happy to listen and try it out.

      Thanks again for your responses!

        Now lets say the 1st command finishes, and updates the database, incrementing its session counter

        (This is just me thinking how I would avoid your undefined jobnode issue. If you're happy with your current solution, stick with it.:)

        Your current mechanism uses a single queue for all resources; and has to constantly poll each of the pending jobs and then check the required resource to see if it is available. This puts a very expensive 'busy loop' at the heart of your scheduler.

        If you have a lot of jobs queued for popular (or slow; or both) resources at the head of your queue; you are going to be constantly polling over those pending jobs and querying the DB for the resource status; in order to get to the newer pending requests for less popular/higher throughput resources. In other words, the slower, more popular resources become a bottleneck in front of all your other faster, more transient ones.

        That smacks of a scaling issue being designed into the very heart of your processing.

        I would tackle the queuing for resources in a quite different way:

        • Jobs get queued to resource specific queues.

          Implementation: a hash of queues keyed by resource name or id.

        • When a job is enqueued, it is inspected and is added to the appropriate queue.
        • You have another 'jobDone' queue. When a job is completed, its jobid/resourceid pair is queued to this queue.

          And the heart of your scheduler is a thread reading (dequeuing, not polling) that jobDone queue.

          As it pulls each jobid/resourceid off that queue, it:

          1. Updates the job status in %nodes.
          2. Updates the DB resource status (if still necessary).
          3. Takes the next pending job off the queue associated with the now freed resource and puts it on the work queue.
          4. Goes back to dequeue the next done job; blocking if there's nothing there.
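        The steps above can be sketched as follows. All the names here (%pendingFor, $Qdone, $Qwork) are invented for the sketch, and the DB/status updates are elided; the demo run at the bottom simulates one earlier job on M1 finishing.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Per-resource pending queues, a 'jobDone' queue, and a work queue.
my %pendingFor = map { $_ => Thread::Queue->new } qw( M1 M2 );
my $Qdone = Thread::Queue->new;    # jobid/resourceid pairs of finished jobs
my $Qwork = Thread::Queue->new;    # jobs released to the worker pool

# Enqueuing a job: inspect it, add it to its resource's queue.
sub submit {
    my ( $jobid, $resource ) = @_;
    $pendingFor{$resource}->enqueue($jobid);
}

# The heart of the scheduler: block on $Qdone; no polling, no busy loop.
my $scheduler = threads->create( sub {
    while ( defined( my $done = $Qdone->dequeue ) ) {
        my ( $jobid, $resource ) = @$done;
        # 1. update job status / DB for $jobid here (elided)
        # 2. release the next job waiting on the now-free resource
        my $next = $pendingFor{$resource}->dequeue_nb;
        $Qwork->enqueue( [ $next, $resource ] ) if defined $next;
    }
} );

submit( 'job1', 'M1' );
submit( 'job2', 'M1' );
$Qdone->enqueue( [ 'job0', 'M1' ] );    # pretend an earlier job just finished
$Qdone->end;                            # shut the scheduler down for the demo
$scheduler->join;

my $released = $Qwork->dequeue_nb;
print "released $released->[0] on $released->[1]\n";    # released job1 on M1
```

        Note that the scheduler thread spends its life blocked in dequeue(); work only happens in response to a completion, never on a timer.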

        My understanding was that when you enqueue something into the Thread::Queue, you get shared_clone of that object and thus cannot make direct modifications to the object. The hash is simply the means for returning information to the jobNode in the originating thread.

        That is true, and I can see how this is affecting your design decisions. But not for the good.

        The fact that sending an object (node) via a queue gives you a copy means that you now require a second queue to send the (modified) node back to somewhere else, so that something can read the modified copy's information and update the original. This gets complicated and expensive. And it is completely unnecessary!

        You have your shared %nodes. Every thread can access that structure directly. So don't queue nodes; queue node ids.

        When the receiver dequeues, instead of getting an unshared clone of the node, it just gets an ID string that it uses to directly access the shared %nodes, reading and updating the node in place.

        Now you have no need for the return queue (nor anything to read it!). All your queue messages become lighter and faster; and there is one central, definitive, always up-to-date copy of the complex, structured information.
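        A small sketch of the id-not-node idea, with a made-up id scheme and a lock on the individual node. The worker writes the status and job number straight into the one shared copy, which is exactly what the client thread is waiting on; no return queue appears anywhere.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# One shared %nodes; the queue carries only an id string.
my %nodes :shared;
my $Q = Thread::Queue->new;

my $id = 'job-1';    # made-up id scheme
$nodes{$id} = shared_clone( { cmd => 'uptime', status => 'pending' } );
$Q->enqueue($id);    # light-weight message: just the key into %nodes
$Q->end;

my $worker = threads->create( sub {
    while ( defined( my $id = $Q->dequeue ) ) {
        my $node = $nodes{$id};    # the one, definitive copy
        lock(%$node);              # serialise updates to this node
        $node->{status} = 'done';
        $node->{jobnum} = 42;      # e.g. the job number the client waits on
    }
} );
$worker->join;

print "$nodes{$id}{status} $nodes{$id}{jobnum}\n";    # done 42
```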

        This next bit is speculative and would require careful thought and analysis; but if your architecture lends itself, you may even avoid the need to do locking on %nodes.

        If you can arrange that each jobid/resourceid pair token (a string) is, and can only be, created once, then as it gets passed around, only one thread at a time can ever hold it, so there is no need to lock.

        I can almost feel your frustration as you read this and are thinking: "But I'd have to re-write the whole damn thing"! If it doesn't work for you, just don't do it! You don't even have to tell me :)

        But consider the ideas, because done right, you'd have a light, fast, self-synchronising, scalable process with no bottlenecks and no cpu-sapping busy-loops.


