Now let's say the 1st command finishes and updates the database, incrementing its session counter.
(This is just me thinking how I would avoid your undefined jobnode issue. If you're happy with your current solution, stick with it.:)
Your current mechanism uses a single queue for all resources, and has to constantly poll each of the pending jobs and then check the required resource to see if it is available. This puts a very expensive 'busy loop' at the heart of your scheduler.
If you have a lot of jobs queued for popular (or slow, or both) resources at the head of your queue, you are going to be constantly polling over those pending jobs and querying the DB for the resource status, just to get to the newer pending requests for less popular, higher-throughput resources. In other words, the slower, more popular resources become a bottleneck in front of all your other faster, more transient ones.
That smacks of a scaling issue being designed into the very heart of your processing.
I would tackle the queuing for resources in a quite different way:
- Jobs get queued to resource specific queues.
Implementation: a hash of queues keyed by resource name or id.
- When a job is enqueued, it is inspected and is added to the appropriate queue.
- You have another 'jobDone' queue. When a job is completed, its jobid/resourceid pair is queued to this queue.
And the heart of your scheduler is a thread reading (dequeuing, not polling) that jobDone queue.
As it pulls each jobid/resourceid off that queue, it:
- Updates the job status in %nodes.
- Updates the DB resource status (if still necessary).
- Takes the next pending job off the queue associated with the now freed resource and puts it on the work queue.
- Goes back to dequeue the next done job; blocking if there's nothing there.
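The steps above can be sketched roughly like this. (A minimal sketch, not your code: the names `%resource_queues`, `$done_q`, `enqueue_job` and `dispatcher` are mine, the resource names are made up, and the DB/worker details are elided.)

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# One queue per resource (hypothetical resource names), plus the jobDone queue.
my %resource_queues;
$resource_queues{$_} = Thread::Queue->new for qw( db1 db2 web );
my $done_q = Thread::Queue->new;

# Enqueue: inspect the job and route it to the appropriate resource queue.
sub enqueue_job {
    my ( $jobid, $resourceid ) = @_;
    $resource_queues{$resourceid}->enqueue($jobid);
}

# The heart of the scheduler: block (no polling) on the jobDone queue.
sub dispatcher {
    while ( my $msg = $done_q->dequeue ) {    # blocks when nothing is there
        my ( $jobid, $resourceid ) = split /:/, $msg;
        # ... update the job status in %nodes; update the DB if still needed ...
        # Take the next job pending for the now-freed resource, if any.
        if ( defined( my $next = $resource_queues{$resourceid}->dequeue_nb ) ) {
            # ... hand $next to a worker for $resourceid ...
        }
    }
}
```

No loop ever scans jobs it cannot run: a backlog on one resource's queue never delays dispatch for any other resource.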
My understanding was that when you enqueue something into the Thread::Queue, you get a shared_clone of that object and thus cannot make direct modifications to the original. The hash is simply the means for returning information to the jobNode in the originating thread.
That is true, and I can see how this is affecting your design decisions. But not for the good.
Because sending an object (node) via a queue gives you a copy, you now require a second queue to send the (modified) node back to somewhere else so that it can read the modified copy's information and update the original. This gets complicated and expensive. And it is completely unnecessary!
You have your shared %nodes. Every thread can access that structure directly. So don't queue nodes; queue node ids.
When the receiver dequeues, instead of getting an unshared clone of the node, it just gets an ID string, that it uses to directly access the shared %nodes, to read information and update directly.
Now you have no need for the return queue (nor anything to read it!). All your queue messages become lighter and faster; and there is one central, definitive, always up-to-date copy of the complex, structured information.
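A minimal sketch of the idea, assuming a shared %nodes as you describe (the node fields and the id `job42` are invented for illustration):

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

# The one central, definitive, always up-to-date copy of the node data.
my %nodes : shared;
$nodes{job42} = shared_clone( { status => 'pending', resource => 'db1' } );

my $work_q = Thread::Queue->new;

# Sender: queue only the lightweight ID string, never the node itself.
$work_q->enqueue('job42');

# Receiver: the dequeued ID is just a string; use it to reach the shared
# node directly and update the original in place -- no return queue needed.
my $id = $work_q->dequeue;
$nodes{$id}{status} = 'running';
```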
This next bit is speculative and would require careful thought and analysis; but if your architecture lends itself, you may even avoid the need to do locking on %nodes.
If you can arrange that each jobid/resourceid pair token (a string) is, and can only be, created once, then as it gets passed around, only one thread at a time can ever hold it, so there is no need to lock.
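Very roughly, and only as an illustration of the single-creation idea (the token format and sub name are invented):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work_q = Thread::Queue->new;

# The token is minted in exactly one place. From then on it only ever
# moves between queues, so at any instant exactly one thread holds it,
# and no locking is needed to protect it.
sub issue_token {
    my ( $jobid, $resourceid ) = @_;
    my $token = join ':', $jobid, $resourceid;    # the only creation point
    $work_q->enqueue($token);
    return;
}

issue_token( 42, 'db1' );
my $token = $work_q->dequeue_nb;    # this thread now solely "owns" job 42
```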
I can almost feel your frustration as you read this and are thinking: "But I'd have to re-write the whole damn thing"! If it doesn't work for you, just don't do it! You don't even have to tell me :)
But consider the ideas, because done right, you'd have a light, fast, self-synchronising, scalable process with no bottlenecks and no cpu-sapping busy-loops.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Seriously awesome insights, and you've touched on one of the design decisions we made a while back.
Jobs get queued to resource specific queues.
We discussed this at great length, and decided on our current approach for one main reason (that I don't think I really mentioned before): when a job is first issued, it does not necessarily have to be set for a specific resource; it could have a list of resources that are an adequate match. This being the case, we could still definitely make it work, but we just weren't entirely happy with some of the caveats. We saw two main approaches to tackling this:
- Put the job in all of the individual queues that match and find some way to prevent it from getting run twice (which seemed like it would take a fair bit of work)
- Or put it in the shortest queue. This, however, assumes that the shortest queue would lead to faster execution, which, depending on the job, may not be the case
With those points in mind, do you still think multiple resource-specific queues would be the best approach? Legitimately curious. I do, however, see your point about there being a bottleneck in our queue, but with the current implementation, I don't really think it's quite as bad as you say.
If you have a lot of jobs queued for popular (or slow; or both) resources at the head of your queue; you are going to be constantly polling over those pending jobs and querying the DB for the resource status;
Because again, we only query the database one time per pass over the array, not for each individual element. After we do that and build our hash of the available resources, wouldn't checking the jobs be relatively quick? I'm not saying this to cling to my current implementation, but I just don't see how querying the database once, building a hash, and checking each node in the single queue would be drastically slower than dequeuing constantly and checking the database every time.
(This is just me thinking how I would avoid your undefined jobnode issue. If you're happy with your current solution, stick with it.:)
Yeah, I appreciate the advice. I wouldn't say I'm thrilled with it, but I think it would take a decent bit of time to rework this aspect, and at the moment I'm just not seeing the benefits justifying it. (Although the code would be much prettier :( )
You had me at:
So don't queue nodes; queue node ids.
Genius. Again, I'm ashamed I did not think of this. It shouldn't be terribly difficult to implement either.
This next bit is speculative and would require careful thought and analysis; but if your architecture lends itself, you may even avoid the need to do locking on %nodes.
Definitely an intriguing idea. I'll keep it in mind, but I think I'll have to get things working without it before really investigating this aspect :P
I can almost feel your frustration as you read this and are thinking: "But I'd have to re-write the whole damn thing"! If it doesn't work for you, just don't do it! You don't even have to tell me :)
Quite the contrary: I am a fresh CS graduate and do not work with any programming (let alone Perl) experts. I'm happy for the suggestions and advice. It's the only way to learn.
With those points in mind, do you still think multiple resource specific queues would be the best approach?
Yes. I'd tackle the 'which resource queue for non-specific jobs' problem using your option 1: queue it (the jobid) to all applicable queues. To deal with the run-twice problem, I'd have a status field in the object that the first resource worker to dequeue it changes from 'pending' to 'running'. Any subsequent worker dequeuing it sees 'running' and simply discards it and grabs the next one.
Multiple tokens mean locking is required: the workers need to lock the status before reading it, to prevent two resources seeing 'pending' simultaneously.
But if the node hashes (within %nodes) are also shared, then you only need lock the specified node, not the entire %nodes, so the lock time will be brief and contention low.
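A rough sketch of that claim-or-discard check, assuming shared per-node hashes inside %nodes (the sub name `claim_job` and the id `job7` are mine):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %nodes : shared;
$nodes{job7} = shared_clone( { status => 'pending' } );

# A worker that has just dequeued $id from one of the resource queues
# tries to claim the job; only the first worker to see 'pending' wins.
sub claim_job {
    my ($id) = @_;
    my $node = $nodes{$id};
    lock %$node;    # locks just this node's hash, not the whole %nodes
    return 0 if $node->{status} ne 'pending';    # already claimed: discard
    $node->{status} = 'running';
    return 1;
}
```

Because the lock covers a single node, the hold time is brief and contention between workers for unrelated jobs is nil.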
wouldn't checking the jobs be relatively quick? ... I just dont see how ... than dequeueing constantly
The problem is not (necessarily or only) that it would be slower. It is that busy loops sap CPU, constantly redoing the same work over and over until something changes to allow them to move forward. (You've already seen this effect yourself, as identified by the comment in your cut-down code:
#sleep 1; #slow it down, not useful for testing though
With my alternative, the central dispatcher thread sits blocked (consuming no CPU at all) in a dequeue until a new job arrives, and wakes instantly (subject to getting a timeslice, usually a few microseconds) when something does. Then all it has to do is re-queue the token to the appropriate resource queue and go back to blocking.
If the resource is available, its worker wakes up instantly to process the job. And if not, it will grab it as soon as it is done with the preceding jobs.
Conversely, with your single queue-cum-array affair, when a resource comes free it has to wait for you to notice (query the DB), build your hash, then scan through all the pending jobs (sod's law says it will always be the last one), and then re-queue it.
If one of your popular resources gets a long job (or hangs, or crashes) and you build up a backlog for that resource, then jobs for every other resource, even idle ones, have to wait while you scan over that backlog every time.
Think of it like this (sad analogy time :). Would you rather have a receptionist in your local hospital who:
- just asked which dept. you needed and directed you to its waiting room;
- or one who made you stand in a single line at her desk with everyone else for every dept., while she rang around those departments asking if they were free for a client, and then walked down the line asking each person in turn whether they needed one of the departments that said yes?
Blocking is always better than polling. It's why we have whistles on kettles and bells on phones and front doors. If you had to keep picking up the receiver to see if anyone was calling, phones would never have caught on :)
Definitely an intriguing idea. I'll keep this in mind, ...
If you went for the reQ to multiple Qs idea above, the no locking idea is a non-starter.
A bit more grist for your mill :) Good luck.