Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic

Re^10: Thread terminating abnormally COND_SIGNAL(6)

by rmahin (Scribe)
on Jul 18, 2013 at 17:51 UTC ( #1045141=note: print w/replies, xml ) Need Help??

in reply to Re^9: Thread terminating abnormally COND_SIGNAL(6)
in thread Thread terminating abnormally COND_SIGNAL(6)

Seriously awesome insights, and you've touched on one of the design decisions we made a while back.

Jobs get queued to resource specific queues.

We discussed this at great length, and decided on our current approach for one main reason (that i dont think i really mentioned before): When a job is first issued, it does not necessarily have to be set for a specific resource, but it could have a list of resource that are an adequate match. This being the case, we could still definitely make it work, but just werent entirely happy with some of the caveats. We saw two main approaches to tackling this:

  1. Put the job in all of the individual queues that match and find some way to prevent it from getting run twice (which seemed like it would take a fair bit of work)
  2. Or put it in the shortest queue. This however is making the assumption that the shortest queue would lead to faster execution, but depending on the job, this may not be the case

With those points in mind, do you still think multiple resource specific queues would be the best approach? Legitimately curious. I do however see your point about there being a bottlneck in our queue, but with the current implementation, I dont really think its quite as bad as you say.

If you have a lot of jobs queued for popular (or slow; or both) resources at the head of your queue; you are going to be constantly polling over those pending jobs and querying the DB for the resource status;

Because again we only query the database one time per pass over the array, not each individual element. After we do that, and build our hash of the available resources, wouldn't checking the jobs be relatively quick? I'm not saying this trying to cling to my current implementation, but I just dont see how querying the database once, building a hash, and checking each node in the single queue would be drastically slower than dequeueing constantly, and checking the database every time.

(This is just me thinking how I would avoid your undefined jobnode issue. If you're happy with your current solution, stick with it.:)

Yeah, i appreciate the advice, I wouldnt say im thrilled with it, but I think it would take a decent bit of time to rework this aspect, and at the moment I'm just not seeing the benefits justifying it. (Although the code would be much prettier :( )

You had me at:

So don't queue nodes; queue node ids.

Genius. Again, ashamed I did not think of this. Shouldn't be terribly difficult to implement either.

This next bit is speculative and would require careful thought and analysis; but if your architecture lends itself, you may even avoid the need to do locking on %nodes.

Definitely an intriguing idea. I'll keep this in mind, but think i'll have to get stuff working without before really investigating this aspect :P

I can almost feel your frustration as you read this and are thinking: "But I'd have to re-write the whole damn thing"! If it doesn't work for you, just don't do it! You don't even have to tell me :)

Quite the contrary, I am a fresh CS graduate and do not work with any programming (let alone perl) experts. I'm happy for the suggestions and advice. Only way to learn.

  • Comment on Re^10: Thread terminating abnormally COND_SIGNAL(6)

Replies are listed 'Best First'.
Re^11: Thread terminating abnormally COND_SIGNAL(6)
by BrowserUk (Pope) on Jul 18, 2013 at 20:04 UTC
    With those points in mind, do you still think multiple resource specific queues would be the best approach?

    Yes. I'd tackle the 'which resource queue for non-specific jobs' using your option 1. Queue it (the jobid) to all applicable. And to deal with the run twice problem I have a status field in the object that the first resource worker that dqs it, changes from 'pending' to 'running'. Any subsequent worker dqing it sees the running and simply discards it and grabs the next one.

    Multiple tokens means locking is required; and the workers need to lock the status before reading it to prevent two resources seeing pending simultaneously.

    But if the node hashes (within %nodes) are also shared, then you only need lock the specified node, not the entire %nodes, so the lock time will be brief and contention low.

    wouldn't checking the jobs be relatively quick? ... I just dont see how ... than dequeueing constantly

    The problem is not (necessarily or only) that it would be slower. It is that busy loops sap cpu constantly redoing the same work over and over until something changes to allow it to move forward. (You've already seen this affect yourself as identified by your comment in your cut-down code:

    #sleep 1; #slow it down, not useful for testing though

    With my alternative, the central dispatcher thread sits blocked (consuming no cpu at all) in a dq, until a new job arrives, and wakes instantly (subject to getting a timeslice, usually a few microseconds) when something does. And then all it has to do is reQ the token to the appropriate resource Q and goes back to blocking.

    If the resource is available, it wakes up instantly to process it. And if not it will grab it as soon as its done with the preceding jobs.

    Conversely, with your single queue-cum-array affair, when a resource comes free it has to wait for you to notice (query the DB), build your hash; and then scan through all the pending jobs (sod's law says it will always be the last one), and then reQ it.

    If one of your popular resources gets a long job (or hangs or crashes) and you get a build up of a backlog for that resource, then jobs for every other resource even idle ones, have to wait while you scan over that backlog every time.

    Think of it like this (sad analogy time:). Would you rather have a receptionist in your local hospital that:

    • just asked which dept. and directed you to their waiting room;
    • or one that made you stand in a line at her desk with everyone else for every dept., whilst she rang around those departments asking if they were free for a client, and then walks down the line asking each person in turn if they needed one of the departments that said yes?

    Blocking is always better than polling. Its why we have whistles on kettles and bells on phones and front doors. If you had to keep picking up the receiver to see if anyone was calling, phones would never have caught on :)

    Definitely an intriguing idea. I'll keep this in mind, ...

    If you went for the reQ to multiple Qs idea above, the no locking idea is a non-starter.

    A bit more grist for your mill :) Good lock.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Argh, alright you have finally convinced me! I can no longer in good conscience go forward with the current approach. Thanks again for your thorough explanations, and now I must get to rewriting the whole damn thing!. Until next time!

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1045141]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2018-09-22 09:35 GMT
Find Nodes?
    Voting Booth?
    Eventually, "covfefe" will come to mean:

    Results (190 votes). Check out past polls.

    • (Sep 10, 2018 at 22:53 UTC) Welcome new users!