PerlMonks  

Re^3: Thread terminating abnormally COND_SIGNAL(6)

by BrowserUk (Patriarch)
on Jul 16, 2013 at 00:32 UTC


in reply to Re^2: Thread terminating abnormally COND_SIGNAL(6)
in thread Thread terminating abnormally COND_SIGNAL(6)

Are there any typical cases that could cause the semaphore to become invalid? Like too many Thread::Queue's?

I'm not sure. But internally, the Windows system call WaitForMultipleObjects is used in various places, and it has a limit. Historically that was 64 handles, though it might have changed in later versions. Note: you can have (many) more waitable handles; you just cannot wait on more than 64 of them at a time without using additional techniques. NOTE: This was just a guess on my part, having looked at your linked code.

I create 2 Thread::Queue's per JobNode, and in my testing i was creating roughly 10 JobNodes every 2 minutes. So that would be about 300 every hour? These queues though are basically used only once, in that after the JobQueue enqueues something, the JobNode no longer cares about that Thread::Queue. Is there something I should be doing to actually clean these up somehow?

Firstly, using a Queue to pass a single value is a nonsense.

  1. Do you not know you can pass arguments to a thread when you create it?
    my $thread = async( sub{ print "@_"; }, 123, 'fred', {'a'..'f'}, [ 0..9 ] )->join;
    ## prints something like: 123 fred HASH(0x3ea93f8) ARRAY(0x3ea9470)
  2. Have you heard of threads::shared?

    If you need to pass a single value to a thread after it has been created rather than when you create it, then use a shared scalar:

    my $jobNo :shared = 0;
    ...
    sub job {
        ## poll once a second until the main thread sets a non-zero job number
        sleep 1 until do{ lock $jobNo; $jobNo; };
        ... ## use $jobNo
    }
    ...
    my $thread = threads->create( \&job );
    ...
    { lock $jobNo; $jobNo = getJobNo(); }

    Of course, you'll want a different job number for each thread, so use a shared hash keyed by thread id:

    my %jobNos :shared;
    ...
    sub job {
        my $tid = threads->tid;
        my $jobNo;
        sleep 1 until do{ lock %jobNos; $jobNo = $jobNos{ $tid } };
        ## now use $jobNo.
        ...
    }

    my $thread = threads->create( \&job );
    ... ## some time later
    { lock %jobNos; $jobNos{ $thread->tid } = getJobNo() };
    ...

    There are other (some would say better) ways of waiting for a shared variable -- cond_vars -- than busy looping over sleep, but this is easy to write, explain and -- most importantly -- debug.
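For comparison, here is a minimal sketch of the cond_var alternative, using cond_wait and cond_broadcast from threads::shared in place of the sleep loop. The job numbers (100, 101, 102) and the thread count are made up for illustration; they stand in for whatever getJobNo() would return.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %jobNos :shared;

sub job {
    my $tid = threads->tid;
    my $jobNo;
    {
        lock %jobNos;
        # cond_wait releases the lock while blocked and re-acquires it on wake-up.
        # The loop re-checks the condition, since a wake-up may be meant for another thread.
        cond_wait( %jobNos ) until defined $jobNos{ $tid };
        $jobNo = $jobNos{ $tid };
    }
    return $jobNo;
}

my @threads = map { threads->create( \&job ) } 1 .. 3;

# Illustrative job numbers only; a real program would call its own getJobNo().
my $next = 100;
for my $thread ( @threads ) {
    lock %jobNos;
    $jobNos{ $thread->tid } = $next++;
    cond_broadcast( %jobNos );    # wake every waiter; each re-checks its own slot
}

my @results = map { $_->join } @threads;
print "@results\n";
```

Each thread blocks without polling and wakes only when the hash changes. The price is that a missed or misplaced signal is harder to track down than a sleep loop.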

Secondly, yes, you should be cleaning up those queues. Each queue encapsulates various system resources -- including those semaphores -- and they are a finite resource. 300/hour for 24 hours means 7,200 semaphores. I can't tell without deep inspection of your code, but you could simply be running out of resources. I'd expect to get a different error message than you have -- something like: "Insufficient system resources exist to complete the requested service" when the queue (or a resource it uses) was being created -- but it is possible that an error return is not being checked at that point.

Remember also that for a Queue to be cleaned up, *all references* at both ends will need to be freed completely before the reference count will drop to 0 and it will get recycled.

This could be a bug in threads or Thread::Queue, or perl's internals, but having glanced briefly at your linked code, I suspect it is much more likely that the problem lies in the way you are abusing those modules.

In essence, I think you are constructing a very complicated system around the use of threads and queues, but you do not really know enough about those modules to be doing so. I'd strongly advise that you create a few simple, stand-alone programs and play with threads, Thread::Queue (and threads::shared), and acclimatise yourself to them before using them within what appears to be a very complex library module -- presumably intended to be used by others.

I hope that does not sound patronising -- it certainly isn't intended to. I just know from deep experience that Perl's threading is quite different to other forms of threading, and it takes everyone coming to it -- regardless of their threading background in other languages -- a while to become familiar with its particular strengths and weaknesses.

Often at this point, I offer to review the threaded code (here or via email), but given the presence of "IBM::CLIFARM::SERVER" in the title of your module, I doubt there would be any point. I don't have a server farm lying around -- IBM or otherwise :) And from looking at the bits you linked, this isn't something that could be debugged 'by inspection' (without running it).

To reiterate:

  • I strongly suspect you are running your system out of some critical resource.

    You might be able to verify this using the System Information panel of ProcessExplorer.exe and checking the "Totals->Handles" count whilst your code is running. If it keeps rising and rising -- and drops back significantly when you kill your process ....

  • I think -- on the basis of the little I've seen -- that your code will need a substantial re-work to make it viable.

    Your comments already show you are uncomfortable with using queues the way you are.

    Doing so is simply wrong, and almost certainly completely unnecessary.

    You just need to become familiar with the facilities and techniques available to you, at which point you'll see a better way to tackle the problem.

  • There is not much I can do to help you with a project of this size and complexity.

    I strongly advise that you try out the main components of your project in small, stand-alone throw-aways until you are convinced they work the way you want them to.

    These will not only let you become familiar with the way Perl's threading works; but when you get problems, you have a ready-made test case you can post here for us to help you with.

    I write all my projects this way -- small stand-alones to iron out the details of the algorithms (and my understanding) -- and then I design the main project in the light and knowledge of what I've learned. I strongly advocate the method to you (and anyone listening).


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^4: Thread terminating abnormally COND_SIGNAL(6)
by rmahin (Scribe) on Jul 16, 2013 at 01:37 UTC

    Hey Browser

    Thanks for taking the time to look at the code. I always find your replies informative and helpful, even though you only have the small portion of the code I can provide to go on and cannot run it for yourself.

    One of the reasons it was done this way was that we were trying to keep the blocking mechanism inside the JobNode object, and as we were passing the JobNode into a Thread::Queue, it was not allowed to have any shared variables. I couldn't find another way to accomplish that (although it appears it wasn't really accomplished anyway).

    I will work on this taking your advice into consideration, and report back once I have a smaller mock up, or if I have any more questions.

    Thanks again, I appreciate the bluntness and straightforwardness of your replies.

Re^4: Thread terminating abnormally COND_SIGNAL(6)
by rmahin (Scribe) on Jul 16, 2013 at 22:56 UTC

    Ok, so I have written a smaller version that pretty much captures what happens in my real program, and I have done it using a shared hash as you recommended. The program simulates how we have one thread pool for client connections, which can run work directly in its threads, and another thread pool for background work. I think it works well, but am curious whether I am doing anything improperly, or if anything could be improved upon.

    I did have to add this at line 205, and am not sure why. I don't know if there is something wrong in the code or not, but it doesn't seem like it should be necessary.

    #next if(!defined($node));

    Anyways, here is my code.

      I did have to add this at line 205, and am not sure why. I dont know if there is something wrong in the code or not, but it doesnt seem like it should be necessary.

      #next if(!defined($node));

      It'll take me a while to digest the entirety of your code, but I think I can answer this one straight away.

      The short answer is that you've fallen into the trap of using the so-called "Advanced Methods" of Thread::Queue that were added by that module's newest owner.

      IMO these methods -- peek(), insert() & extract() -- should never have been added to a Queue module, as they break all the basic invariants (and thus expectations) of queues. They effectively turn the Queue into a (shared) array. Which is a nonsense, because the underlying data structure is already an array, and all this does is make it an array that is very expensive to maintain.

      And your usage of the module as an array (perfectly understandable, given the module's provision of these methods) is exactly what is giving you the problem.

      Simplified, you are

      1. Querying the size of the array: (my $amtInQ = $queue->pending();).
      2. Then looping over the array by index: (for(my $i = 0; $i < $amtInQ; $i++){).
      3. Then peeking at array(i): (my $node = $queue->peek($i);).
      4. Then either:
        1. Doing nothing with it: (if(int($node->{canBeSet}) == 0){).
        2. Or: doing something with it then removing it from the array: ($queue->extract($i);).
      5. then looping back to process the next index.

      But ... by extracting (splicing) an element from the array, there are now fewer items in it than there were when you queried it (my $amtInQ = $queue->pending();), and so when you get towards the end of your loop, there are no ith items left to peek(), and peek() returns undef. It is the very fact that you are using array semantics on the queue that creates the problem.
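To see the effect in isolation, here is a small sketch that replays the same loop on a plain Perl array, with an array read standing in for peek($i) and splice standing in for extract($i). The four nodes and their fields are invented for the demonstration; no threads are involved.

```perl
use strict;
use warnings;

# Four made-up nodes, all ready to be processed.
my @queue = map { +{ id => $_, canBeSet => 1 } } 1 .. 4;

my $amtInQ = @queue;    # size snapshot, like $queue->pending()
my @undefined_peeks;    # indices where the "peek" found nothing

for ( my $i = 0; $i < $amtInQ; $i++ ) {
    my $node = $queue[$i];                   # like $queue->peek($i)
    if ( !defined $node ) {
        push @undefined_peeks, $i;
        next;
    }
    splice @queue, $i, 1 if $node->{canBeSet};   # like $queue->extract($i)
}

my @left = map { $_->{id} } @queue;
print "undefined at indices: @undefined_peeks\n";
print "never examined: @left\n";
```

Besides the undefined peeks at the tail of the loop, every removal also shifts the next element into the slot just vacated, so that element is skipped on the next increment of $i.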

      This is how I would code that same loop:

      while( my $node = $queue->dequeue ) {
          if( $node->{canBeSet} ) {
              ## deal with this node
          }
          else {
              $node->{canBeSet} = 1;
              $queue->enqueue( $node ); ## push back for next time.
          }
      }

      Queue semantics and no busy loops, nor any need to sleep to avoid burning cpu.

      Now, one objection you might have to that is your ThreadDone check and processing:

      sub killThread{
          my $self = shift;
          {
              lock $self->{threadDone};
              $self->{threadDone} = 1;
          }
          return $self->{thread};
      }

      sub manageQueue{
          my $self = shift;
          while(1){
              my $done;
              {
                  lock $self->{threadDone};
                  $done = $self->{threadDone};
              }
              my $amtInQ = $queue->pending();
              last if $done && $amtInQ == 0;
              ...

      But, if you used my queue processing loop above, your KillThread() method simply becomes:

      sub killThread{ $queue->enqueue( undef ); }

      When the undef is dequeued, the while loop ends and the thread self-terminates.
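A minimal, self-contained sketch of that shutdown pattern follows. The work items are plain integers chosen for illustration, and the loop tests defined() rather than truthiness so that a legitimate item of 0 would not end it early.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

# The worker blocks in dequeue() until an item arrives; no sleep, no polling.
my $worker = threads->create( sub {
    my @seen;
    while ( defined( my $item = $queue->dequeue ) ) {
        push @seen, $item;
    }
    return join ',', @seen;    # a single string, so scalar thread context is fine
} );

$queue->enqueue( $_ ) for 1 .. 5;    # illustrative work items
$queue->enqueue( undef );            # the "killThread" step: ask the worker to finish

my $seen = $worker->join;
print "worker processed: $seen\n";
```

Because the undef travels through the same queue as the work, everything enqueued before it is processed before the thread exits.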

      There's a slight wrinkle with that. If the undef has been queued and then you encounter a node with CanBeSet = 0; then my code would requeue that node after the undef and the loop will terminate before it gets reprocessed, which you seem to explicitly not want to do. So, I would then recast my version of the loop like this:

      while( 1 ) {
          my $node = $queue->dequeue;
          if( !defined $node ) {
              ## terminate only once the queue is empty;
              ## otherwise push the undef back behind the remaining nodes
              last unless $queue->pending;
              $queue->enqueue( undef );
              next;
          }
          if( $node->{canBeSet} ) {
              ## deal with this node
          }
          else {
              $node->{canBeSet} = 1;
              $queue->enqueue( $node ); ## push back for next time.
          }
      }

      Again, it still retains the Queue semantics avoiding the busy loop; but ensures the queue gets cleared before terminating.
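Here is a runnable sketch of that drain-before-exit idea, run in a single thread so the order of events is easy to follow. The two nodes and their id field are invented for the demonstration; the undef is deliberately queued ahead of a node that is not yet ready.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

# Made-up nodes: the termination request arrives before node 2 is ready.
$queue->enqueue( { id => 1, canBeSet => 1 } );
$queue->enqueue( undef );
$queue->enqueue( { id => 2, canBeSet => 0 } );

my @handled;
while ( 1 ) {
    my $node = $queue->dequeue;
    if ( !defined $node ) {
        last unless $queue->pending;    # nothing left behind the undef: stop
        $queue->enqueue( undef );       # otherwise move the undef to the back
        next;
    }
    if ( $node->{canBeSet} ) {
        push @handled, $node->{id};     # "deal with this node"
    }
    else {
        $node->{canBeSet} = 1;
        $queue->enqueue( $node );       # push back for next time
    }
}

print "handled: @handled\n";
```

Node 2 is requeued behind the undef the first time it is seen, yet it is still handled before the loop ends, because the undef keeps moving to the back while anything else is pending.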

      However ... I suspect that your entire design can be significantly further simplified but I'll need to think on that further and I'll save it for another post.


        But ... by extracting (splicing) an element from the array, there are now less items in it, than there were when you queried it (my $amtInQ = $queue->pending();), and so when you get towards the end of your loop, there are no ith items left to peek(). It is the very fact that you are using array semantics on the queue that creates the problem.

        Ah, can't believe I missed that one. So obvious, thank you.

      I suspect this has something to do with your simplifications, but for the life of me I cannot see what the purpose of the JobQueue is?

      1. Jobs/Nodes get queued onto it by the first set of threads.
      2. Another thread monitors it, pulls them off, sets a random value into them and then sets them into a shared hash in the main thread;
      3. But then you do nothing with that?

      You seem to be getting the "results" in the main thread via the results queue; so what is the jobs queue/nodes hash for?



        Ok, I'll try to explain a bit further, and I apologize for the confusion. The random variable was just to test having multiple values being put into the "return value" hash. In this demo, it does indeed do absolutely nothing.

        In my main program, clients connect, and threadPool1 handles their responses. They issue commands, and can specify that those commands either run in their own thread, so they can see the output in the interactive shell, or run in the background (and get pushed to threadPool2).

        So, the jobQueue. When users issue commands, their command cannot always be run right away, depending on their criteria and which machine in our farm they want to run it on. The queue maintains the order in which commands were issued by using the JobNodes, which contain information from the command they entered. The jobQueue is not REALLY acting as a typical queue, but more as a utility to block the thread that issued the command until it is its turn (using subroutines like enqueueJob() that block until a value is set). The first point at which it blocks is setting the job number. After the job queue returns the job number, as shown in my demo, the client thread continues. The jobQueue will pause the command again to tell it which resource it should actually be using.

        The purpose of the JobQueue is to preserve the order the commands are issued in, regardless of whether the database is updated before the queue gets back to a job that was deferred. For example:

        1. User1 issues 2 commands for machine M1
        2. User2 issues 2 commands for M1, and M1 can only have 3 commands running
        3. User3 issues 2 commands for M1

        Our queue now has (user2, user3, user3)

        The current approach, using the Thread::Queue as an array, allows us to issue one DB query at the beginning of each iteration for the state of all our resources, so we can reference that instead of querying the database for every job in the queue every time we check it. So our process is:

        1. Query database for current state and build hash
        2. check nodes
        3. repeat steps above starting at beginning of array

        This allows us to preserve the order because: if we dequeue/enqueue user2's, and one of user1's finishes and updates our database, then when we dequeue user3's command, it will see that M1 has an available resource, and run that job. That is what we want to avoid. Using it as a proper queue would not preserve the order in that fringe case without some more tinkering.
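As a rough illustration of that snapshot-per-pass idea, here is a single-threaded sketch on a plain array. Everything in it is assumed for the example: the %free hash stands in for the result of the one database query, the user and machine fields are hypothetical, and one free slot on M1 is an arbitrary starting state.

```perl
use strict;
use warnings;

# Hypothetical snapshot of free slots per machine, taken once per pass
# (a stand-in for the single database query described above).
my %free = ( M1 => 1 );

# Pending jobs in the order they were issued; field names are illustrative.
my @pending = (
    { user => 'user2', machine => 'M1' },
    { user => 'user3', machine => 'M1' },
    { user => 'user3', machine => 'M1' },
);

my ( @started, @deferred );
for my $job ( @pending ) {
    if ( ( $free{ $job->{machine} } // 0 ) > 0 ) {
        $free{ $job->{machine} }--;    # consume a slot from the snapshot only
        push @started, $job;
    }
    else {
        push @deferred, $job;          # stays queued, relative order untouched
    }
}
@pending = @deferred;                  # rebuild the list instead of splicing mid-loop

print "started:  ", join( ' ', map { $_->{user} } @started ), "\n";
print "deferred: ", join( ' ', map { $_->{user} } @pending ), "\n";
```

Since slots freed during the pass are not visible until the next snapshot, a later job cannot overtake an earlier deferred one, and rebuilding the pending list sidesteps the index-shifting problem of extracting inside the loop.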
