PerlMonks  

Re^2: Thread terminating abnormally COND_SIGNAL(6)

by rmahin (Beadle)
on Jul 15, 2013 at 23:34 UTC (#1044480)


in reply to Re: Thread terminating abnormally COND_SIGNAL(6)
in thread Thread terminating abnormally COND_SIGNAL(6)

I create two Thread::Queue objects per JobNode, and in my testing I was creating roughly 10 JobNodes every 2 minutes. So that would be about 300 JobNodes (600 queues) every hour? These queues, though, are basically used only once: after the JobQueue enqueues something, the JobNode no longer cares about that Thread::Queue. Is there something I should be doing to actually clean these up?

Are there any typical cases that could cause the semaphore to become invalid? Too many Thread::Queue objects, for example?


Re^3: Thread terminating abnormally COND_SIGNAL(6)
by BrowserUk (Pope) on Jul 16, 2013 at 00:32 UTC
    Are there any typical cases that could cause the semaphore to become invalid? Like too many Thread::Queue's?

    I'm not sure. But internally, the Windows system call WaitForMultipleObjects is used in various places, and it has a limit: historically 64 handles (MAXIMUM_WAIT_OBJECTS), though that might have changed on later versions. Note: you can have (many) more waitable handles; you can only wait on up to 64 of them at a time without using additional techniques. NOTE: This is just a guess on my part, having looked at your linked code.

    I create 2 Thread::Queue's per JobNode, and in my testing i was creating roughly 10 JobNodes every 2 minutes. So that would be about 300 every hour? These queues though are basically used only once, in that after the JobQueue enqueues something, the JobNode no longer cares about that Thread::Queue. Is there something I should be doing to actually clean these up somehow?

    Firstly, using a Queue to pass a single value is nonsense.

    1. Do you not know you can pass arguments to threads when you create them?
      my $thread = async( sub{ print "@_"; }, 123, 'fred', {'a'..'f'}, [ 0 .. 9 ] )->join;;
      123 fred HASH(0x3ea93f8) ARRAY(0x3ea9470)
    2. Have you heard of threads::shared?

      If you need to pass a single value to a thread after it has been created rather than when you create it, then use a shared scalar:

      my $jobNo :shared = 0;
      ...
      sub job {
          sleep 1 until do{ lock $jobNo; $jobNo; };
          ## ... use $jobNo ...
      }
      ...
      my $thread = threads->create( \&job );
      ...
      { lock $jobNo; $jobNo = getJobNo(); }

      Of course, you'll want a different job number for each thread, so use a shared hash:

      my %jobNos :shared;
      ...
      sub job {
          my $tid = threads->tid;
          my $jobNo;
          sleep 1 until do{ lock %jobNos; $jobNo = $jobNos{ $tid } };
          ## now use $jobNo.
          ...
      }
      my $thread = threads->create( \&job );
      ## ... some time later
      { lock %jobNos; $jobNos{ $thread->tid } = getJobNo() };
      ...

      There are other (some would say better) ways of waiting for a shared variable -- cond_vars -- than busy looping over sleep, but this is easy to write, explain and -- most importantly -- debug.
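      For comparison, here is a minimal sketch of the cond_var approach mentioned above, using cond_wait and cond_broadcast from threads::shared in place of the sleep loop. getJobNo() is replaced by a hypothetical 100 + tid, and the choice of three threads is arbitrary; treat it as an illustration of the technique rather than a drop-in replacement.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %jobNos :shared;

sub job {
    my $tid = threads->tid;
    my $jobNo;
    {
        lock %jobNos;
        # cond_wait releases the lock while blocked and re-acquires it before
        # returning; the until-condition is re-checked after every wakeup.
        cond_wait( %jobNos ) until defined( $jobNo = $jobNos{ $tid } );
    }
    return "thread $tid got job $jobNo";
}

my @threads = map { threads->create( \&job ) } 1 .. 3;

for my $thr (@threads) {
    lock %jobNos;
    $jobNos{ $thr->tid } = 100 + $thr->tid;   # hypothetical stand-in for getJobNo()
    cond_broadcast( %jobNos );                 # wake all waiters; each checks its own key
}

my @results = map { $_->join } @threads;
print "$_\n" for @results;
```

      cond_broadcast is used rather than cond_signal because several threads wait on the same hash; cond_signal would wake only one of them, and not necessarily the one whose key was just set.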

    Secondly, yes, you should be cleaning up those queues. Each queue encapsulates various system resources -- including those semaphores -- and they are a finite resource. 300/hour for 24 hours means 7,200 semaphores. I can't tell without deep inspection of your code, but you could simply be running out of resources. I'd expect to get a different error message than the one you have -- something like "Insufficient system resources exist to complete the requested service" when the queue (or a resource it uses) was being created -- but it is possible that an error return is not being checked at that point.

    Remember also that for a Queue to be cleaned up, *all references* at both ends will need to be freed completely before the reference count will drop to 0 and it will get recycled.
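    To see the "all references" point in isolation, here is a plain-Perl sketch (no threads needed) using a hypothetical FakeQueue class whose DESTROY records when the object is actually freed. It assumes, for illustration only, that a JobNode holds one reference to its queue and some other piece of code holds another. Real Thread::Queue objects are shared between threads, so their cleanup details differ, but the object is only reclaimed once the last reference is gone.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a per-JobNode queue; DESTROY logs when Perl frees it.
package FakeQueue;
my @log;
sub new     { my ( $class, $name ) = @_; return bless { name => $name }, $class }
sub DESTROY { push @log, "freed $_[0]{name}" }

package main;

my $q = FakeQueue->new('reply-q');
my %jobNode = ( replyQueue => $q );    # second reference: the "other end"

undef $q;                              # one end lets go...
push @log, 'after undef $q';           # ...but the object is still alive

delete $jobNode{replyQueue};           # last reference gone: DESTROY runs now
push @log, 'after delete';

print "$_\n" for @log;
```

    The log shows "freed reply-q" only after the second reference is deleted, not when the first one is undefined.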

    This could be a bug in threads or Thread::Queue, or in perl's internals, but having glanced briefly at your linked code, I suspect that it is much more likely that the problem is sourced in the way you are abusing those modules.

    In essence, I think you are constructing a very complicated system around the use of threads and queues, but you do not really know enough about those modules to be doing so. I'd strongly advise that you create a few simple, stand-alone programs and play with threads, Thread::Queue and threads::shared, and acclimatise yourself to them before using them within what appears to be a very complex library module -- presumably intended to be used by others.

    I hope that does not sound patronising -- it certainly isn't intended to be. I just know from deep experience that Perl's threading is quite different from other forms of threading, and it takes everyone coming to it -- regardless of their threading background in other languages -- a while to become familiar with its particular strengths and weaknesses.

    Often at this point I offer to review the threaded code (here or via email), but given the presence of "IBM::CLIFARM::SERVER" in the title of your module, I doubt there would be any point. I don't have a server farm lying around -- IBM or otherwise :) And from looking at the bits you linked, this isn't something that could be debugged 'by inspection' (without running it).

    To reiterate:

    • I strongly suspect you are running your system out of some critical resource.

      You might be able to verify this using the System Information panel of ProcessExplorer.exe and checking the "Totals->Handles" count whilst your code is running. If it keeps rising and rising -- and drops back significantly when you kill your process -- that would confirm it.

    • I think -- on the basis of the little I've seen -- that your code will need a substantial rework to make it viable.

      Your comments already show you are uncomfortable with using queues the way you are.

      Doing so is simply wrong, and almost certainly completely unnecessary.

      You just need to become familiar with the facilities and techniques available to you, at which point you'll see a better way to tackle the problem.

    • There is not much I can do to help you with a project of this size and complexity.

      I strongly advise that you try out the main components of your project in small, stand-alone throw-aways until you are convinced they work the way you want them to.

      These will not only let you become familiar with the way Perl's threading works; but when you get problems, you have a ready-made test case you can post here for us to help you with.

      I write all my projects this way -- small stand-alones to iron out the details of the algorithms (and my understanding) -- and then I design the main project in the light and knowledge of what I've learned. I strongly advocate the method to you (and anyone listening).


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hey Browser

      Thanks for taking the time to look at the code. I always find your replies informative and helpful, even though you are working from the small portion of the code I can provide and cannot run it yourself.

      One of the reasons it was done this way was that we were trying to keep the blocking mechanism within the JobNode object, and since we were passing the JobNode through a Thread::Queue, it was not allowed to have any shared variables. I couldn't find another way to accomplish that (although it appears it wasn't really accomplished anyway).

      I will work on this taking your advice into consideration, and report back once I have a smaller mock up, or if I have any more questions.

      Thanks again, I appreciate the bluntness and straightforwardness of your replies.

      OK, so I have written a smaller version that pretty much captures what happens in my real program, and I have done it using a shared hash as you recommended. The program simulates how we have one thread pool for client connections, running work directly in them, and another thread pool for background work. I think it works well, but I am curious whether I am doing anything improperly, or whether anything could be improved.

      I did have to add this at line 205, and am not sure why. I don't know if there is something wrong in the code or not, but it doesn't seem like it should be necessary.

      #next if(!defined($node));

      Anyways, here is my code.

        I did have to add this at line 205, and am not sure why. I dont know if there is something wrong in the code or not, but it doesnt seem like it should be necessary. #next if(!defined($node));

        It'll take me a while to digest the entirety of your code, but I think I can answer this one straight away.

        The short answer is that you've fallen into the trap of using the so-called "Advanced Methods" of Thread::Queue that were added by that module's newest owner.

        IMO these methods -- peek(), insert() & extract() -- should never have been added to a Queue module, as they break all the basic invariants (and thus expectations) of Queues. They effectively turn the Queue into a (shared) array, which is nonsense: the underlying data structure already is an array, and all they do is make it a very expensive array to maintain.

        And your usage of the module as an array (perfectly understandable, given the module's provision of these methods) is exactly what is giving you the problem.

        Simplified, you are

        1. Querying the size of the array: (my $amtInQ = $queue->pending();).
        2. Then looping over the array by index: (for(my $i = 0; $i < $amtInQ; $i++){).
        3. Then peeking at array(i): (my $node = $queue->peek($i);).
        4. Then either:
          1. Doing nothing with it: (if(int($node->{canBeSet}) == 0){).
          2. Or: doing something with it then removing it from the array: ($queue->extract($i);).
        5. then looping back to process the next index.

        But ... by extracting (splicing) an element from the array, there are now fewer items in it than there were when you queried it (my $amtInQ = $queue->pending();), so when you get towards the end of your loop, there is no ith item left to peek(), and peek() hands back undef; that is why you needed the next if(!defined($node)) guard. It is the very fact that you are using array semantics on the queue that creates the problem.
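        The effect is easy to reproduce without threads. This minimal sketch assumes, for illustration, that peek($i) behaves like reading $items[$i] and extract($i) like splice(@items, $i, 1) on the queue's underlying array; the node layout is a hypothetical stand-in. Reading the count once and then splicing inside the loop leaves the last indices empty, and, as a side effect, the element that slides into each extracted slot is skipped on that pass.

```perl
use strict;
use warnings;

# Hypothetical nodes: odd ids have canBeSet = 1 and will be "extracted".
my @items = map { +{ id => $_, canBeSet => $_ % 2 } } 1 .. 6;

my $count = @items;    # like $queue->pending(), read once up front
my ( @seen, @undefs );
for ( my $i = 0; $i < $count; $i++ ) {
    my $node = $items[$i];    # like $queue->peek($i)
    if ( !defined $node ) { push @undefs, $i; next; }
    push @seen, $node->{id};
    splice( @items, $i, 1 ) if $node->{canBeSet};    # like $queue->extract($i)
}
print "seen: @seen\n";
print "undef at indices: @undefs\n";
```

        Nodes 2, 4 and 6 are never looked at on this pass, and indices 3, 4 and 5 come back undef.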

        This is how I would code that same loop:

        while( my $node = $queue->dequeue ) {
            if( $node->{canBeSet} ) {
                ## deal with this node
            }
            else {
                $node->{canBeSet} = 1;
                $queue->enqueue( $node );   ## push back for next time.
            }
        }

        Queue semantics and no busy loops, nor any need to sleep to avoid burning cpu.

        Now, one objection you might have to that is your ThreadDone check and processing:

        But if you used my queue-processing loop above, your killThread() method simply becomes:

        sub killThread{ $queue->enqueue( undef ); }

        When the undef is dequeued, the while loop ends and the thread self-terminates.
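        Here is a minimal single-threaded sketch of that loop together with the undef that killThread() enqueues; the node contents are hypothetical. Running everything in the main thread keeps the ordering deterministic, and it also reproduces the case where a node that is not yet ready gets requeued behind the undef and is left unprocessed.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new();
$queue->enqueue( { id => 1, canBeSet => 0 } );    # hypothetical node, not ready yet
$queue->enqueue( undef );                         # killThread() has already been called

my @done;
while ( my $node = $queue->dequeue ) {
    if ( $node->{canBeSet} ) {
        push @done, $node->{id};                  # "deal with this node"
    }
    else {
        $node->{canBeSet} = 1;
        $queue->enqueue( $node );                 # requeued behind the undef
    }
}
printf "processed: %d, left behind: %d\n", scalar @done, $queue->pending;
```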

        There's a slight wrinkle with that. If the undef has been queued and you then encounter a node with canBeSet = 0, my code would requeue that node after the undef, and the loop would terminate before it gets reprocessed, which you seem explicitly not to want. So I would then recast my version of the loop like this:

        while( 1 ) {
            my $node = $queue->dequeue;
            if( !defined $node ) {
                if( $queue->pending ) {          ## work still queued behind the undef,
                    $queue->enqueue( undef );    ## so move the terminator to the back
                    next;
                }
                last;                            ## queue drained: terminate
            }
            if( $node->{canBeSet} ) {
                ## deal with this node
            }
            else {
                $node->{canBeSet} = 1;
                $queue->enqueue( $node );   ## push back for next time.
            }
        }

        Again, it still retains the Queue semantics avoiding the busy loop; but ensures the queue gets cleared before terminating.
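        A minimal runnable sketch of that revised loop, run in a worker thread. The nodes (alternating canBeSet values) are hypothetical; all the work plus the terminating undef is queued before the worker starts so the order is deterministic, and the processed ids are returned through join() purely so the result can be inspected.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new();
$queue->enqueue( { id => $_, canBeSet => $_ % 2 } ) for 1 .. 4;   # hypothetical nodes
$queue->enqueue( undef );                                         # what killThread() does

# List-context create, so join() returns the whole list the thread returns.
my ($worker) = threads->create( sub {
    my @done;
    while (1) {
        my $node = $queue->dequeue;
        if ( !defined $node ) {
            # Terminator seen but work remains: move it to the back and carry on.
            if ( $queue->pending ) { $queue->enqueue( undef ); next; }
            last;
        }
        if ( $node->{canBeSet} ) {
            push @done, $node->{id};      # "deal with this node"
        }
        else {
            $node->{canBeSet} = 1;
            $queue->enqueue( $node );     # push back for next time
        }
    }
    return @done;
} );

my @done = $worker->join;
print "processed: @done\n";
```

        Nodes 2 and 4 are requeued once, the undef is pushed back behind them, and the thread only exits after both have been processed.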

        However ... I suspect that your entire design can be significantly further simplified but I'll need to think on that further and I'll save it for another post.



        I suspect this has something to do with your simplifications, but for the life of me I cannot see what the purpose of the JobQueue is.

        1. Jobs/Nodes get queued onto it by the first set of threads.
        2. Another thread monitors it, pulls them off, sets a random value into them and then stores them in a shared hash in the main thread;
        3. But then you do nothing with that?

        You seem to be getting the "results" in the main thread via the results queue; so what is the jobs queue/nodes hash for?


