Thread terminating abnormally COND_SIGNAL(6)

by rmahin (Beadle)
on Jul 15, 2013 at 22:30 UTC (#1044465, perlquestion)
rmahin has asked for the wisdom of the Perl Monks concerning the following question:

Hey monks,

I have a large multithreaded program and have run into an issue where I see this message:

Thread 1 terminated abnormally: panic: COND_SIGNAL (6) at D:/Perl64/lib/Thread/Queue.pm line 31.

I am running on Windows Server 2003 R2 x64 SP1 with 4 GB RAM. This is perl 5, version 16, subversion 1 (v5.16.1) built for MSWin32-x64-multi-thread.

I have been unable to reliably recreate this issue. But the problem seems to be in the enqueue() subroutine in the Thread::Queue module for some reason. Perhaps with that error code it will be enough to figure out the source of the problem, but I am not sure how to get any meaning from it.

I realize that is getting printed from:

    #define COND_SIGNAL(c) \
        STMT_START { \
            if ((c)->waiters > 0 && \
                ReleaseSemaphore((c)->sem,1,NULL) == 0) \
                croak("panic: COND_SIGNAL (%ld)",GetLastError()); \
        } STMT_END

But I can't find where that is being called.

Aside from that, I'm not really sure of the best approach for asking for help. The gist of the program is as follows:

1. The main script first creates a JobQueue and starts a thread using its manageQueue() subroutine.

    my $jobQueue = IBM::CLIFARM::SERVER::UTIL::JobQueue->new(DBfile => $dbFile);
    threads->create('IBM::CLIFARM::SERVER::UTIL::JobQueue::manageQueue');

2. Script then creates a pool of threads using another subroutine which is used for clients to connect and issue commands.

3. When they issue commands, a JobNode is created, which basically just contains information about the job. Then we call the subroutine enqueueJob, passing the jobNode, which blocks until it is that job's turn to execute using the dequeue subroutine of Thread::Queue. Basically the JobNode waits until the JobQueue enqueues a jobNumber into the JobNode's Thread::Queue (if that made any sense). It was implemented that way because it seemed to be the simplest way to pass information between the different threads.

    my $jobNode = IBM::CLIFARM::SERVER::UTIL::JobNode->new({
        process  => $process,
        jobType  => $recover_type,
        userName => $username,
        options  => $options,
    });
    $jobNode->setPossibleResources(resourceGiven => $resources);
    IBM::CLIFARM::SERVER::UTIL::JobQueue::enqueueJob({jobNode => $jobNode});
    $logger->debug("Created new job node for $recover_type command with job number $jobNode->{JOBNUMBER}");

If you need me to elaborate more on that, let me know. Here are links for the JobNode and JobQueue; they seemed a little long to post in their entirety here. I tried to strip out most of the stuff that is irrelevant to this problem. Links: JobNode.pm, JobQueue.pm

This error message popped up when the JobQueue tried to enqueue a jobNumber to the JobNode's JOBNUMBER_QUEUE, in the setJobNumber subroutine.

This ran fine for roughly 24 hours, and then all of a sudden the JobQueue thread died with the message above. I've tried to give as much information as I can, as I am at a complete loss as to where to start debugging this, especially since I'm having trouble recreating it. If you need any additional information, or have any suggestions at all, please let me know.

Thanks a bunch!

Re: Thread terminating abnormally COND_SIGNAL(6)
by BrowserUk (Pope) on Jul 15, 2013 at 23:23 UTC
    Thread 1 terminated abnormally: panic: COND_SIGNAL (6) at D:/Perl64/lib/Thread/Queue.pm line 31.

    The error code (6) means:

        say $^E=6;;
        The handle is invalid

    And that suggests that the semaphore (sem) in ReleaseSemaphore((c)->sem,1,NULL) has somehow become invalid, perhaps already closed or corrupted.

    How many Queues are you creating?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I create 2 Thread::Queues per JobNode, and in my testing I was creating roughly 10 JobNodes every 2 minutes. So that would be about 300 JobNodes every hour? These queues, though, are basically used only once, in that after the JobQueue enqueues something, the JobNode no longer cares about that Thread::Queue. Is there something I should be doing to actually clean these up somehow?

      Are there any typical cases that could cause the semaphore to become invalid? Like too many Thread::Queue's?

        Are there any typical cases that could cause the semaphore to become invalid? Like too many Thread::Queue's?

        I'm not sure. But, internally, the Windows system call WaitForMultipleObjects is used in various places, and this has a limit. Historically that was 64 handles, though it might have changed in later versions. Note: you can have (many) more waitable handles; you just can't wait on more than 64 of them at a time without using additional techniques. NOTE: This was just a guess on my part, having looked at your linked code.

        I create 2 Thread::Queues per JobNode, and in my testing I was creating roughly 10 JobNodes every 2 minutes. So that would be about 300 JobNodes every hour? These queues, though, are basically used only once, in that after the JobQueue enqueues something, the JobNode no longer cares about that Thread::Queue. Is there something I should be doing to actually clean these up somehow?

        Firstly, using a Queue to pass a single value is a nonsense.

        1. Do you not know you can pass arguments to a thread when you create it?
          my $thread = async( sub{ print "@_"; }, 123, 'fred', {'a'..'f'}, [ 0..9 ] )->join;;
          123 fred HASH(0x3ea93f8) ARRAY(0x3ea9470)
        2. Have you heard of threads::shared?

          If you need to pass a single value to a thread after it has been created rather than when you create it, then use a shared scalar:

          my $jobNo :shared = 0;
          ...
          sub job {
              ## poll until the main thread has stored a non-zero job number
              sleep 1 until do{ lock $jobNo; $jobNo; };
              ... use $jobNo. ...
          }
          ...
          my $thread = threads->create( \&job );
          ...
          { lock $jobNo; $jobNo = getJobNo(); }

          Of course, you'll want a different job number for each thread, so use a shared hash:

          my %jobNos :shared;
          ...
          sub job {
              my $tid = threads->tid;
              my $jobNo;
              sleep 1 until do{ lock %jobNos; $jobNo = $jobNos{ $tid } };
              ## now use $jobNo.
              ...
          }

          my $thread = threads->create( \&job );
          ... some time later
          { lock %jobNos; $jobNos{ $thread->tid } = getJobNo() };
          ...

          There are other (some would say better) ways of waiting for a shared variable -- cond_vars -- than busy looping over sleep, but this is easy to write, explain and -- most importantly -- debug.
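For comparison, here is a minimal sketch of the cond_vars alternative mentioned above, using cond_wait and cond_broadcast from threads::shared instead of polling with sleep. The job number 42 is a stand-in for whatever getJobNo() would return, and the structure is only an illustration, not code from the original program.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %jobNos :shared;    # tid => job number, filled in by the main thread

sub job {
    my $tid = threads->tid;
    my $jobNo;
    {
        lock %jobNos;
        # cond_wait releases the lock while blocked and re-acquires it on wake-up;
        # the loop guards against wake-ups meant for other threads.
        cond_wait(%jobNos) until defined $jobNos{$tid};
        $jobNo = delete $jobNos{$tid};
    }
    return "thread $tid got job $jobNo";
}

my $thread = threads->create( \&job );

{
    lock %jobNos;
    $jobNos{ $thread->tid } = 42;    # placeholder for getJobNo()
    cond_broadcast(%jobNos);         # wake every waiter; each re-checks its own slot
}

my $result = $thread->join;
print "$result\n";
```

Because the waiter re-checks its slot under the lock before blocking, it does not matter whether the main thread stores the job number before or after the worker starts waiting.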

        Secondly, yes, you should be cleaning up those queues. Each queue encapsulates various system resources -- including those semaphores -- and they are a finite resource. 300 JobNodes/hour for 24 hours means 7,200 JobNodes, and at two queues each that is 14,400 queues' worth of semaphores. I can't tell without deep inspection of your code, but you could simply be running out of resources. I'd expect you to get a different error message than the one you have -- something like: Insufficient system resources exist to complete the requested service -- when the queue (or a resource it uses) was being created, but it is possible that an error return is not being checked at that point.

        Remember also that for a Queue to be cleaned up, *all references* at both ends will need to be freed completely before the reference count will drop to 0 and it will get recycled.
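One way to sidestep the cleanup question entirely, assuming a redesign is acceptable, is to keep a single long-lived Thread::Queue that carries many job numbers rather than creating a queue per job. The sketch below is an illustration of that swapped-in pattern, not the design from the linked modules; the job numbers and the undef stop marker are assumptions.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# One queue created once and reused for every job.
my $jobs = Thread::Queue->new;

my $worker = threads->create( sub {
    my @seen;
    # undef is used here as a (hypothetical) stop marker
    while ( defined( my $jobNo = $jobs->dequeue ) ) {
        push @seen, $jobNo;
    }
    return scalar @seen;
} );

$jobs->enqueue($_) for 1 .. 5;    # five placeholder job numbers
$jobs->enqueue(undef);            # tell the worker to finish

my $handled = $worker->join;
print "handled $handled jobs\n";
```

With this shape the number of queues stays constant no matter how long the program runs, so there is nothing to leak per job.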

        This could be a bug in threads or Thread::Queue, or perl's internals, but having glanced briefly at your linked code, I suspect that it is much more likely that the problem is sourced in the way you are abusing those modules.

        In essence, I think you are constructing a very complicated system around the use of threads and queues, but you do not really know enough about those modules to be doing so. I'd strongly advise that you create a few simple, stand-alone programs and play with threads, Thread::Queue (and threads::shared), and acclimatise yourself to them before using them within what appears to be a very complex library module -- presumably intended to be used by others.

        I hope that does not sound patronising -- it certainly isn't intended to. I just know from deep experience that Perl's threading is quite different to other forms of threading and it takes everyone coming to it -- regardless of their threading background in other languages -- a while to become familiar with its particular strengths and weaknesses.

        Often at this point, I offer to review the threaded code (here or via email), but given the presence of "IBM::CLIFARM::SERVER" in the name of your module, I doubt there would be any point. I don't have a server farm lying around -- IBM or otherwise :) And from looking at the bits you linked, this isn't something that could be debugged 'by inspection' (without running it).

        To reiterate:

        • I strongly suspect you are running your system out of some critical resource.

          You might be able to verify this using the System Information panel of ProcessExplorer.exe and checking the "Totals->Handles" count whilst your code is running. If it keeps rising and rising -- and drops back significantly when you kill your process -- then your process is the one leaking handles.

        • I think -- on the basis of the little I've seen -- that your code will need a substantial re-work to make it viable.

          Your comments already show you are uncomfortable with using queues the way you are.

          Doing so is simply wrong, and almost certainly completely unnecessary.

          You just need to become familiar with the facilities and techniques available to you, at which point you'll see a better way to tackle the problem.

        • There is not much I can do to help you with a project of this size and complexity.

          I strongly advise that you try out the main components of your project in small, stand-alone throw-aways until you are convinced they work the way you want them to.

          These will not only let you become familiar with the way Perl's threading works; but when you get problems, you have a ready-made test case you can post here for us to help you with.

          I write all my projects this way -- small stand-alones to iron out the details of the algorithms (and my understanding) -- and then I design the main project in the light and knowledge of what I've learned. I strongly advocate the method to you (and anyone listening).
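As an example of such a throw-away, the sketch below repeatedly creates a single-use Thread::Queue, passes one value through it and discards it, mimicking the JobNode pattern described in the question. The iteration count is a hypothetical placeholder; the idea is to raise it and watch the process's handle count in ProcessExplorer while it runs. It is a starting point under those assumptions, not a reproduction of the original failure.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $iterations = 20;    # hypothetical; raise this to mimic hours of load
my $completed  = 0;

for my $n ( 1 .. $iterations ) {
    # A queue used exactly once, as in the JobNode design.
    my $q = Thread::Queue->new;

    # The waiter blocks in dequeue until the main thread supplies a value.
    my $waiter = threads->create( sub { return $q->dequeue } );

    $q->enqueue($n);
    my $got = $waiter->join;
    $completed++ if $got == $n;

    # $q goes out of scope here; if the handle count still climbs,
    # something is holding on to the queue's resources.
}

print "completed $completed of $iterations\n";
```

If the handle count stays flat with this version but climbs in the real program, the difference between the two (for example, references to the queues kept in a longer-lived structure) is the place to look.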

