PerlMonks  

Re: panic: COND_DESTROY(6)

by BrowserUk (Patriarch)
on Jan 26, 2012 at 14:35 UTC ( [id://950095]=note )


in reply to panic: COND_DESTROY(6)

You could provide a little more info perhaps?

  1. OS?
  2. Perl version?
  3. threads version?
  4. threads::shared version?

Am I right in thinking that the "6" is the TID of the thread that crashed?

More likely the number is the numeric error code. On Windows that would be "invalid handle" returned from the attempt to close the semaphore associated with a threads::shared condition variable:

#define COND_DESTROY(c) \
    STMT_START { \
        (c)->waiters = 0; \
        if (CloseHandle((c)->sem) == 0) \
            Perl_croak_nocontext("panic: COND_DESTROY (%ld)",GetLastError()); \
    } STMT_END

Of course it might mean something different on other OSes.
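As a minimal sketch of reading the number in the message as a Win32 error code (assuming the panic really does come from the Windows macro above): the helper names panic_code and win32_error_name are made up for illustration and are not part of Perl. The numeric values in the table are the standard ones from winerror.h, and the table is deliberately tiny.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A few Win32 system error codes (values as defined in winerror.h). */
struct win32_error { long code; const char *name; };

static const struct win32_error known_errors[] = {
    { 2,  "ERROR_FILE_NOT_FOUND" },
    { 5,  "ERROR_ACCESS_DENIED" },
    { 6,  "ERROR_INVALID_HANDLE" },
    { 87, "ERROR_INVALID_PARAMETER" },
};

/* Hypothetical helper: pull N out of a "panic: COND_DESTROY (N)" message.
 * Returns -1 if the message does not have that shape. */
long panic_code(const char *msg)
{
    const char *p;
    long code;

    assert(msg != NULL);
    p = strstr(msg, "COND_DESTROY");
    if (p == NULL)
        return -1;
    p += strlen("COND_DESTROY");
    /* The leading space in the format skips optional whitespace before '('. */
    if (sscanf(p, " (%ld)", &code) != 1)
        return -1;
    return code;
}

/* Hypothetical helper: map a code to its symbolic name, or "unknown". */
const char *win32_error_name(long code)
{
    size_t i;
    for (i = 0; i < sizeof known_errors / sizeof known_errors[0]; i++)
        if (known_errors[i].code == code)
            return known_errors[i].name;
    return "unknown";
}
```

With these helpers, panic_code("panic: COND_DESTROY (6)") yields 6, which win32_error_name maps to ERROR_INVALID_HANDLE. The same number on a non-Windows build would come from a different macro and need a different table.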

Your best bet would be to post the code, assuming it isn't too large or proprietary or require too much in the way of unique set-up.

If it is, then try to reduce as much as possible whilst still having the error occur. (I appreciate that can be difficult with transient errors like this.) But it will be very hard to advise without sight of the code in question.

If it is the invalid handle problem, the most likely cause is the handle being closed twice, but working out how that might occur will require sight of the code.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Replies are listed 'Best First'.
Re^2: panic: COND_DESTROY(6)
by menth0l (Monk) on Jan 26, 2012 at 15:09 UTC
    I've just updated my post with OS info and such. Tomorrow I'll respond to the rest of your post.
      From root update:
      Windows 2007 Server

      Um. Did you mean Windows Server 2008?

        My mistake ;)
Re^2: panic: COND_DESTROY(6)
by menth0l (Monk) on Jan 27, 2012 at 08:33 UTC
    I can't really put the code here since I'm bound by my company's policy.

    But maybe there is another way around this. Someone suggested that this may be related to semaphores in my code. But I don't use semaphores, only locks (I'm locking an Object-InsideOut type object). I assume that Perl's locks are implemented using low-level semaphores?
      But I don't use semaphores, only locks (I'm locking an Object-InsideOut type object). I assume that Perl's locks are implemented using low-level semaphores?

      Yes. A condition variable is a C struct containing a count of the threads waiting, and a semaphore handle:

      typedef struct win32_cond {
          LONG waiters;
          HANDLE sem;
      } perl_cond;

      When a condition variable is garbage collected (DESTROYed), the semaphore handle is closed, then the memory for the struct is freed. The panic you are seeing is occurring when the attempt to close the semaphore handle fails. The only way I can see this happening is if there is a second attempt to DESTROY a condition variable that has previously been destroyed.
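To make the double-destroy scenario concrete, here is a small portable simulation, not the real Perl or Win32 code. Everything prefixed sim_ is hypothetical: a toy handle table stands in for the kernel, sim_close stands in for CloseHandle, and sim_cond_destroy follows the shape of COND_DESTROY but returns the error code instead of croaking. The only value taken from Windows is 6 for "invalid handle".

```c
#include <assert.h>
#include <stddef.h>

#define SIM_ERROR_INVALID_HANDLE 6L  /* same value as Win32 ERROR_INVALID_HANDLE */
#define SIM_MAX_HANDLES 8

/* Toy handle table: nonzero means the handle is currently open. */
static int  sim_open[SIM_MAX_HANDLES];
static long sim_last_error;

/* Stand-in for creating a semaphore: returns a handle >= 1, or 0 on failure. */
int sim_create(void)
{
    int h;
    for (h = 1; h < SIM_MAX_HANDLES; h++) {
        if (!sim_open[h]) {
            sim_open[h] = 1;
            return h;
        }
    }
    return 0;
}

/* Stand-in for CloseHandle: returns 0 and records an error if h is not open. */
int sim_close(int h)
{
    if (h <= 0 || h >= SIM_MAX_HANDLES || !sim_open[h]) {
        sim_last_error = SIM_ERROR_INVALID_HANDLE;
        return 0;
    }
    sim_open[h] = 0;
    return 1;
}

/* Same shape as perl_cond: a waiter count plus a semaphore handle. */
typedef struct sim_cond {
    long waiters;
    int  sem;
} sim_cond;

/* Follows COND_DESTROY, but reports the error code rather than panicking.
 * Returns 0 on success. */
long sim_cond_destroy(sim_cond *c)
{
    assert(c != NULL);
    c->waiters = 0;
    if (sim_close(c->sem) == 0)
        return sim_last_error;
    return 0;
}
```

Destroying the same sim_cond twice returns 0 the first time and 6 the second, which is the pattern the panic suggests. One simplification to be aware of: on a real system a closed handle value can be handed out again, so a second close does not always fail, which may be part of why the error looks transient.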

      That would put the root cause of the problem outside the realm of your code and firmly within the purview of Perl/threads::shared. But that doesn't help you solve or work around your problem; nor does it give the maintainers any clue as to the circumstances under which the bug is occurring.

      The only long-term viable way forward that I see, is for you to remove as much of the proprietary code and dependencies from the code as you can, whilst retaining the flow that causes the bug to occur, and then post that. Odds are that this would allow us to find a workaround that you could fold back into your proprietary code; and give the maintainers a testcase on which to base a future fix.

      Looking at the change history for threads::shared, there were changes relating to shared object destruction in the latest build (which you are using), and earlier in version 1.33. My first step would be to downgrade threads::shared on your installation to version 1.32 and see if that 'fixes' the problem.

      But for a long term fix, you should really consider trying to come up with a cut-down testcase for the problem, that you have permission to publish. (The smaller the better!).



        Thanks for shedding some light on the semaphore question.

        I'd really like to create a test case, but there are a few obstacles that I'd have to overcome in order to do this. First of all, the app is pretty large. In its minimal version there are at least 4 threads involved:
        1. thread hosting rpc server
        2. thread preparing load balancing for sql data related to rpc request
        3. thread that sends request with data definition to remote host
        4. thread keeping track of available hosts
        I can't pinpoint which thread is to blame (I can't tell either from the log or from the message itself).
        Second of all, the app is tied to a local SQL database and I can't see any sane way to simulate that in a script.

        I know about the threads::shared bug that was fixed (I reported it to J.D. Hedden ;) but downgrading seems to me like an "out of the frying pan, into the fire" type of situation :) But I guess you are right: maybe I could spot this problem more easily that way.

        Although I'm still in a deep dark wood, some light can be seen ;) Thanks to you I know it's something related to locks. I'll try to mess around with the code that uses locks the most and test, test, test.... Maybe I'll create a bug report after all.
      I can't really put the code here since I'm bound by my company's policy.

      Could you reduce your code to a minimal version that demonstrates the problem, is short enough to post, and contains no proprietary information?

      For example, about a week ago, I also posted a question relating to threads. The initial problem I saw was in a big and secret perl script, that I would definitely not be allowed to post, but I reduced the script by removing & commenting out big blocks of code until I was left with a 25 line script that demonstrated the problem.

      That script contains nothing secret so there is no problem posting it, and also it is much shorter so it is easy for our fellow monks to understand the problem.
