PerlMonks  

Re^5: PANIC: underlying join failed threded tcp server

by BrowserUk (Pope)
on Oct 18, 2012 at 02:14 UTC (#999651)


in reply to Re^4: PANIC: underlying join failed threded tcp server
in thread PANIC: underlying join failed threded tcp server

As for adding the trace messages to threads.xs, I'm not sure how to go about that. I searched for it in the perl installation folder, but it was not there. Is it part of the threads.dll?

The easiest way would be to grab the latest version of threads from CPAN, then build and install it manually. Once you've checked that you can still reproduce the error with the newly built version, you can modify threads.xs, rebuild and reinstall.

Do you think any harm would be caused from doing something like my $val = eval{$join->join()}; as just a way to prevent the server from crashing?

If you are not doing anything with the return value from the join, it would probably be an okay temporary workaround. If you're only joining the thread to make it go away, and it is going away of its own accord, then in one sense, job done.

But in the long term, whatever the cause, this might just be a symptom of a deeper issue in your code, in the threads module, or in perl itself. As you have a recreatable scenario, it would be silly not to use it to help track down the underlying issue.
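For what it's worth, here is a rough sketch of the eval workaround, written so that a false but valid return value (0, an empty string) is not mistaken for a failed join, which is what `eval { ... } or die` would do. The helper name safe_join is made up for illustration; only threads->create, join and the eval/$@ idiom come from Perl itself.

```perl
use strict;
use warnings;
use threads;

# Hypothetical helper (not part of threads.pm): join a thread and trap
# any exception join() throws instead of letting it kill the server.
# Returns (1, $value) on success or (0, $error_text) on failure.
sub safe_join {
    my ($thr) = @_;
    my $val;
    # The trailing 1 makes eval return true even if join() returns 0 or undef.
    my $ok = eval { $val = $thr->join(); 1 };
    return $ok ? (1, $val) : (0, "$@");
}

my $thr = threads->create(sub { return 0 });   # a false but valid result
my ($ok, $val) = safe_join($thr);
print $ok ? "joined, value=$val\n" : "join failed: $val";
```

The caller can then log the error text and carry on, rather than dying on the first bad handle. That keeps the server up, but it hides the symptom rather than fixing it.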


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

RIP Neil Armstrong

Re^6: PANIC: underlying join failed threded tcp server
by rmahin (Beadle) on Oct 18, 2012 at 23:34 UTC
    Hey thanks for explanation of how to do that. I got what you asked for, and here is the link to a text file of the output from the start to finish https://dl.dropbox.com/u/19686501/serverOutput.txt None of that info is particularly helpful to me at least but if you see something I don't, I'm all ears. Thanks for your patience in me getting back to you. Took a while to recreate this time.

      Sight of the changes made to the OP code and threads.xs would make interpreting that output a possibility :)

      Also, what OS/version; Perl/version; threads/version?



        Changes made to threads.xs
            /* Join the thread */
            #ifdef WIN32
                printf( "thread handle:%x thread-id: %dx\n", thread->handle, GetThreadId( thread->handle ) );
                if (WaitForSingleObject(thread->handle, INFINITE) != WAIT_OBJECT_0) {
                    printf("GetLastError output: '%d'", GetLastError());
                    /* Timeout/abandonment unexpected here; check $^E */
                    Perl_croak(aTHX_ "PANIC: underlying join failed");
                };
            #else
                if ((rc_join = pthread_join(thread->thr, &retval)) != 0) {
                    /* In progress/deadlock/unknown unexpected here; check $! */
                    errno = rc_join;
                    Perl_croak(aTHX_ "PANIC: underlying join failed");
                };
            #endif

        Changes made to rxd.pl
            # wait to join and delete from hash
            foreach my $join (threads->list(threads::joinable)) {
                my $val = eval{ $join->join() } or die "Join failed with '$!' : '$^E'";
                tprint("_handle() output: " . $join->_handle());
                #my $val = $join->join();
                # tprint("Deleting fd- $val");
                delete $FDcache{$val};
            }

        OS/Version: Microsoft Windows Server 2003 R2 Enterprise x64 Edition Service Pack 2. This is a VM, if that makes any difference.

        Perl/version: ActivePerl v5.16.1

            C:\>perl -v
            This is perl 5, version 16, subversion 1 (v5.16.1) built for MSWin32-x64-multi-thread
            (with 1 registered patch, see perl -V for more detail)

        threads/version: threads-1.86


        Let me know if you need anything else!
      None of that info is particularly helpful to me at least but if you see something I don't, I'm all ears.

      What I can see from that info is:

      1. At line 3426:
            RXD+ (982) > Command completed with a return code of 0
            RXD+ (0) > _handle() output: 51686792
            thread handle:2d00 thread-id: 4240x

        A thread with a Perl thread id (tid) of 982 completes. Just prior to joining, its Windows thread handle 2d00 maps to an OS thread ID of 4240, and the join completes without error.

      2. Later, at line 3775:
            thread handle:2d00 thread-id: 0x
            GetLastError() output: '6'
            Join failed with 'Bad file descriptor' : 'The handle is invalid' at rxd.pl line 1128.

        Just prior to a later join attempt, 'another thread' with the same OS thread handle, 2d00, has no OS thread ID (GetThreadId() returns 0 when it fails). That indicates that handle 2d00 is indeed invalid, as the system reports: GetLastError() code 6 is ERROR_INVALID_HANDLE.

      What that indicates is that either:

      • The OS is reusing the same OS thread handle -- which whilst possible seems unlikely.
      • Or threads->list(threads::joinable) is returning the handle of an already joined thread, which also seems unlikely, but could happen if the (Perl) internal linked list got corrupted somehow.
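The second possibility can also be probed from the Perl side, without touching threads.xs. The sketch below remembers every tid it has joined and warns if threads->list(threads::joinable) ever offers one again. It assumes Perl thread ids are not reused within a process; the helper name reap_joinable and the %joined_tids hash are made up for illustration.

```perl
use strict;
use warnings;
use threads;

my %joined_tids;   # Perl thread ids (tids) that have already been joined

# Hypothetical helper: join every thread currently reported as joinable,
# warning if a tid that was already joined is offered again.
# Returns the list of tids joined by this call.
sub reap_joinable {
    my @reaped;
    for my $thr (threads->list(threads::joinable)) {
        my $tid = $thr->tid();
        if ($joined_tids{$tid}) {
            warn "tid $tid reported joinable after it was already joined\n";
            next;
        }
        # Trap a croak from join() so one bad thread does not stop the loop.
        my $ok = eval { $thr->join(); 1 };
        if ($ok) {
            $joined_tids{$tid} = 1;
            push @reaped, $tid;
        }
        else {
            warn "join of tid $tid failed: $@";
        }
    }
    return @reaped;
}

# Usage: start a few short-lived workers, wait for them, then reap them.
my @workers = map { threads->create(sub { return $_[0] }, $_) } 1 .. 3;
sleep 1 while threads->list(threads::running);
my @done = reap_joinable();
print "joined tids: @done\n";
```

If the warning never fires when the PANIC occurs, the duplicate is more likely at the OS handle level than in Perl's list of threads.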

      The next thing I would try is adding a similar trace line at the end of S_ithread_create(), something like:

          S_ithread_create(
              ...
              printf( "ITCREATE: thread handle:%x thread-id: %dx\n", thread->handle, GetThreadId( thread->handle ) );
              MY_POOL.running_threads++;
              return (thread);
          }

      And also in

          STATIC void S_ithread_free(pTHX_ ithread *thread) {
              ...
          #ifdef WIN32
              printf( "ITFREE: thread handle:%x thread-id: %dx\n", thread->handle, GetThreadId( thread->handle ) );
              if (handle) {
                  CloseHandle(handle);
              }
          #endif
              ...
          }

      The idea is to isolate whether -- when the error occurs -- the invalid handle is to a thread that has already been freed -- in which case the bug is in threads->list() -- or to a thread that has not yet been freed -- in which case it would mean an OS error of some kind, perhaps a resource constraint.
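Reading several thousand lines of trace by eye is error prone, so here is a rough sketch of how the correlation could be automated. It assumes the three line formats produced by the printf() calls in this thread (ITCREATE, ITFREE, and the unprefixed line printed just before the join) and treats a thread-id of 0 as a failed GetThreadId(). The function name classify_failed_joins and the sample lines are made up for illustration; the sample is synthetic, not taken from the real log.

```perl
use strict;
use warnings;

# Hypothetical helper: scan trace lines and, for every pre-join trace whose
# OS thread id is 0, report whether that handle was still live (created but
# not yet freed) or not. Returns a list of [handle, verdict] pairs.
sub classify_failed_joins {
    my (@lines) = @_;
    my (%live, @report);
    for my $line (@lines) {
        # Several traces may share one physical line, hence the /g loop.
        while ($line =~ /(?:(ITCREATE|ITFREE): )?thread handle:([0-9a-fA-F]+) thread-id: (\d+)x/g) {
            my ($event, $handle, $os_tid) = ($1 // 'JOIN', lc $2, $3);
            if    ($event eq 'ITCREATE') { $live{$handle} = 1 }
            elsif ($event eq 'ITFREE')   { delete $live{$handle} }
            elsif ($os_tid == 0) {
                push @report, [ $handle,
                    $live{$handle} ? 'not yet freed' : 'already freed or never created' ];
            }
        }
    }
    return @report;
}

# Synthetic sample, for illustration only.
my @sample = (
    'ITCREATE: thread handle:2d00 thread-id: 4240x',
    'thread handle:2d00 thread-id: 4240x',
    'ITFREE: thread handle:2d00 thread-id: 4240x',
    'thread handle:2d00 thread-id: 0x',
);
printf "handle %s: %s\n", @$_ for classify_failed_joins(@sample);
```

A verdict of 'already freed or never created' points towards threads->list() handing back a stale thread; 'not yet freed' points towards the OS side. Handle values can legitimately be reused after CloseHandle(), which is why the sketch tracks create and free events in order rather than just counting them.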

      I briefly looked at trying to run your server here and re-create the failure. Whilst the server runs and accepts connections from a telnet session, it won't accept input from it, because (my) telnet sends character by character and the server is expecting entire commands wrapped in your (incredibly complicated) comms protocol.

      There is no way I am going to be able to reverse engineer a client that can talk that protocol.



        Alrighty, it took a bit longer to recreate this time, for whatever reason. I made the changes you suggested, but in the line printf( "ITFREE: thread handle:%x thread-id: %dx\n", thread->handle, GetThreadId( thread->handle ) ); I changed thread->handle to just 'handle', since it looks like they have already freed that pointer at that point.
        I attached the log of this run below, in the file called serverOutput2.txt. It exited with a different error this time:
            Join failed with 'Inappropriate I/O control operation' : 'The handle is invalid' at rxd.pl line 1128.
            The RXD server has been shutdown.
            Perl exited with active threads:
                151 running and unjoined
                4 finished and unjoined
                0 running and detached
        Which again appears to be related to the thread handle.
        Sorry about the complicated protocol. People I work with did not wish to go through the trouble of having much back-and-forth communication between the client and server; they would rather send a command once and receive a response, and still needed a way to transfer a 400MB file. So this is what we (I) came up with (you should have seen the earlier version). I've attached the client (rx.pl) as well as the test script I used to recreate this issue. I also included the server, rxd.pl, with all commands other than EXEC stripped out to shorten the code, and the exact threads.xs used to compile the threads module.
        Thanks again for the help. Files:
        https://dl.dropbox.com/u/19686501/perlmonk.zip
