. . . You're using threads
I use 'fork' to generate the children. I can't show the failing subroutine, since every subroutine has failed at one time or another. This is a 20,000+ line subsystem with 91 subroutines. Sometimes a process works for an hour and then loses the pointer, and sometimes the failure is almost immediate. From analysis of the logs, it usually happens in the 4th or 5th inner subroutine call. The application starts with 4 children per core, expands if xx% are working, and contracts if more than the minimum exist and work is below yy%. All children live for approximately 4 hours, or end early if their RSS is more than 2 * the original RSS.
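To make the pool policy concrete, here is a rough sketch of the rules described above, written in Python purely for illustration. It is not the production code. The function names (`initial_children`, `resize_decision`, `child_should_exit`) are made up, the xx% and yy% thresholds stay as parameters (`expand_pct`, `contract_pct`), and treating "xx% are working" as "at least xx%" is an assumption.

```python
import os

MAX_AGE_SECONDS = 4 * 60 * 60  # children live for approximately 4 hours
RSS_GROWTH_LIMIT = 2           # end early once RSS exceeds 2 * original RSS


def initial_children(cores=None, per_core=4):
    """Starting pool size: 4 children per core."""
    if cores is None:
        cores = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    return cores * per_core


def resize_decision(busy, total, minimum, expand_pct, contract_pct):
    """Return +1 to fork another child, -1 to retire one, 0 to hold.

    expand_pct and contract_pct stand in for the xx% and yy% thresholds.
    """
    busy_pct = 100.0 * busy / total if total else 0.0
    if busy_pct >= expand_pct:  # assumption: "at least xx%" triggers growth
        return 1
    if total > minimum and busy_pct < contract_pct:
        return -1
    return 0


def child_should_exit(age_seconds, rss_now, rss_at_start):
    """A child ends at about 4 hours, or early if RSS more than doubles."""
    too_old = age_seconds >= MAX_AGE_SECONDS
    too_big = rss_now > RSS_GROWTH_LIMIT * rss_at_start
    return too_old or too_big
```

The parent would call `resize_decision` on each monitoring pass, and each child would check `child_should_exit` between units of work; how the real system measures "working" and RSS is not shown here.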
Production: SUSE Linux or AIX
"Well done is better than well said." - Benjamin Franklin