
Re^2: Thread::Pool::Simple || !

by learnedbyerror (Monk)
on Jul 18, 2012 at 14:46 UTC ( [id://982455]=note )


in reply to Re: Thread::Pool::Simple || !
in thread Thread::Pool::Simple || !

I'll combine my response to both BrowserUK and bulk88 together.

BrowserUK, my threads are wrapped as you have shown.

Both - I have not been able to identify the cause of the threads failing. I suspect that I am probably tripping on some unholy synchronization between the OS (RH15 - 2.6.32.308.el5stab099.3 currently), perl's thread implementation (5.12.4 currently) and my code. The failures happen intermittently, recurring anywhere from 1 hour to 1 week/month. My guess is that something is going on in the "thread unit of execution" where there is a failure from an eval and, before perl can log or recover on the next step, the OS steps in and terminates that unit. In all honesty, this is a SWAG or maybe a WAG. I have not been successful in finding the root cause.

Yesterday afternoon, I started an alternative approach. I have added a sub that I call check_threads to my thread management module. This sub checks to see if all of the thread objects registered are still running or joinable; if not, it calls my start_threads sub to add a replacement thread(s). Given my current use of Thread::Queue to feed the threads, the new thread(s) will dequeue and proceed. I will lose the work against the data that the failed thread dequeued. This is acceptable in my case as subsequent executions of the overall process will pick it up.
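
A minimal sketch of that check-and-replace idea, assuming workers that drain a shared Thread::Queue. The sub names check_threads and start_threads come from the description above; everything else (the worker body, the 'boom' item used to simulate a failure, the @workers registry) is hypothetical and is not the actual module. As a simplification, this version treats any thread that is no longer running as one to replace.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;   # work items shared by all workers
my @workers;                      # hypothetical registry of thread objects

# Hypothetical worker: runs until it dequeues undef.
sub worker {
    while (defined(my $item = $queue->dequeue)) {
        # Stand-in for a real failure so the sketch can be exercised.
        die "simulated failure\n" if $item eq 'boom';
        # ... real processing of $item would go here ...
    }
    return;
}

# Start $count new workers and register them.
sub start_threads {
    my ($count) = @_;
    push @workers, threads->create(\&worker) for 1 .. $count;
    return;
}

# Keep running threads, reap finished ones, and top the pool back up
# to $wanted. Returns the number of replacement threads started.
sub check_threads {
    my ($wanted) = @_;
    my @alive;
    for my $thr (@workers) {
        if ($thr->is_running) {
            push @alive, $thr;
        }
        elsif ($thr->is_joinable) {
            $thr->join;           # reap a thread that ended or died
        }
        # Anything else has already been joined or detached: drop it.
    }
    @workers = @alive;
    my $missing = $wanted - @workers;
    $missing = 0 if $missing < 0;
    start_threads($missing) if $missing;
    return $missing;
}
```

Calling check_threads($pool_size) periodically from the main thread keeps the pool topped up. As noted above, whatever item the dead thread had already dequeued is not recovered by this scheme.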

I'm fairly confident that this approach will get me to the reliability and performance levels that I need. But, I'm not too happy with the overall code. I "feel" like I have had to write too much code and have, in my ignorance, missed a "better" way of getting my work done. With all of my code functioning, I am going through one more refactoring effort primarily to ensure supportability of the code. Once in production and when I get a bit of free time, I want to come up with a test scenario, that I can share freely, and throw out here to get more brains looking at it. I want to go back and test it against the models on CPAN that I tried earlier as well as Coro which I deselected because of a preference to not use coroutines. I also want to look at Perl6 and possibly functional languages like Erlang and Haskell.

My gut is still telling me that there is a simpler approach to this than what I have. If there isn't, then I need to spiff up my thread management kit and put it up on CPAN.

Cheers, lbe

Replies are listed 'Best First'.
Re^3: Thread::Pool::Simple || !
by BrowserUk (Patriarch) on Jul 18, 2012 at 15:22 UTC
    The failures happen intermittently recurring anywhere from 1 hour to 1 week/month. My guess is ...
    I want to go back and test it against the models on CPAN that I tried earlier

    To paraphrase: I've a bug in my code which I can't be bothered to track down and I'm hoping there is a CPAN module that will allow me to conceal it.

    BrowserUK, my threads are wrapped as you have shown.

    Then how do you know threads are failing? Are you logging errors? Logging when threads end? How?
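
    One way to answer those questions is to have every thread log its own start and end. The following is a minimal sketch, assuming the thread body is passed in as a code ref; logged_worker, log_event and the shared @log array are hypothetical names, not the wrapper referred to above. It only catches Perl-level exceptions (die), not a crash inside C or XS code.

    ```perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my @log :shared;    # hypothetical in-memory log shared by all threads

    # Append a message tagged with the id of the calling thread.
    sub log_event {
        my ($msg) = @_;
        lock(@log);
        push @log, sprintf('[tid %d] %s', threads->tid, $msg);
        return;
    }

    # Run $job inside eval and log the start and the outcome.
    # Returns 1 if the job completed, 0 if it died.
    sub logged_worker {
        my ($job) = @_;
        log_event('started');
        my $ok = eval { $job->(); 1 };
        log_event($ok ? 'finished' : "died: $@");
        return $ok ? 1 : 0;
    }
    ```

    A thread started with threads->create(\&logged_worker, $job) that logs 'started' but never logs 'finished' or 'died' was not stopped by an ordinary Perl exception, which narrows the search.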

    My gut is still telling me that there is a simpler approach to this than what I have. If there isn't, then I need to spiff up my thread management kit and put it up on CPAN.

    NO! NO! Please no! There is enough ill-conceived, bug-ridden, utterly useless, "threading management" crap on CPAN already.

    It is no wonder so many people are put off from using Perl's threading, given their first experiences of it are installing a module with a fanciful name (and, as often as not, with a positive CPAN rating or two from know-nothing sycophants), only to discover that it is either completely broken -- or worse -- kind of works some of the time, but every now and again, silently throws away a bunch of work items.

    Even if your application is tolerant of your threads module throwing work away every now and again, please don't "spiff it up" with a little concealer and foist it upon unsuspecting others.

    If you truly believe that your threads management module is a) generic enough; b) lightweight enough; and c) reliable enough; to have the potential to become a widely deployable thread-pool module -- and you obviously aren't there yet -- then post it here along with your current application -- suitably cut-down and anonymised as necessary -- and let us help you solve the known problem.

    Once we've done that, I'll willingly throw a few of my testcase applications at it and see if it a) actually works for them; b) is actually any simpler and/or better than hand-coding.

    As you may be aware, I am pretty skeptical that it is possible to write a properly generic thread-pool solution that isn't actually more complex than hand-coding them to fit the specific application. But I am willing to be proved wrong. And more than willing to help prove me wrong, if I see an architecture/API that seems to work.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      BrowserUK - please check your attitude! It is unbecoming of a monk of your stature. Your paraphrase is wrong, actually damn wrong.

      I have spent more than a little time trying to find out what is happening when the threads die. I know they die because they cease to log "anything". Other threads working on the same queue continue about their merry way. The thread(s) that die simply disappear akin to Amelia Earhart. And like the disappearance of Amelia Earhart, it may take more than a couple of decades to figure it out.

      I am guessing that you have more than enough experience to understand how difficult it can be to find intermittent bugs. This is especially true if your own code isn't the sole issue. Over the last 30 years, I have worked on more hardware, OS, and development languages and frameworks than I care to remember. Without exception, they all from time to time exhibited behaviors that cannot readily, or even after a lot of work, be explained.

      At some point, I as a human, will throw in the towel on getting to the bottom of a problem if I can take a path around it or mitigate it in some means. I have decided that I will not, not will not bother to, expend any additional hours of my life that I cannot get back to research, test and identify the root cause of my issue.

      CPAN is far from perfect. I do not like the amount of cruft that is out there in it. But it is in the wild. In the wild you will find snakes, pirates and liars as well as things of great beauty. I have found many modules in CPAN that I absolutely adore and use as if they were part of the perl CORE because they bring benefit to me that I want, need and appreciate. Even the modules written by snakes, pirates and liars are helpful to show me how "NOT" to do something.

      I do wish that the perl community would do something about having a means to obtain a perspective on how well a module(s) does or doesn't work other than one off comments on CPAN. I would be more than happy to help. For my own sanity, I generally do not use a module if I can't find enough evidence of its consumption by others and indications that the code is maintained to merit considering it.

      However, given the amount of time that CPAN has been in existence and the fact that this issue has existed since shortly after its establishment, I think it unlikely that it will be addressed soon. And if you already haven't looked, please do take a look at CPAN and see if you find any modules, flakey or otherwise put there by me. I have not put anything on CPAN because I could not get it up to my standards in the time that I have. I "will not" put anything on CPAN or elsewhere that isn't ready for consumption and that I won't support myself or via a support group.

      Capiche?

      lbe

      Is this a rant? I hope not. I don't allow myself to rant -- in public. :)

        Ask yourself this.

        Would you accept a text editor that worked well, but occasionally died throwing away unsaved changes; provided it restarted itself and loaded up the last saved copy?

        Because that is exactly analogous to what you are looking to do with your threads. And are suggesting you might put on CPAN.

        With respect to tracking down your transient error(s); show me the code. I bet I can track it down.

        I am guessing that you have more than enough experience to understand how difficult it can be to find intermittent bugs. This is especially true if your own code isn't the sole issue. Over the last 30 years, I have worked on more hardware, OS, and development languages and frameworks than I care to remember. Without exception, they all from time to time exhibited behaviors that cannot readily, or even after a lot of work, be explained.

        At some point, I as a human, will throw in the towel on getting to the bottom of a problem if I can take a path around it or mitigate it in some means. I have decided that I will not, not will not bother to, expend any additional hours of my life that I cannot get back to research, test and identify the root cause of my issue.

        Then surely you know how to fix this problem.

        Use a C debugger, attach to the OS thread, look at the call stack, and fix it. You probably need a DEBUGGING build of perl so your C debugger works more cleanly. No OS thread can "freeze" itself or deschedule itself without help from the OS. If the thread is busy-waiting and sucking CPU, ps/top/task manager will tell you. If your threads are "disappearing" (I can't tell if you mean they disappear from the OS, or they simply freeze indefinitely) without a trace, chances are very high you're leaking resources and RAM too. Since you use Unix, have you tried setting signal handlers for SEGV/BUS/FPE/ABRT/other CPU exceptions from perl to see if your thread is throwing one of those signals?

        You haven't shown any code, or explained what specific CPAN modules and C libraries and XS modules you are using. I'll make a wild ass guess and say you probably have what is called a race condition in a 3rd party C library (access vio or thread sync/mutex deadlock), or you are doing network I/O with no timeout.
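
        To illustrate the "network I/O with no timeout" case: a read on a socket can block forever if the peer goes quiet, which from the outside looks like a thread that has silently stopped. The sketch below is one way to bound such a read using the core IO::Select module. The helper name read_with_timeout is hypothetical, and nothing here is taken from the code under discussion, which has not been posted.

        ```perl
        use strict;
        use warnings;
        use IO::Select;

        # Hypothetical helper: read up to $len bytes from $fh, giving up
        # after $timeout seconds. Returns the data read ('' at EOF), or
        # undef on timeout or read error.
        sub read_with_timeout {
            my ($fh, $len, $timeout) = @_;
            my $sel   = IO::Select->new($fh);
            my @ready = $sel->can_read($timeout);
            return unless @ready;               # nothing arrived in time
            my $n = sysread($fh, my $buf, $len);
            return unless defined $n;           # read error, see $!
            return $buf;
        }
        ```

        A worker that logs every undef returned by such a helper at least leaves a trace when a peer stops responding, instead of blocking indefinitely.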

        I'm not sure exactly how process/user resource limits work on Linux (I'm a Win32 person). I've read that the Linux kernel doesn't know what threads are; each thread is a separate process in the same memory sandbox. So maybe one of your OS threads/ithreads hit the OS per-thread resource limit and that's why it disappears (if disappearing is what happens on Linux when one thread in a multi-threaded process hits a per-thread resource limit).

        If you showed code, I bet BrowserUk could track it down as he claims in Re^5: Thread::Pool::Simple || !.
        BrowserUK - please check your attitude!

        {Check} S'fine thanks. I was trying to help you help yourself. You don't want it. S'fine too.

