
How to debug perl code that uses threads

by chrestomanci (Priest)
on Nov 23, 2011 at 09:17 UTC

chrestomanci has asked for the wisdom of the Perl Monks concerning the following question:

Greetings wise brothers.

I have been asked to do some maintenance on a Perl program that makes substantial use of threads. I have attempted to debug it using the perl command line debugger, and I am finding the process difficult because more than one thread is active at once and the debugger does not keep them separate. If there are multiple breakpoints active, and multiple threads that could hit them, then while stepping over code my next prompt could at any time be in a different thread from the one I was debugging a moment before.

I know the obvious suggestion is to turn off the multi-threading, but the design of the program is such that to do so would be a substantial undertaking, and I don't want to risk breaking parts of the program that I don't need to touch.

In the past, when debugging Perl programs that make use of multiple processes via fork(), I have been able to keep the processes separate by running the debug session under Linux in an xterm window. If a child process hits a breakpoint, another xterm window is created for that process, so I can debug it separately from the parent and any other children.

Is there a similar technique for debugging Perl that uses threads? Any other tips?

Replies are listed 'Best First'.
Re: How to debug perl code that uses threads
by BrowserUk (Patriarch) on Nov 23, 2011 at 10:09 UTC

    The real answer to debugging threads is to design the code for ease of development and debugging from the outset. As far as threads are concerned, that means designing each individual thread as a linear flow or data-driven loop, with as few inter-thread communications as possible and no synchronisation points, and ensuring that those few inter-thread communications are done via tried and tested mechanisms.

    This allows each thread to be tested individually by mocking up its communications flow in a light-weight test environment. Once all the parts have been proven to work alone, testing them together becomes a process of monitoring or tracing the combined communication flows.

    But none of that helps you given your situation.

    My approach to your problem would be to ignore the debugger at the start and add a few trace lines into the actual code at critical points. The trace would be written to a new queue added for the purpose and would consist of (just):

    threadId timestamp linenumber

    I'd start another thread whose only purpose is to write everything on the queue to a file.

    By writing this minimal information to a queue, you have minimal impact upon the code under test whilst obtaining a clear overall picture of the flows through the program without producing too huge a trace file to have to plough your way through.

    I've also tended to add a ^C interrupt handler in main that simply injects a flag into the trace queue. This is useful when the program manifests behaviour the programmer can see and may want to investigate. You just hit ^C when you see the event occur, and it adds a flag to the trace for off-line investigation later.

    If the program uses one or more pools of identical worker threads, temporarily reduce the number of each type to one. It greatly simplifies the trace.
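    A minimal sketch of the trace queue described above, assuming a threaded perl and the core Thread::Queue module. The names $TRACE, trace(), worker() and the file trace.log are made up for illustration; the worker body is a stand-in for the real code under test.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Time::HiRes qw(time);

# Hypothetical names: $TRACE, trace() and trace.log are illustrative only.
my $TRACE = Thread::Queue->new;

# Record "threadId timestamp linenumber" for the line that called trace().
sub trace {
    my $line = (caller)[2];
    $TRACE->enqueue( join ' ', threads->tid, sprintf( '%.6f', time ), $line );
}

# Dedicated writer thread: drains the queue to a file until it sees undef.
my $writer = threads->create( sub {
    open my $fh, '>', 'trace.log' or die "trace.log: $!";
    while ( defined( my $rec = $TRACE->dequeue ) ) {
        print {$fh} "$rec\n";
    }
    close $fh;
} );

# ^C injects a marker into the trace instead of killing the program.
$SIG{INT} = sub { $TRACE->enqueue('--- ^C marker ---') };

# Stand-in for the real thread code, with trace points at "critical" lines.
sub worker {
    trace();
    my $sum = 0;
    $sum += $_ for 1 .. 1000;
    trace();
    return $sum;
}

my @workers = map { threads->create( \&worker ) } 1 .. 2;
$_->join for @workers;

$TRACE->enqueue(undef);    # tell the writer there is nothing more to log
$writer->join;
```

    One caveat with this sketch: perl runs the ^C handler in the main thread, and only once the main thread is executing Perl code again, so a marker injected while main is blocked in join() may land in the trace later than the moment the key was pressed.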


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      This is an aside, but I would love to see the contents of your suggestions in a Perl tutorial... I think it'd be a really valuable resource that spoke to the general approach that explained:

      1. Linear Flow or a Data-Driven Loop

      2. Minimal interthread communications/No Synchronization points

      3. Interthread communications use tried/tested means.

      (and yes... I want a pony... ;) )

        It was my intent to respond to your request here, but the more time I spend on it, the more it's going to take to do the job properly. Hence, I'll try to post a meditation in the next couple of weeks.


Re: How to debug perl code that uses threads
by Anonymous Monk on Nov 23, 2011 at 09:30 UTC

    Any other tips?

    What is the nature of the bug you're trying to fix?

    What is the basic gist of the program, its flow/pattern? I.e., is it like this:

    use threads;
    use Your::Module;
    my @AoH = map $_->join, map { async( \&{"Your::Module::$_"} ) }
        qw[ sub1 sub2 sub3 sub4 sub5 sub6 sub7 sub8 sub9 sub10 ];
    ## do something with the 10 hashrefs here

    Or Re^2: parallel HTTP::Request

    I always fall back on log file debugging :) Devel::Trace, Devel::TraceCalls, log4perl...

      On further investigation, (mostly from reading the badly documented code), I discovered the flow and pattern of the program.

      • The parent thread is only there to parse command line args, create shared variables etc.
      • One thread scans the filing system looking for jobs to do. The job objects are placed into a shared queue.
      • Then a variable number of worker threads (up to 30 of them) pop items from the queue of jobs and do the work.
      • There was a slight complexity where the workers signal to the work finding thread via a semaphore, so that once the queue has enough jobs in it, the work finding thread pauses until the workers signal that they are idle.

      Once I figured out what was going on, it was easier to debug. I simply hacked the code a bit so that the queue was populated with a small number of jobs, and then a single worker thread was started.

        There was a slight complexity where the workers signal to the work finding thread via a semaphore, so that once the queue has enough jobs in it, the work finding thread pauses until the workers signal that they are idle.

        That whiffs a lot of code smell.

        Without being party to the actual implementation that allows 30 workers to signal their idleness (or lack thereof) to the finder thread, it generally means that the workers are using dequeue_nb() rather than dequeue().

        • And that means that they are effectively polling the queue in busy loops rather than just waiting for something to arrive.

          Even if you regulate that busy loop with a sleep, it still means that the thread has to wake up every so often to see if there is anything to do, whereas if it used the blocking version, it would get woken up only when there is something to do.

          Contrast waking up once every minute to see if it is time to go to work, versus using an alarm clock.

        • It can also mean that when the finder thread actually has work to do, it has to compete with 30 workers all taking time slices to see if there's anything in the queue yet.

          The workers may not use many cycles between sleeps in order to check the queue, but in order to wake up at all, the OS has had to do the 100s of thousands of cycles involved in a context switch.

          Think of it like a telephone receptionist trying to conduct a conversation whilst also responding to 30 wrong numbers. The wrong numbers don't take long to deal with, but their effect on the conversation is disastrous.

        • Using dequeue_nb() usually requires an additional mechanism be invented for telling the workers there is no more work.

          This is because you can no longer use the simple and intuitive $Q->enqueue( (undef) x $nThreads ) to signal the workers they are done.

        • Finally, depending how it has actually been implemented, it can lead to feast-famine cycles. (Also known as boom-bust.)
          1. The workers all poll away at the queue waiting for work, meanwhile the finder goes looking for and posts work items.
          2. Then the workers are all busy, so the finder sits doing nothing but polling the semaphore waiting for the workers to run out of work.
          3. goto 1.
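
        For contrast with the polling pattern criticised above, here is a minimal sketch of workers built on the blocking dequeue() together with the (undef) x $nThreads shutdown idiom. It assumes a threaded perl and Thread::Queue; the pool size, the job values and the $done counter are illustrative only.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $nThreads = 4;                      # illustrative pool size
my $Q        = Thread::Queue->new;
my $done :shared = 0;                  # processed-job counter, for the demo only

sub worker {
    # dequeue() blocks: the thread sleeps inside the queue until an item arrives.
    while ( defined( my $job = $Q->dequeue ) ) {
        # ... real work on $job would go here ...
        lock($done);
        $done++;
    }
    # An undef item means "no more work": fall out of the loop and exit.
}

my @pool = map { threads->create( \&worker ) } 1 .. $nThreads;

$Q->enqueue($_) for 1 .. 20;           # post some stand-in jobs
$Q->enqueue( (undef) x $nThreads );    # one terminator per worker

$_->join for @pool;
print "processed $done jobs\n";
```

        Each worker consumes exactly one undef and then exits, so no separate "no more work" signal is needed.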

        The better mechanism for controlling the queue size is to have the finder sleep for a short period when the queue size reaches some high water mark. The benefits of this are:

        • Only one thread -- the finder -- is ever polling.

          Workers only wake up when there is something for them to do.

        • The finder polls on the number of items left in the queue. Ie. sleep 1 while $Q->pending > 30.

          Only one ITC mechanism -- the queue -- is required, which avoids potential conflicts and race conditions, and avoids the workers having to do anything to signal the finder.

        • The low-water mark level can be easily tailored to suit different hardware/software environments.

          With a little more programming, it can even be dynamically adjusted.

        • The finder detects when the queue is running low, before the workers become idle.

          Hence, it can overlap its work with that of the workers and the workers stay busy avoiding the famine scenario.
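
        The finder-side throttling described above can be sketched roughly as follows. This assumes a threaded perl and Thread::Queue; the threshold of 30 comes from the one-liner above, while the job source (a plain counter), the 200-job total and the 0.01 second sleep are placeholders for the real filesystem scan and tuning.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use Time::HiRes qw(sleep);

my $HIGH_WATER = 30;    # threshold taken from the text; tune per environment
my $nWorkers   = 3;     # illustrative pool size
my $Q          = Thread::Queue->new;
my $handled :shared = 0;

# Workers block in dequeue(); they never poll and never signal the finder.
my @workers = map {
    threads->create( sub {
        while ( defined( my $job = $Q->dequeue ) ) {
            lock($handled);
            $handled++;
        }
    } );
} 1 .. $nWorkers;

# The finder is the only poller: it naps while the backlog is above the mark.
my $finder = threads->create( sub {
    for my $job ( 1 .. 200 ) {    # stand-in for scanning the filing system
        sleep 0.01 while $Q->pending > $HIGH_WATER;
        $Q->enqueue($job);
    }
    $Q->enqueue( (undef) x $nWorkers );    # one terminator per worker
} );

$finder->join;
$_->join for @workers;
print "handled $handled jobs\n";
```

        Because the finder resumes as soon as the backlog drops to the threshold, it refills the queue while the workers are still busy, which is the overlap the last bullet refers to.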


Re: How to debug perl code that uses threads
by sundialsvc4 (Abbot) on Nov 25, 2011 at 14:11 UTC

    Very well... then I shall step in and try to earn some XP’s that go in the other direction ...

    The most-successful analogy I’ve found for a good multi-threaded process is that of a fast-food restaurant.   First of all, everybody makes it out of the store alive at the end of the shift.   At times they have been busy-as-crazy and at times they have been staring at the label on a french-fry machine.   You can see that each worker’s workflows are based on queues... in and out.   Also, each one contributes to the order in some way, but the guy who’s cooking those fries isn’t the one putting them onto the plate.   Finally, each worker’s task can be isolated from any of the others, say, for training (debugging) purposes.

    In such a system, you can never say precisely what the system will be doing at any particular moment, but you can say that the system is adaptive to whatever happens, that it is tunable in real-time by the manager on duty, and that its behavior can be predicted through stochastic modeling.

    There are more-elaborate variations on this, including the “jack of all trades model,” in which any worker can do any task (can play any role ...) and they switch hats according to which unit-of-work they select from which queue.   In this model, the number of workers is strictly an expression of the “maximum multiprogramming level” of the system; of the maximum amount of simultaneous activity that the system will currently attempt to perform under worst-case conditions.   The workflow system is literally a system of queues and the workers are merely the ones who are buzzing around it, making honey.

    In every case:   each worker sleeps (consuming no CPU resources) until it has something to do.   While working on its appointed or selected unit of work, it doesn’t have any dependencies upon any other worker.   (If the work shows up in its queue now, that unit of work can be completed now.   It won’t have any lock-contention except between it and another worker who is now performing the same role, which won’t happen.)

    For debugging purposes, as I said, it really comes down to design.   If the worker roles are laid out in this way, every worker can be pulled into a Test:: jig and it can be debugged in isolation.   The “plumbing” consists of stock CPAN parts, already debugged.   Workflow managers can be found there, too.
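
    As a rough illustration of pulling one worker role into a test jig, the sketch below drives a single queue-in/queue-out role with preloaded input and checks what comes out. It assumes a threaded perl with Thread::Queue and Test::More; fry_cook() and the order strings are invented for the example and are not from any real program.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Test::More tests => 2;

# Hypothetical worker role: reads orders from $in, writes results to $out.
sub fry_cook {
    my ( $in, $out ) = @_;
    while ( defined( my $order = $in->dequeue ) ) {
        $out->enqueue("fried $order");
    }
    $out->enqueue(undef);    # pass the "no more work" marker downstream
}

# The jig: preload the input queue, run the role alone, inspect the output.
my ( $in, $out ) = ( Thread::Queue->new, Thread::Queue->new );
$in->enqueue( 'potatoes', 'onions', undef );

threads->create( \&fry_cook, $in, $out )->join;

my @results;
while ( defined( my $r = $out->dequeue ) ) {
    push @results, $r;
}

is( scalar @results, 2, 'one result per order' );
is_deeply( \@results, [ 'fried potatoes', 'fried onions' ], 'results in order' );
```

    Since the role only touches its two queues, the same sub can be stepped through in the debugger with a single thread, or even called directly with no extra thread at all.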

Re: How to debug perl code that uses threads
by sundialsvc4 (Abbot) on Nov 24, 2011 at 13:02 UTC

    As BrowserUk says, the best solution is one of good design. I too would love to see tutorials or meditations on the subject.

Node Type: perlquestion [id://939624]
Approved by Corion
Front-paged by Corion