http://www.perlmonks.org?node_id=1062464


in reply to Multithreaded process on AIX slow

There is only one way you will make progress on this: profile.

Profile a run on the AIX machine. Profile a run on the xeon machine.

Compare.

If you are unfamiliar with profiling, or want someone to cast a second pair of eyes over the output, and you could send me the block, line & sub .csvs produced (by NYTProf) for the main script plus the sources, I'd willingly take a look.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Multithreaded process on AIX slow
by scunacc (Acolyte) on Nov 13, 2013 at 20:48 UTC

    Hi, appreciate the suggestion but…

    From the docs:

    "Devel::NYTProf is not currently thread safe or multiplicity safe. If you'd be interested in helping to fix that then please get in touch with us. Meanwhile, profiling is disabled when a thread is created, and NYTProf tries to ignore any activity from perl interpreters other than the first one that loaded it."

    So, not really any help there, I'm afraid. This application cannot be run - or rewritten to run - without threads. By design it runs with over 100 threads and already takes a lot of shortcuts to reduce the memory footprint per thread, e.g. dynamic module unloading in each thread after startup, which will confuse profiling. Running the program without threads is not possible as it stands, and a rewrite would take several months of development effort. Even then, it would take 10s of hours to run the data.

    Kind regards

    Derek

      Devel::NYTProf is not currently thread safe or multiplicity safe.

      Sorry. I don't use NYTProf myself, so I've never encountered the limitation. I only mentioned it because it seems to be what everyone else recommends.

      This application cannot be run - or rewritten to run - without threads.

      There would be no point to doing so as the bottlenecks would undoubtedly move.

      But the core of my advice above remains. You will need to profile.

      I tend to do this manually. I find that profiling every line takes too long and produces far too much output to be useful.

      So, I start coarse-grained until I find the points of interest, then go more and more fine-grained around those points until I find what I'm looking for.

      So, for a queue consumer thread I will just trace:

      printf STDERR "[%3d] %u %s:%d %s\n", threads->tid, time(), __FILE__, __LINE__, $workitem;

      once for each loop of the queue processing loop.

      That should have a negligible effect on the overall performance of the code, but should give sufficient information to see if one or more of your threads and/or particular work items are significantly slower on the AIX machine than the others.

      Once you get some indication of where the differences lie, you can then disable or remove the irrelevant traces and add a few more in the appropriate places to allow you to zoom in on the cause(s).
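As a rough illustration of the coarse-grained starting point described above, here is a minimal sketch of a queue consumer that emits one trace line per work item. The names `trace_line`, `consumer` and `run_demo`, the use of Thread::Queue, and the dummy work items are all assumptions made for the example; the real application's queue and processing code are not shown in this thread.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: builds one trace line in the format used above
# (thread id, epoch seconds, file:line, work item).
sub trace_line {
    my ($tid, $file, $line, $item) = @_;
    return sprintf "[%3d] %u %s:%d %s\n", $tid, time(), $file, $line, $item;
}

# Hypothetical consumer: exactly one trace per pass of the processing loop.
sub consumer {
    my ($q) = @_;
    my $done = 0;
    while (defined(my $workitem = $q->dequeue)) {
        print STDERR trace_line(threads->tid, __FILE__, __LINE__, $workitem);
        # ... the real processing of $workitem would go here ...
        $done++;
    }
    return $done;
}

# Start two workers, feed them six dummy items, and count what was processed.
sub run_demo {
    my $q = Thread::Queue->new;
    my @workers = map { threads->create(\&consumer, $q) } 1 .. 2;
    $q->enqueue("item-$_") for 1 .. 6;
    $q->enqueue(undef) for @workers;    # one undef per worker ends its loop
    my $total = 0;
    for my $thr (@workers) {
        my ($n) = $thr->join;           # threads were created in list context
        $total += $n;
    }
    return $total;
}

print "processed ", run_demo(), " items\n";
```

Sorting the STDERR output by thread id and comparing the timestamp gaps between the two machines should show which threads or work items lag, and that is where finer-grained traces would go next.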

      One possibility -- based upon something you said in one of your other replies above --

      are you using virtualisation on one or both of the systems? If so, how are you configuring the VM that is running this app (i.e. what cores/CPUs/virtual network adapters/etc.)?

      Because historically, if you configure a VM with equal or more of any of those than the actual machine it runs on has, then it can lead to some very bad interactions between the host and guest OS schedulers and lead to diabolically bad performance. Just a thought.



        Appreciate the further suggestions. Always good to get more perspectives.

        As it happens however, my code is already littered at thread id level with print STDERR debugging. I turn it on and off with a Getopt::Long + flag where I can choose multiple levels of debugging depending on how many -v's I specify when I start the script, and I can monitor by thread. I wrap calls that way when I need to check what's causing them to be slow, or otherwise debug.
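The actual debugging code is not shown in this thread. Purely as a sketch of the kind of setup described above, assuming a counted `-v` option and two hypothetical helpers named `debug` and `timed`, it might look something like this:

```perl
use strict;
use warnings;
use threads;
use Getopt::Long qw(GetOptionsFromArray);
use Time::HiRes ();

my $verbose = 0;    # debug level; each thread gets its own copy at creation

# Count repeated -v flags: "-v -v" yields 2 (the 'v+' spec is an incrementing option).
sub parse_verbosity {
    my @args  = @_;
    my $level = 0;
    GetOptionsFromArray(\@args, 'v+' => \$level)
        or die "usage: $0 [-v ...]\n";
    return $level;
}

# Hypothetical helper: print a thread-tagged message only when the
# message level is at or below the current verbosity.
sub debug {
    my ($level, $fmt, @args) = @_;
    return 0 if $level > $verbose;
    printf STDERR "[%3d] $fmt\n", threads->tid, @args;
    return 1;
}

# Hypothetical helper: wrap a call and report its wall-clock time at level 2.
sub timed {
    my ($label, $code) = @_;
    my $t0     = Time::HiRes::time();
    my $result = $code->();
    debug(2, "%s took %.3fs", $label, Time::HiRes::time() - $t0);
    return $result;
}

$verbose = parse_verbosity(@ARGV);
my $answer = timed('demo call', sub { Time::HiRes::sleep(0.05); 42 });
debug(1, "demo call returned %s", $answer);
```

Run with `-v -v`, this prints both the timing line and the result line. With no `-v` it prints nothing, so the wrappers can stay in place between debugging sessions.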

        The issue isn't "what" is causing the problem so much - I already know that in *essence* - it's the multiple calls to REST services that this particular instance of the server has added. That's what's new. What I'm curious about is *why* that's happening. Hence my wondering about the modules and / or context switch throttling.

        There is no virtualization in use. It makes no sense to run any of this virtualized. The Xeon machine is the headnode of a supercomputer cluster. Want things running as fast as possible - while still being written in Perl for a host of other reasons. The AIX machine is a 20-way CPU with 64G of memory. I use as much of it natively also as I need when I need to.

        Kind regards

        Derek