Re: Multithreaded process on AIX slow
by flexvault (Monsignor) on Nov 13, 2013 at 21:20 UTC
scunacc,
What version of AIX?
Background: If it's AIX 6.1 or later, you may be working against the AIX dispatcher. Unix and Linux treat all cores as equivalent CPUs, while AIX treats the first core of each CPU as the fastest and the last core as the slowest. As an example, say you have 8 CPUs with 6 cores each: the AIX dispatcher will always want the first core of each CPU working the most, then the next level, and so on.
You may want to search on this, since I seem to remember that on AIX you may get better performance by limiting the number of active threads, so that they execute on the faster cores. Also, I believe I/O-bound threads are dispatched on the slower cores and CPU-intensive threads on the faster cores.
I believe the 'xlc' compiler is designed to help C/C++/Fortran programs, but Perl is on its own. And if your Perl was built with 'gcc' you may be in an even worse situation.
I don't know if this will help, but maybe the information will give you a different approach to the problem.
Regards...Ed
"Well done is better than well said." - Benjamin Franklin
Hi Ed,
Appreciate the observations. Interesting… How does it handle assignments for micropartitioning then since it's virtualized further? I have no control over how that is allotted on this system.
Also - the way the application is designed, I have multiple threads for input and multiple for output. Each either feeds a Q (input) from a client or reads a Q (output) and communicates back to waiting clients. The clients send info to the server, then sit waiting (as a reverse "server", if you will) for results.

The Q's are used to enable the processing threads (of which there are a considerable number, in a hierarchy / intercommunicating community performing different related functions) to consume what they want in parallel. When done, they asynchronously dump the results into the output Q's, shared with the output threads.

There is no ongoing connection to clients. That is also asynch, the client connection information being carried from input Q to output Q as part of the SOAP object. The output Q handling threads then contact the client back (acting as a client to the client acting as a server) saying: "Here's the answer".
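In case it helps anyone reading along, a minimal sketch of that input-Q / worker / output-Q shape, using Thread::Queue. All names here are hypothetical, and the real application has many more thread roles and carries the client call-back info inside each work item:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Sketch of the shape described above: input threads feed $in_q,
# workers consume in parallel, and output threads drain $out_q.
my $in_q  = Thread::Queue->new;
my $out_q = Thread::Queue->new;

my $worker = threads->create(sub {
    while (defined(my $item = $in_q->dequeue)) {
        # ... the real processing happens here ...
        $out_q->enqueue("result-for-$item");   # async hand-off to output side
    }
});

$in_q->enqueue($_) for qw(job1 job2 job3);
$in_q->enqueue(undef);    # tell the worker to shut down
$worker->join;
```

An output thread would then dequeue from $out_q and contact the waiting client back, as described.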
That isn't the I/O that's binding me here, though: that processing has worked fine, alongside some other in-machine operations, at breakneck speed since 2008, maxing things out nicely when required ;-)
The problem seems to be the multiple net connections using REST, as I mentioned. I can still slam things as fast as they will go with hundreds of clients in sequence if I *batch* my REST data, and, as I say, it then completes in nearly the same time as the Xeon-based version. It's still doing the exact same amount of *other* I/O, though. I still start the same number of threads. I still have the same number of clients sending the same amount of data. What changes is how much data I send in each REST request, and, I guess, how many consecutive REST requests I'm making as a result (1 vs. 50).
So, I/O *per se* isn't binding me. I think what I'm wondering is whether the REST::Client or HTTP::Request modules have any known issues on AIX.
This is AIX 5.3 - can't upgrade this machine. I have a 6.1 machine available that I will have to build an identical Perl on to test with though.
Hmmm. Let's see (…logging-in sounds…) I built this particular Perl instance with gcc. :-/ Ah, there was a reason for that: I also had to build PostgreSQL on this particular machine and have it dynamically linked. That was the only way to get them to play nicely with each other.
Kind regards
Derek
Re: Multithreaded process on AIX slow
by BrowserUk (Patriarch) on Nov 13, 2013 at 20:32 UTC
There is only one way you will make progress on this: profile.
Profile a run on the AIX machine. Profile a run on the xeon machine.
Compare.
If you are unfamiliar with profiling, or want a second pair of eyes cast over the output, send me the block, line & sub .csvs produced by NYTProf for the main script, plus the sources, and I'd willingly take a look.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Hi, appreciate the suggestion but…
From the docs:
"Devel::NYTProf is not currently thread safe or multiplicity safe. If you'd be interested in helping to fix that then please get in touch with us. Meanwhile, profiling is disabled when a thread is created, and NYTProf tries to ignore any activity from perl interpreters other than the first one that loaded it."
So, not really any help there, I'm afraid. This application cannot be run, or rewritten to run, without threads. By definition it runs with over 100 threads and already takes a lot of shortcuts to reduce the memory footprint per thread, e.g. dynamically unloading modules in each thread after startup, which will confuse profiling. And even if running without threads were possible (it isn't; it would take several months of development effort to rewrite), it would then take tens of hours to run the data.
Kind regards
Derek
Devel::NYTProf is not currently thread safe or multiplicity safe.
Sorry. I don't use NYTProf myself, so I've never encountered the limitation. I only mentioned it because it seems to be what everyone else recommends.
This application cannot be run - or rewritten to run - without threads.
There would be no point in doing so, as the bottlenecks would undoubtedly move.
But the core of my advice above remains: you will need to profile.
I tend to do this manually. I find that profiling every line takes too long and produces far too much output to be useful.
So I start coarse-grained, until I find the points of interest, then go more and more fine-grained around those points of interest until I find what I'm looking for.
So, for a queue consumer thread I will just trace:
printf STDERR "[%3d] %u %s:%d %s\n", threads->tid, time(), __FILE__, __LINE__, $workitem;
once per iteration of the queue-processing loop.
That should have a negligible effect on the overall performance of the code, but should give sufficient information to see whether one or more of your threads, and/or particular work items, are significantly slower on the AIX machine than the others.
Once you get some indication of where the differences lie, you can then disable or remove the irrelevant traces and add a few more in the appropriate places to allow you to zoom in on the cause(s).
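One convenient way to add and later disable such probes is to gate them behind an environment variable. A sketch only; the variable name APP_TRACE is made up:

```perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);

# A switchable trace probe in the spirit of the printf above.
# APP_TRACE is a made-up env var; set it only while hunting.
my $TRACE = $ENV{APP_TRACE} || 0;

sub trace {
    my ($msg) = @_;
    return unless $TRACE;
    my (undef, $file, $line) = caller;    # file and line of the call site
    printf STDERR "[%3d] %.6f %s:%d %s\n",
        threads->tid, time(), $file, $line, $msg;
}
```

Drop trace($workitem) at the top of each queue loop; when done, unset APP_TRACE rather than deleting the calls, so they are there for next time.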
One possibility, based upon something you said in one of your other replies above: are you using virtualisation on one or both of the systems? If so, how are you configuring the VM that is running this app (i.e. what cores/CPUs/virtual network adapters/etc.)?
Because historically, if you configure a VM with as many or more of any of those than the actual machine it runs on has, it can lead to some very bad interactions between the host and guest OS schedulers, and diabolically bad performance. Just a thought.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Multithreaded process on AIX slow
by kschwab (Vicar) on Nov 13, 2013 at 23:23 UTC
Does smtctl say that simultaneous multithreading is on? I think it defaults to on, but it could be off.
If it's easy to test, try setting these environment variables before you kick it off:
export AIXTHREAD_SCOPE=S
export SPINLOOPTIME=500
export YIELDLOOPTIME=100
export MALLOCMULTIHEAP=1
These are what's recommended for WebSphere (a Java app server), which is, at a high level, doing something similar. They're all mostly targeted at reducing contention around locks, mutexes, etc.
Re: Multithreaded process on AIX slow
by talexb (Chancellor) on Nov 14, 2013 at 13:42 UTC
Interesting problem. The application without the REST piece works OK -- and the REST piece works OK, but put them together and the whole thing slows down. That's my understanding of the problem.
I wonder if there's an issue in the middle -- is there a way to limit how many REST requests are in flight? If so, you might see how things go by limiting that number to see if you get better performance. Alternatively, is there some resource that a REST request uses in the application that's in short supply?
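One cheap way to cap how many requests a pool of threads has in flight is a counting semaphore. A sketch only: the limit of 4 is illustrative, and do_request() is a stand-in for the real REST call:

```perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;

# Cap concurrent REST calls at 4 (illustrative limit).
my $slots = Thread::Semaphore->new(4);

# Stand-in for the real REST::Client call.
sub do_request {
    my ($payload) = @_;
    return "resp:$payload";
}

sub limited_request {
    my ($payload) = @_;
    $slots->down;                     # blocks if 4 requests are already in flight
    my $result = eval { do_request($payload) };
    my $err = $@;
    $slots->up;                       # always release the slot, even on failure
    die $err if $err;
    return $result;
}
```

Each worker thread would call limited_request() instead of calling the client directly; the limit can then be tuned up or down while watching throughput.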
Alex / talexb / Toronto
Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
Hi Alex.
Thanks for the thoughts. Essentially you are correct in your 2nd sentence.
I do have a standalone version of the very simple REST piece that I could try hammering away with, as fast as possible, in parallel. I should do that to eliminate the cross-fertilization with the threading on AIX, I suppose, but it needs to work in this server; otherwise I'll have to use the Linux port instead, which is not ideal in this situation but might be workable.

(The server communicates with another similar, but non-REST-call-making, server to store the results from the REST calls in a database, among many other tasks. That second server currently runs on the AIX box where it needs to sit, and does so very smoothly at present. They *are* on the same net and it is all interoperable; it just means more net interactions between the two machines instead of having both servers on the same machine.)
And, yes, if I reduce the # of REST calls by batching as I've mentioned, it works very quickly - quicker even than the Linux version.
The REST resource is not in contention. Plus, I have short-fuse alarm timeout handlers, built around the calls using eval, so if they do fail it bombs gracefully and moves on to the next call.
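For reference, that guard looks roughly like this: the classic alarm-in-eval pattern, though the wrapper name with_timeout() and the specific fuse lengths here are made up:

```perl
use strict;
use warnings;

# Sketch of a short-fuse alarm/eval guard around a blocking call.
# $code is the real (blocking) REST call, passed in as a coderef.
sub with_timeout {
    my ($secs, $code) = @_;
    my $result = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $secs;
        my $r = $code->();
        alarm 0;
        $r;
    };
    alarm 0;                                # make sure the fuse is defused
    if ($@) {
        return undef if $@ eq "timeout\n";  # bomb gracefully, move on
        die $@;                             # re-throw real errors
    }
    return $result;
}
```

Called as, say, my $resp = with_timeout(5, sub { $client->GET($url) }); with $resp being undef when the fuse blows.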
Kind regards
Derek
Is the REST code using some kind of semaphore internally? You've been running this same code, with REST, on other machines and it's only slowing down on one?
Hi
Appreciate the further thought. All good to help the juices flow :-)
Identical code - yes. I guess I need to go look at the actual REST::Client and HTTP::Request CPAN module sources to see what they are doing to answer that.
I know that when I developed the surrounding code back in 2008, I discovered a bug/feature in the Perl threading mechanism and variable sharing (and reported back on it) that had me going in circles for a while, but it led to some interesting workarounds that ended up making the code more efficient. That isn't related to this, though. I suppose I'll just have to dig through those net modules and see, on this as well. Oh well :-) All good fun.
Kind regards
Derek