PerlMonks
Re^2: Design advice: Classic boss/worker program memory consumption

by Laurent_R (Parson)
on May 20, 2014 at 21:30 UTC ( #1086864=note )


in reply to Re: Design advice: Classic boss/worker program memory consumption
in thread Design advice: Classic boss/worker program memory consumption

The memory size is only part of the problem. The problem with spawning too many children also has to do with context switches.

I regularly work with seven separate databases, each having eight sub-databases (and huge amounts of data). So, in principle, I could launch 56 parallel processes to extract data from these 56 data sources. Most of the time, I don't use forks or threads, but simply launch parallel background processes under the shell. The processes I am referring to are sometimes in Perl and sometimes in a variety of other languages (including a proprietary equivalent of PL/SQL), so using the OS to fork background processes is often the easiest route: the shell and the OS manage the parallel tasks, and the program manages the functional/business requirements.
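A minimal sketch of that approach under a POSIX shell (the `extract` function here is just a stand-in for the real extraction programs, and the database names are illustrative):

```shell
# Sketch only: "extract" is a placeholder for the real Perl/PL/SQL-like
# extraction program; database and sub-database names are made up.
extract() { echo "extracting $1/$2"; }

for db in db1 db2; do
  for sub in s1 s2; do
    extract "$db" "$sub" &    # run in the background; the OS schedules it
  done
done
wait                          # block until every background job finishes
echo "all extractions done"
```

The `&`/`wait` pair is all the shell needs to run the jobs concurrently and then collect them.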

Our servers usually have 8 CPUs. We have been doing this for many years and have tried a number of options and configurations, and we have found that the optimal number of processes running in parallel is usually between 8 and 16, depending on the specifics of the individual programs (some are fastest with 8 processes, some with 12 and some with 16, depending on exactly what they are doing and how).

If we use fewer than 8 processes, we obviously under-utilize the hardware; if we let 56 processes run in parallel, the overall job takes much longer to execute than with 8 to 16 processes, and we strongly believe this is due to context switches and memory usage. In some cases (when the extraction needs heavy DB sorting, for example), the overall job even fails for lack of memory if too many processes run in parallel. But in most cases, it really seems to be linked to having too many processes running for the number of CPUs available, leading to intensive context switching.

So what we do is put the 56 processes into a waiting queue whose role is simply to keep the number of processes actually running in parallel at the optimum (8 to 16, depending on the program). This is at least the best solution we've found so far, with roughly 15 years times 5 or 6 people of cumulative experience on the subject. Now, if anyone has a better idea, I would gladly take it up and try it out.
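One common way to get such a bounded queue from the shell is `xargs -P` (an assumption on my part; the original poster's queue is home-grown, and `-P` is a widespread extension rather than strict POSIX). All 56 job specifications are listed up front, but at most 8 run at any one time; as each finishes, the next is started:

```shell
# Feed all 56 job identifiers to xargs, which keeps at most 8 of them
# running concurrently. The sh -c 'echo ...' stands in for the real
# extraction command; each identifier arrives as $0.
seq 1 56 | xargs -n1 -P8 sh -c 'echo "job $0 finished"'
```

This gives the same effect as the hand-rolled waiting queue: full hardware utilization without 56-way contention.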


Re^3: Design advice: Classic boss/worker program memory consumption (context switching is cheap)
by tye (Cardinal) on May 21, 2014 at 00:55 UTC
    if we let 56 processes run in parallel, the overall process takes much longer to execute than when we have 8 to 16 processes, and we strongly believe that this is due to context switches and memory usage.

    Having to swap pages out and back in surely can make things run much, much longer. 56 processes talking to databases isn't really going to add enough overhead from context switching to cause the total run-time to be "much longer", IME. To get context switching to be even a moderate source of overhead, you have to force it to happen extra often by passing tiny messages back and forth way more frequently than your OS's time slice.

    There are also other resources that can have a significant impact on performance if you try to over-parallelize. Different forms of caching (file system, database, L1, L2) can become much less effective if you have too many simultaneously competing uses for them (having to swap out/in pages of VM is just a specific instance of this general problem).

    To a lesser extent, you can also lose performance for disk I/O by having reads be done in a less-sequential pattern. But I bet that is mostly "in the noise" compared to reduction in cache efficiency leading to things having to be read more than once each.

    Poor lock granularity can also lead to greatly reduced performance when you over-parallelize, but I doubt that applies in your situation.

    - tye        

      context switching is cheap

      Sorry Tye, but that is twaddle. Either guesswork or naiveté.

      Firstly, which type of "context switch" are you mis-describing?

      • Thread-context.
      • Process-context.

      The cost of a hardware-based, thread-only context switch runs from tens to over 1000 microseconds. Add in the need to invalidate and refill the L1/L2/L3 caches, and the total time to reconstitute a thread context back to the point where real work can proceed can actually exceed the maximum timeslice some (flavours of some) OSs will allocate to a thread.

      And don't go blaming threads; switching process-contexts is a (collection of) software operations and is more expensive still.

      The cost of a context switch comes from several sources: the processor registers must be saved and restored, the OS kernel's scheduler code must execute, the TLB entries need to be reloaded, and the processor pipeline must be flushed. These costs accompany almost every context switch in a multitasking system; they are the direct costs.

      In addition, a context switch leads to cache sharing between multiple processes, which may degrade performance. This cost varies across workloads with different memory-access behaviours and across architectures. These are the cache-interference costs, or indirect costs, of a context switch.
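On Linux, the system-wide context-switch counter is exposed in `/proc/stat`, so the switch rate is easy to observe directly (a Linux-specific probe I'm adding for illustration, not something from the thread):

```shell
# Sample the cumulative context-switch count twice, one second apart;
# the difference is the system-wide switches-per-second rate.
before=$(awk '/^ctxt/ {print $2}' /proc/stat)
sleep 1
after=$(awk '/^ctxt/ {print $2}' /proc/stat)
echo "context switches/sec: $((after - before))"
```

Watching this number while varying the worker count is a quick way to test whether over-parallelizing is actually driving the switch rate up.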


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Thanks for mostly agreeing with me (despite feeling like you completely disagreed with me).

        - tye        

      Having to swap pages out and back in surely can make things run much, much longer. 56 processes talking to databases isn't really going to add enough overhead from context switching to cause the total run-time to be "much longer", IME.

      Well, I guess it depends on our definitions of what "much longer" means. In our case, it can take up to twice as long, say four hours instead of two. For us, that is sometimes the difference between a usable and an unusable program.
