Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid
 
PerlMonks  

Perl threads loss of performance with system call

by daniel85 (Novice)
on Jul 09, 2021 at 22:02 UTC ( #11134875=perlquestion: print w/replies, xml ) Need Help??

daniel85 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have encountered a big issue with perl threads, and I truly hope someone can help me to solve it. In brief: in certain situations, if you spawn threads, and within each thread you execute a shell command (either via system() or backticks), you see a significant loss of performances. Most importantly, if you check with "top", you will immediately see that not all threads are actually executing the system call simultaneously, but at most 1 or 2 of them. Here what I noticed so far by making a few tests:

1. The behaviour is triggered when the shell command is executed "very quickly" (a few milliseconds)
2. If from within the thread you fork a process, and execute the system call in there, this behaviour is not observed

So, it seems like all threads are competing for some resource. Any idea on how this can be prevented/fixed?

Many thanks to all of you in advance for your advice!

Daniel

# runs smoothly use threads; use strict; my @pool = map { threads->create(\&mysub) } 1 .. 50; $_->join for (@pool); sub mysub { print "spawned " . threads->tid . "\n"; for (1 .. 100) { `sleep 1`; } } __END__ # after the first threads have been spawned, it starts slowing down use threads; use strict; my @pool = map { threads->create(\&mysub) } 1 .. 50; $_->join for (@pool); sub mysub { print "spawned " . threads->tid . "\n"; for (1 .. 100) { `sleep 0.0001`; } } __END__

Replies are listed 'Best First'.
Re: Perl threads loss of performance with system call
by dave_the_m (Monsignor) on Jul 10, 2021 at 08:58 UTC
    I can't reproduce the issue here (linux, 12 core x86_64; I tried various perls from 5.10.1 to 5.34.0).

    With sleep 1, I get a total running time of the program of 1m 40s (consistent with each thread in parallel doing 100 1s sleeps), and with sleep 0.0001, I get a total running time of 2.5s, which seems reasonable for all the overhead of setting up and tearing down 50 threads plus the overhead of doing 50000 calls to system().

    Dave.

      I was yesterday on the #perl-help channel on irc.perl.org and 2-3 people tried and observed the same behaviour. When you run the first one (sleep 1), all the threads are spawned nearly instantaneously, while with the second (sleep 0.0001), you see considerable delay in the spawning.

        and what OS did they run the code in?

Re: Perl threads loss of performance with system call
by bliako (Monsignor) on Jul 12, 2021 at 12:42 UTC

    I run above code on fedora and I didn't observe any problem, but then again I don't know what is proper! If you provide a Short, Self-Contained, Correct Example which makes it clear to distinguish between correct and incorrect results I will give it a try.

    You should be aware that there are quite a few ways of running a system command via Perl. And some of them involve running it via a shell, and some don't. Also a system command involves a fork. An exec does not, it replaces the current process with the system command. And of course there are various higher-level ways of spawning a system command like IPC::Run. Do you see the problem with all of them? or some of them? Have you also tried to run parallel sleep() from within the shell, avoiding perl to see if that's a perl problem?

    bw, bliako

    Update:

    for long sleep: real 1m40.444s user 0m1.271s sys 0m8.883s for short sleep: real 0m1.901s user 0m1.442s sys 0m6.836s

    Update2: use dmesg and check log files to see if the OS complains that certain quota on resources have been exceeded and it can't allocate more (perhaps too many open filehandles? a backticks-system will return back the STDOUT of the command). Use strace -fp PID to see exactly what's going on if you can seave through the maze of output.). Perhaps relevant: GNU-Parallel is an *excellent* tool for spawning commands in parallel and it is written in Perl.

Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 12, 2021 at 12:43 UTC
      What "a pure perl solution" would be? Cause the program I have to call is a program that performs a very specific analysis, and I cannot just rewrite it in Perl. Also, if you take my code, and replace the backticks with qx or system, the result is always the same.

        I guess you are a little bit resistant to advice. «Beratungsresistent» as we say in German. Please provide more details. Else I abandon all hope. Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 10, 2021 at 08:11 UTC
      Hi karlgoethebier, unfortunately yes. And it runs smoothly with fork(), but I want to possibly handle it with threads.
        «I want to possibly handle it with threads»

        Probably you should consider to use MCE. It forks but you can say use threads; and it works as designed.

        «The Crux of the Biscuit is the Apostrophe»

Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 10, 2021 at 08:58 UTC
    «… within each thread you execute a shell command…»

    BTW, I’m not so sure if this is a good idea. Probably there might be a solution in pure Perl.

    «The Crux of the Biscuit is the Apostrophe»

Re: Perl threads loss of performance with system call
by bliako (Monsignor) on Jul 13, 2021 at 23:19 UTC

    I have observed that the terms "short" and "long" sleep-times, as you put it, depend on the number of threads (=time to fill the @pool=when to join). For example in my system, a sleep of 0.7s will display delays but only when the number of threads is greater than 300. While a sleep of 1s does show these problems but for 500+ threads. This code my @pool = map { threads->create(\&mysub) } 1 .. 50; creates a thread, runs it and then adds it to the @pool and repeats. The important thing to notice is that the thread starts execution immediately while others are still being created. And when thread execution involves system (=fork+exec+waitpid) which has to exit very shortly, somewhere there is a race, given that threads and fork involve replicate program state. Now this is a rough sketch and probably silly, I am very far from being an expert on Perl gastroenterology.

    I have a possible workaround, which is lame in that it will probably take longer to finish ... Just make the created threads to wait for the pool to fill up and then do the actual work (system command). That can be achieved with a shared flag like below:

    use threads; use threads::shared; use strict; use Time::HiRes qw/clock_gettime CLOCK_REALTIME usleep/; my $flag:shared = 1; print "starting at ".clock_gettime(CLOCK_REALTIME)."\n"; my @pool; for(1..50){ push @pool, threads->create(\&mysub); } print "giving the flag at ".clock_gettime(CLOCK_REALTIME)."\n"; $flag = 0; # all threads go! $_->join for (@pool); sub mysub { my $tid = threads->tid; while( $flag ){ print "thread $tid created and waiting for the flag at ".clock_get +time(CLOCK_REALTIME)."\n"; usleep(1000); } #my $x = 0.0001 + rand(0.0015); my $x = 0.001; #my $x = rand() >= 0.5 ? 0.001 : 1; print "thread $tid is working with sleep=$x, at ".clock_gettime(CL +OCK_REALTIME)."\n"; my $id; for my $i (1..100){ $id = "i=$i, tid=$tid"; # print ` ` echo -n "start($id/\$\$) on cpu "\`cat /proc/\$\$/stat | cut - +d' ' -f39\`": "; date +%s.%N sleep $x echo -n " stop($id/\$\$) on cpu "\`cat /proc/\$\$/stat | cut - +d' ' -f39\`": "; date +%s.%N `; } print "ended " . threads->tid . " at ".clock_gettime(CLOCK_REALTIM +E)."\n"; } __END__

    An explanation of what's going on should probably involve the fact that system involves fork, exec and waitpid (see Learning Perl, ch 14). And that both threads and fork replicate the program state. And choroba's citing the doc here:

    Thinking of mixing fork() and threads? Please lie down and wait until the feeling passes. Be aware that the semantics of fork() vary between platforms. For example, some Unix systems copy all the current threads into the child process, while others only copy the thread that called fork(). You have been warned!
    

    bw, bliako

      This is what I actually tried to do at the very beginning. However, if you keep an eye on your "top", you will see that at most 2-3 threads are running concurrently at any time, while most of them are inactive. Indeed, the total execution time is longer with "sleep 0.001" (or echo A), than it is with "sleep 1".

        i use htop. I did not see what you describe. All cores were active. Still we are talking about very small time intervals. In any event you now have some tools to debug this and confirm any suspicions you may have. For example taskset will report on the current cpu (relocations are common though).

        bw, bliako

Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 11, 2021 at 16:13 UTC

    How does this perform on your box?

    «The Crux of the Biscuit is the Apostrophe»

      Please find 2 examples, based on karlgoethebier's demonstration.

      Example 1:

      The posix_exit Hobo option is beneficial on Unix platforms. Be sure to close any file handles opened by the worker. The void_context option is helpful to reduce IPC overhead if not wanting result from workers.

      #!/usr/bin/env perl use strict; use warnings; use MCE::Hobo; use feature qw(say); use constant AMOUNT => 0.001; my $cores = MCE::Util::get_ncpu(); MCE::Hobo->init( max_workers => $cores, void_context => 1, posix_exit => 1, ); sub consumer_task { my $id = shift; say qq($id ) . MCE::Hobo->pid(); qx( sleep @{[ AMOUNT ]} ); } for my $id ( 1..800 ) { my $hobo = MCE::Hobo->create( \&consumer_task, $id ); } MCE::Hobo->wait_all();

      Example 2:

      Here, we minimize the spawning of Hobo workers and instead use a channel or a queue to send/receive input.

      #!/usr/bin/env perl use strict; use warnings; use MCE::Hobo; use MCE::Channel; use feature qw(say); use constant AMOUNT => 0.001; my $cores = MCE::Util::get_ncpu(); my $queue = MCE::Channel->new(); MCE::Hobo->init( void_context => 1, posix_exit => 1, ); # Consumers sub consumer_task { while ( my $input = $queue->dequeue() ) { say qq($input->{id} ) . MCE::Hobo->pid(); qx( sleep @{[ AMOUNT ]};) } } MCE::Hobo->create( \&consumer_task ) for 1..$cores; # Producer for ( 1..800 ) { $queue->enqueue({ id => $_ }); } $queue->end(); MCE::Hobo->wait_all();

        The channel object is bidirectional. Therefore, the worker can send its PID value to the producer. This has the effect of max one item in the channel. The producer writes to STDOUT like the demonstration by karlgoethebier.

        #!/usr/bin/env perl use strict; use warnings; use MCE::Hobo; use MCE::Channel; use feature qw(say); use constant AMOUNT => 0.001; my $cores = MCE::Util::get_ncpu(); my $queue = MCE::Channel->new(); MCE::Hobo->init( void_context => 1, posix_exit => 1, ); # Consumers sub consumer_task { while ( my $input = $queue->recv() ) { my $id = $input->{id}; $queue->send2( MCE::Hobo->pid() ); qx( sleep @{[ AMOUNT ]};) } } MCE::Hobo->create( \&consumer_task ) for 1..$cores; # Producer for my $id ( 1..800 ) { $queue->send({ id => $id }); my $hobo_pid = $queue->recv2(); say qq($id ) . $hobo_pid; } $queue->end(); MCE::Hobo->wait_all();

        Thanks. This looks good. What I never understood is why to call  MCE::Hobo->wait_all(); in this context. As it seems that all the hobos terminate correctly in this fubar loop without the call.

        «The Crux of the Biscuit is the Apostrophe»

      Quite smoothly, but it exits before everything is terminated

        Strange. On my Mac it works as designed. 50 threads and as I have 12 cores 4 x 12 plus 2. It can be seen in the output as well as in the process table. For the moment I’m at a loss.

        «The Crux of the Biscuit is the Apostrophe»

        Uncommenting the wait_all line is necessary to ensure workers have completed and exited.

        MCE::Hobo->wait_all();
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 13, 2021 at 12:05 UTC

    You may consider this instead of your mysub:

    use IPC::Run qw( run ); use feature qw(say); say alien( q(echo), q(fubar) ); sub alien { my ($command, $arg) = @_; run qq($command $arg); }

    As also recommended by bliako.

    «The Crux of the Biscuit is the Apostrophe»

        I think I see the same effect on fedora! But that may be due to terminal's sequential stdout!

        Here is a script which can help you debug this further, it prints start and stop time of shell system command and which cpu the shell was at the beginning and at the end (or at least that's what I believe as I am not into the Linux scheduler at all). I think it is better to rely on these numbers rather than on the frequency you see "spawned 10" on the terminal, something very unreliable! You can process these timings and see for yourself if there is delay from creating to sleeping. I don't see it, e.g. the 1st shell-command is run at 1626196220.813910751 which is, time-wise, near spawned 3 at 1626196220.81309 I also observe that spawned 7 at 1626196220.82161 and start(i=1, tid=7/449711) on cpu 0: 1626196220.828573978 are very near as well.

        use threads; use strict; use Time::HiRes qw/clock_gettime CLOCK_REALTIME/; # use this to control which CPU you assign this script # (and in my system also the shell system command) #use Sys::CpuAffinity; #my $success = Sys::CpuAffinity::setAffinity($$, [1]); print "starting at ".clock_gettime(CLOCK_REALTIME)."\n"; my @pool; for(1..50){ push @pool, threads->create(\&mysub); } $_->join for (@pool); sub mysub { print "spawned " . threads->tid . " at ".clock_gettime(CLOCK_REALT +IME)."\n"; #for(1..100){ select(undef, undef, undef, 0.001); } #my $x = 0.0001 + rand(0.0015); my $x = 0.001; my $id; for my $i (1..100){ $id = "i=$i, tid=".threads->tid; print ` echo -n "start($id/\$\$) on cpu "\`cat /proc/\$\$/stat | cut - +d' ' -f39\`": "; date +%s.%N sleep $x echo -n " stop($id/\$\$) on cpu "\`cat /proc/\$\$/stat | cut - +d' ' -f39\`": "; date +%s.%N `; } print "ended " . threads->tid . " at ".clock_gettime(CLOCK_REALTIM +E)."\n"; } __END__ starting at 1626196220.80981 spawned 1 at 1626196220.81077 spawned 2 at 1626196220.81192 spawned 3 at 1626196220.81309 spawned 4 at 1626196220.81466 spawned 5 at 1626196220.81614 spawned 6 at 1626196220.818 start(i=1, tid=1/449647) on cpu 3: 1626196220.813910751 stop(i=1, tid=1/449647) on cpu 3: 1626196220.818011634 start(i=1, tid=2/449649) on cpu 1: 1626196220.815133617 stop(i=1, tid=2/449649) on cpu 3: 1626196220.819228058 start(i=1, tid=3/449654) on cpu 4: 1626196220.816404247 stop(i=1, tid=3/449654) on cpu 2: 1626196220.820722313 spawned 7 at 1626196220.82161 spawned 8 at 1626196220.8235 start(i=1, tid=4/449665) on cpu 6: 1626196220.818088441 stop(i=1, tid=4/449665) on cpu 4: 1626196220.825072117 spawned 9 at 1626196220.82603 spawned 10 at 1626196220.82844 start(i=1, tid=5/449675) on cpu 4: 1626196220.820177902 start(i=1, tid=6/449684) on cpu 4: 1626196220.824213073 ...

        bw, bliako

        Thanks. But i don't believe so much in animated gifs but in Benchmark. And in the recording i see your code and not mine.

        And probably we should better forget this strange loop with sleep and echo and talk about your real command.

        And BTW, from my sleep manpage:

        "The sleep command will accept and honor a non-integer number of specified seconds (with a `.' character as a decimal point). This is a non-portable extension, and its use will nearly guarantee that a shell script will not execute properly on another system."

        Holy Joe!

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

Re: Perl threads loss of performance with system call
by perlfan (Vicar) on Jul 10, 2021 at 05:33 UTC
    > why?

    Because Perl doesn't have real lightweight, shared memory threads like you post is making me think might think (through perhaps no fault of your own).

    > no, really, why?

    You haven't mentioned anything about the computing environment - virtual, bare metal, one CPU, many vCPUs, etc? Can you replicate this via bash backgrounding child processes?

      Hi perlfan, the computer is a Fedora linux machine with 56 cores.
        Trust me, if you want to do real threading, perl is the wrong tool. If you're cool with fork, it is. Check out MCE as suggested, or my personal hammer, Parallel::ForkManager. PDL::ParallelCPU looks interesting, but I have not tried it out.

        That said, if you want to feel "perlish" yet have real SMP threading, checkout out Qore.

Re: Perl threads loss of performance with system call
by jflopezfernandez (Novice) on Jul 10, 2021 at 17:36 UTC

    I wonder if the problem is that when you call sleep;3 with an argument of one second, the processor considers this long enough to switch context and continue with the next threads, which is not the case when the argument is only one ten-thousandth of a second. This is exactly what happens when the amount of time to sleep for is less than the clock resolution.

    The sleep functions (meaning both the C library and system functions, on both Linux and FreeBSD) all require an integral argument as their sole calling parameter, so I think what's happening is that the argument is being coerced into a zero, the thread is calling sleep with an argument of zero, and thus the process is sequentially calling sleep two threads at a time, as that is its upper limit on an Intel processor with hyperthreading.

    Most decidedly not a Perl guru
      All the shells I have installed (bash, dash, tcsh) support floating point arguments to sleep and they really try to sleep for the specified non-integer time. (See for example lib/sh/ufuncs.c in bash).

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thanks for your reply! The "sleep" was just an example. You can observe the same behaviour with any other command that gets quickly executed (for example, "echo A" will mimic the same behaviour).
Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 15, 2021 at 06:48 UTC
    «… mixing fork() and threads…»

    I guess bliako is right. I had always a bad feeling with this idea. But I have another idea: You are on Fedora, right? Why don’t you simply fork? You can do this «by hand» or use MCE or Parallel::ForkManager. You know this. And as we now know your real command it would be also nice to see your real code. Else debugging is really hard and just a guessing game. One more thing: I don’t know anything about the lib(s) you use. Do you really need the full «framework»? And is it possible for you to port something you need from C to Perl? To get the idea you might take a look at List::MoreUtils::PP and compare it with List::MoreUtils. Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: Perl threads loss of performance with system call
by karlgoethebier (Abbot) on Jul 16, 2021 at 06:41 UTC
Re: Perl threads loss of performance with system call
by Anonymous Monk on Jul 10, 2021 at 03:48 UTC
    Why are you joining threads? Control the stack size?

      It waits for them to finish.

      Seeking work! You can reach me at ikegami@adaelis.com

A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11134875]
Approved by LanX
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (2)
As of 2022-05-17 04:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you prefer to work remotely?



    Results (65 votes). Check out past polls.

    Notices?