PerlMonks  

Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run

by 1nickt (Prior)
on Apr 07, 2017 at 12:48 UTC (#1187394=perlquestion)
1nickt has asked for the wisdom of the Perl Monks concerning the following question:

Learned brethren:

I am using MCE::Shared to create a shared hash for collecting results. I am using MCE::Loop to fork off workers and process a long list of tasks. (Note: I have tried the code below with Parallel::ForkManager instead, with the same results.)

The program works as expected: workers are forked, report their results to the shared hash, and the hash is printed from the END block by the parent process.

However, I would like to be able to interrupt the program and have the hash printed (both on manual interrupt and on an uncaught exception). In a single-process environment this works as expected: the hash is built up and, on interrupt, dumped from whatever state it has reached, in the END block. But in a parallel-processing environment, I get unexpected results.
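For reference, a minimal core-Perl sketch of the single-process case described above, assuming a POSIX system and no MCE at all: a child perl fills a hash, sends itself INT part-way through (standing in for Ctrl-C), and its INT handler calls exit(). Because exit() unwinds normally, the END block still runs and prints whatever was collected. The hash contents and the self-signal are illustrative only.

```perl
use strict;
use warnings;

# Script run in a separate perl: it fills %hash, signals itself after four
# items, and its INT handler calls exit() so that the END block still runs.
my $script = <<'EOS';
use strict;
use warnings;
my %hash;
$SIG{INT} = sub { exit 0 };    # exit() unwinds normally, so END blocks run
for my $i (0 .. 9) {
    $hash{$i} = $i * $i;
    kill 'INT', $$ if $i == 3; # stand-in for pressing Ctrl-C
}
END { print join(',', map { $_ . '=' . $hash{$_} } sort { $a <=> $b } keys %hash), "\n" }
EOS

open my $fh, '-|', $^X, '-e', $script or die "Cannot run perl: $!";
my $output = do { local $/; <$fh> } // '';
close $fh;
print $output;    # 0=0,1=1,2=4,3=9
```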

Here is my test program:

use strict;
use warnings;
use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;
use Time::HiRes qw/ usleep time /;
use MCE::Shared;
use MCE::Loop;

my $pid = $$;
say "PID $pid";

tie my %hash, 'MCE::Shared', ();

$SIG{'INT'}  = sub { kill 'TERM', -$$ };
$SIG{'TERM'} = sub { exit 0 };

MCE::Loop->init( max_workers => 6, chunk_size => 10 );

mce_loop {
    say "Forked child with $$";
    my ( $mce, $chunk_ref, $chunk_id ) = @_;
    for ( @{ $chunk_ref } ) {
        $hash{ sprintf '%.2d %s', $_, $$ } = time;
        sleep 1;
    }
} ( 0 .. 99 );

MCE::Loop->finish;

END {
    say sprintf '%s %s (%s) in END', $$, time, $$ == $pid ? 'Parent' : 'Child';
    if ( $$ == $pid ) {
        say 'Parent is ready to dump';
        say 'Dumping: ' . Dumper \%hash;
    }
}

__END__

Here's an example of the output I am getting from my test program:

PID 13209
Forked child with 13211
Forked child with 13212
Forked child with 13215
Forked child with 13214
Forked child with 13213
Forked child with 13216
^C13211 1491567754.05168 (Child) in END
13216 1491567754.05169 (Child) in END
13213 1491567754.05382 (Child) in END
13212 1491567754.05404 (Child) in END
13214 1491567754.05448 (Child) in END
13209 1491567754.05601 (Parent) in END
Parent is ready to dump
13215 1491567754.05627 (Child) in END
^C

Three things strike me as odd about this:

  • Why is a child process surviving longer than the parent with kill 'TERM', -$$? (Note: this is not always the case. Sometimes the parent dies last; sometimes multiple children survive longer than the parent.)
  • Why is a second interrupt signal needed here? If not given, the program just hangs after printing 'Parent is ready to dump'. (This is always the case.)
  • Why is the line containing the Dumper() statement not executed?

For comparison, here is a Parallel::ForkManager script that doesn't use a shared variable. It still shows child processes surviving longer than the parent, but it exits completely after a single interrupt signal:

use strict;
use warnings;
use feature 'say';
use Data::Dumper; ++$Data::Dumper::Sortkeys;
use Parallel::ForkManager;

my $pid = $$;
say "PID $pid";

$SIG{'INT'}  = sub { kill 'TERM', -$$ };
$SIG{'TERM'} = sub { exit 0 };

my $pm = Parallel::ForkManager->new(6);

for ( 0 .. 9 ) {
    my $start = 10 * $_;
    $pm->start and next;
    for ( $start .. $start + 9 ) {
        say sprintf '%.2d %s', $_, $$;
        sleep 1;
    }
    $pm->finish;
}

END {
    say sprintf '%s (%s) in END', $$, $$ == $pid ? 'Parent' : 'Child';
}

__END__

PID 14274
00 14275
10 14276
20 14277
30 14278
40 14279
50 14280
01 14275
11 14276
21 14277
31 14278
41 14279
51 14280
02 14275
12 14276
22 14277
32 14278
42 14279
52 14280
^C43 14279
33 14278
23 14277
13 14276
03 14275
53 14280
14279 (Child) in END
14280 (Child) in END
14277 (Child) in END
14275 (Child) in END
14276 (Child) in END
14274 (Parent) in END
14278 (Child) in END

I realize this may be more of a fork question than a shared-data one, but the real problem only manifests when I use the shared data structure. Thanks for any pointers.


The way forward always starts with a minimal test.

Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 13:28 UTC
    # $SIG{'INT'} = sub { kill 'TERM', -$$ };   # this works already
    $SIG{'TERM'} = sub { MCE->exit(0) };         # Notifies the parent

    This is another possibility.

    $SIG{'TERM'} = sub {
        if (MCE->wid > 0) {   # worker
            MCE->exit(0);
        }
        else {                # parent
            MCE::Signal::stop_and_exit('TERM');
        }
    };

      Thank you for the reply ( marioroy? )!

      In the first example, it was necessary to move the SIG handlers below the mce_loop statement, else it gave error:

      MCE::exit: method is not allowed by the manager process at mce.pl line 19.

      After moving the SIG handlers, it gives no error but does not appear to execute any of the code in END:

      PID 18566
      Forked child with 18568
      Forked child with 18569
      Forked child with 18571
      Forked child with 18572
      Forked child with 18570
      Forked child with 18573
      ^C ## mce.pl: caught signal (INT), exiting
      Killed
      The second example you provided gives the same result:
      PID 18874
      Forked child with 18877
      Forked child with 18876
      Forked child with 18879
      Forked child with 18878
      Forked child with 18880
      Forked child with 18881
      ^C ## mce3.pl: caught signal (INT), exiting
      Killed

      Note that the program runs successfully if left to complete:

      PID 19017
      Forked child with 19019
      Forked child with 19020
      Forked child with 19023
      Forked child with 19022
      Forked child with 19024
      Forked child with 19021
      Forked child with 19019
      Forked child with 19020
      Forked child with 19022
      Forked child with 19023
      19020 1491572765.74874 (Child) in END
      19019 1491572765.75111 (Child) in END
      19024 1491572765.75325 (Child) in END
      19022 1491572765.7533 (Child) in END
      19021 1491572765.7552 (Child) in END
      19023 1491572765.75711 (Child) in END
      19017 1491572765.75939 (Parent) in END
      Parent is ready to dump
      Dumping: $VAR1 = {
                '00 19019' => '1491572745.74377',
                '01 19019' => '1491572746.74401',
                '02 19019' => '1491572747.74421',
                '03 19019' => '1491572748.74441',
                '04 19019' => '1491572749.74457',
                '05 19019' => '1491572750.74473',
                '06 19019' => '1491572751.74493',
                '07 19019' => '1491572752.7451',
                '08 19019' => '1491572753.74529',
                '09 19019' => '1491572754.7455',
                '10 19020' => '1491572745.7436',
                '11 19020' => '1491572746.74385',
                '12 19020' => '1491572747.74405',
                '13 19020' => '1491572748.74427',
                ...
              }

      Thank you again for the help.



      MCE::Signal points $SIG{INT} at MCE::Signal::stop_and_exit. Ditto for $SIG{TERM}. Folks can override that if need be, as you've done. In that case, the following is needed:

      $SIG{'TERM'} = sub {
          if (MCE->wid > 0) {   # worker
              MCE->exit(0);
          }
          else {                # parent
              MCE::Signal::stop_and_exit('TERM');
          }
      };

      Calling MCE->exit before the worker exits is important: the manager process then decrements its count of running workers by one and reaps the worker.
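The bookkeeping described here (a running count that is decremented as each worker is reaped) can be pictured with plain fork and waitpid. This is a generic core-Perl sketch, not MCE's implementation; MCE's actual notification mechanism is not shown. Here each worker simply exits with its slot number as the exit code, and %exit_code is a hypothetical record kept by the manager.

```perl
use strict;
use warnings;

my $running = 0;     # manager's count of live workers
my %exit_code;       # pid => exit code, filled in as workers are reaped

for my $wid (1 .. 3) {
    my $child = fork;
    die "fork failed: $!" unless defined $child;
    if ($child == 0) {
        exit $wid;   # worker finishes and reports via its exit status
    }
    $running++;
}

while ($running > 0) {
    my $child = waitpid(-1, 0);   # block until some worker exits, then reap it
    last if $child == -1;         # no children left to wait for
    $exit_code{$child} = $? >> 8;
    $running--;
}

print "still running: $running, reaped: ", scalar(keys %exit_code), "\n";
```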

Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 13:40 UTC

    Oh yes, one might want to display the shared data. This will do.

    $SIG{'TERM'} = sub {
        if (MCE->wid > 0) {   # worker
            MCE->exit(0);
        }
        else {                # parent
            say 'Parent is ready to dump';
            say 'Dumping: ' . Dumper \%hash;
            MCE::Signal::stop_and_exit('TERM');
        }
    };
Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 13:46 UTC

    I understand now: you want to display the data already stored in the shared cache. For the INT handler, one can do this:

    $SIG{'INT'} = sub {
        if ( tied(%hash)->len ) {
            (MCE->wid == 0)
                ? say 'Parent is ready to dump'
                : say 'Worker is ready to dump';
            say 'Dumping: ' . Dumper \%hash;
            %hash = ();
        }
        MCE::Signal::stop_and_exit('INT');
    };

    $SIG{'TERM'} = sub {
        if (MCE->wid > 0) {   # worker
            MCE->exit(0);
        }
        else {                # parent
            say 'Parent is ready to dump';
            say 'Dumping: ' . Dumper \%hash;
            MCE::Signal::stop_and_exit('TERM');
        }
    };

      Here is what I have after implementing the above suggestion. I have a couple of questions:

      • Why, when chunk_size is set to `1`, is the same process reused?
      • Why, when the interrupt is given, is nothing inside the SIG handlers being executed?
      • What is the purpose of emptying the hash in the INT signal handler?

      Full SSCCE:

      use strict;
      use warnings;
      use feature 'say';
      use Data::Dumper; ++$Data::Dumper::Sortkeys;
      use MCE::Shared;
      use MCE::Loop;
      $|++;

      my $pid = $$;
      say "PID $pid";

      tie my %hash, 'MCE::Shared', ();

      MCE::Loop->init( max_workers => 2, chunk_size => 1 );

      mce_loop {
          my ( $mce, $chunk_ref, $chunk_id ) = @_;
          say sprintf 'Forked worker in slot %s with pid %s for chunk %s',
              MCE->wid, MCE->pid, $chunk_id;
          for ( @{ $chunk_ref } ) {
              $hash{ sprintf '%.2d %s', $_, $$ } = time;
              say "After $_: " . Dumper \%hash;
              sleep 1;
          }
      } ( 0 .. 4 );

      MCE::Loop->finish;

      $SIG{'INT'} = sub {
          say 'Hello from INT';
          if ( tied(%hash)->len ) {
              (MCE->wid == 0)
                  ? say 'Parent is ready to dump'
                  : say 'Worker is ready to dump';
              say 'Dumping: ' . Dumper \%hash;
              %hash = ();
          }
          MCE::Signal::stop_and_exit('INT');
      };

      $SIG{'TERM'} = sub {
          say 'Hello from TERM';
          if (MCE->wid > 0) {   # worker
              MCE->exit(0);
          }
          else {                # parent
              say 'Parent is ready to dump';
              say 'Dumping: ' . Dumper \%hash;
              MCE::Signal::stop_and_exit('TERM');
          }
      };

      END {
          say sprintf '%s %s (%s) in END', $$, time, $$ == $pid ? 'Parent' : 'Child';
          if ( MCE->wid == 0 or $$ == $pid ) {
              say "Parent is ready to dump";
              say 'Dumping: ' . Dumper \%hash;
          }
      }

      __END__
      Output when interrupted:
      PID 21106
      Forked worker in slot 2 with pid 21110 for chunk 1
      Forked worker in slot 1 with pid 21109 for chunk 2
      After 0: $VAR1 = { '00 21110' => '1491574316', '01 21109' => '1491574316' };
      After 1: $VAR1 = { '00 21110' => '1491574316', '01 21109' => '1491574316' };
      ^C ## mce3.pl: caught signal (INT), exiting
      Killed
      Output without interrupt:
      PID 20939
      Forked worker in slot 2 with pid 20942 for chunk 1
      Forked worker in slot 1 with pid 20941 for chunk 2
      After 0: $VAR1 = { '00 20942' => '1491574178', '01 20941' => '1491574178' };
      After 1: $VAR1 = { '00 20942' => '1491574178', '01 20941' => '1491574178' };
      Forked worker in slot 2 with pid 20942 for chunk 3
      Forked worker in slot 1 with pid 20941 for chunk 4
      After 2: $VAR1 = { '00 20942' => '1491574178', '01 20941' => '1491574178', '02 20942' => '1491574179', '03 20941' => '1491574179' };
      After 3: $VAR1 = { '00 20942' => '1491574178', '01 20941' => '1491574178', '02 20942' => '1491574179', '03 20941' => '1491574179' };
      Forked worker in slot 2 with pid 20942 for chunk 5
      After 4: $VAR1 = { '00 20942' => '1491574178', '01 20941' => '1491574178', '02 20942' => '1491574179', '03 20941' => '1491574179', '04 20942' => '1491574180' };
      20941 1491574181 (Child) in END
      20942 1491574181 (Child) in END
      20939 1491574181 (Parent) in END
      Parent is ready to dump
      Dumping: $VAR1 = { '00 20942' => '1491574178', '01 20941' => '1491574178', '02 20942' => '1491574179', '03 20941' => '1491574179', '04 20942' => '1491574180' };

      Thanks again for your help.



        Why, when chunk_size is set to `1`, is the same process reused?

        Workers persist from start to finish. chunk_size refers to how many items a given worker receives per call to user_func. Each worker's life looks like this:

        user_begin user_func user_func user_func ... user_func user_end
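To make that lifecycle concrete, here is a single-process simulation, assuming 2 workers and chunk_size => 1 over ( 0 .. 4 ) as in the SSCCE above. make_chunks and the round-robin hand-out are hypothetical simplifications; the output above shows MCE does not hand chunks out in strict worker order. The point is that 5 chunks still mean only 2 workers, each calling user_func several times between one user_begin and one user_end.

```perl
use strict;
use warnings;

# Hypothetical helper: cut the input list into chunks of $chunk_size items.
sub make_chunks {
    my ($chunk_size, @items) = @_;
    my @chunks;
    push @chunks, [ splice @items, 0, $chunk_size ] while @items;
    return @chunks;
}

my $max_workers = 2;
my @chunks      = make_chunks(1, 0 .. 4);

# Each persistent worker runs user_begin once, user_func once per chunk it
# is handed, and user_end once. Round-robin is a stand-in for MCE's own
# hand-out order.
my %calls;
push @{ $calls{$_} }, 'user_begin' for 1 .. $max_workers;
for my $i (0 .. $#chunks) {
    my $wid = $i % $max_workers + 1;
    push @{ $calls{$wid} }, 'user_func(chunk ' . ($i + 1) . ": @{ $chunks[$i] })";
}
push @{ $calls{$_} }, 'user_end' for 1 .. $max_workers;

print "worker $_: ", join(' -> ', @{ $calls{$_} }), "\n" for sort keys %calls;
```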

        Why, when the interrupt is given, is nothing inside the SIG handlers being executed?

        The SIG handlers must be overridden before calling mce_loop, that is, before the workers are spawned.
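A core-Perl sketch (POSIX only, no MCE) of why the order matters: a forked child starts with the signal handlers its parent had at the moment of fork(), so a handler installed in the parent afterwards never reaches an already-running child. how_child_ends is a hypothetical helper, and exit code 7 only marks that the child's own handler ran.

```perl
use strict;
use warnings;
use POSIX qw(SIGTERM);

# Fork a child that just sleeps, send it TERM, and report how it ended.
# $when says whether the TERM handler is installed before or after fork().
sub how_child_ends {
    my ($when) = @_;
    local $SIG{TERM} = 'DEFAULT';                      # restored on return
    $SIG{TERM} = sub { exit 7 } if $when eq 'before';

    my $child = fork;
    die "fork failed: $!" unless defined $child;
    if ($child == 0) {                                 # child: wait to be signalled
        sleep 10;
        exit 0;
    }

    $SIG{TERM} = sub { exit 7 } if $when eq 'after';  # parent only: too late for the child
    kill 'TERM', $child;
    waitpid $child, 0;
    my $sig = $? & 127;
    return $sig ? "killed by signal $sig" : 'exited with ' . ($? >> 8);
}

my $before = how_child_ends('before');
my $after  = how_child_ends('after');
print "handler set before fork: $before\n";   # exited with 7
print "handler set after fork:  $after\n";    # killed by signal SIGTERM (15 on Linux)
```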

        What is the purpose of emptying the hash in the INT signal handler?

        Whichever worker or parent receives the signal first displays the content, then clears the hash before notifying the others to exit. That notification causes the other workers to call the handler too; since the hash is now empty, the tied(%hash)->len check skips the dump. We only need to display the content once.
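The dump-once logic can be sketched in a single process, with an ordinary hash standing in for the shared one (with MCE::Shared, every process would see the same clear). on_int is a hypothetical stand-in for the INT handler body above; the keys reuse values from the output above, and the calling order is arbitrary.

```perl
use strict;
use warnings;

# An ordinary hash stands in for the shared hash.
my %hash  = ('00 21110' => '1491574316', '01 21109' => '1491574316');
my @dumps;

# Dump only if something is left, then clear so later callers skip the dump.
sub on_int {
    my ($who) = @_;
    if (%hash) {
        push @dumps, "$who: " . join(', ', map { $_ . ' => ' . $hash{$_} } sort keys %hash);
        %hash = ();
    }
}

on_int($_) for 'worker 2', 'parent', 'worker 1';   # order in which each one handles INT
print "$_\n" for @dumps;                             # one line, from worker 2
```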

Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 18:13 UTC

    Okay, I see the issue. The shared-manager process exits upon receiving the signal, so it cannot respond to later requests, which stalls the script.

    Inside MCE::Shared::Server.pm around line 461, comment out the handler code and add the line that follows it. Basically, the shared-manager must still respond to requests made from inside application handlers. I'm not sure yet whether I can handle both cases. The original handler was placed there in case something killed the shared-manager process.

    sub _loop {
        $_is_client = 0;

      # $SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = sub {
      #     $SIG{INT} = $SIG{$_[0]} = sub { };
      #
      #     CORE::kill($_[0], $_is_MSWin32 ? -$$ : -getpgrp);
      #     for my $_i (1..15) { sleep 0.060 }
      #
      #     CORE::kill('KILL', $$);
      #     CORE::exit(255);
      # };

        $SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = sub { };

        ...
    }
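A core-Perl analogue of why the empty handler keeps things working, assuming a POSIX system: a forked echo server (a hypothetical stand-in for the shared-manager) installs sub { } for INT, receives INT, and still answers the next request. With the default INT action the server would die and the request would go unanswered. The request and reply strings are illustrative.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;

socketpair(my $client, my $server, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";
$_->autoflush(1) for $client, $server;

my $server_pid = fork;
die "fork failed: $!" unless defined $server_pid;
if ($server_pid == 0) {                  # stand-in for the shared-manager
    close $client;
    $SIG{INT} = sub { };                 # no-op handler: do not exit on INT
    print {$server} "ready\n";
    while (defined(my $request = <$server>)) {
        print {$server} "reply to $request";
    }
    exit 0;                              # client hung up
}

close $server;
my $ready = <$client>;                   # wait until the handler is installed
kill 'INT', $server_pid;                 # the default action would kill the server here
print {$client} "get\n";
my $reply = <$client>;
close $client;
waitpid $server_pid, 0;
my $status = $?;
print defined $reply ? $reply : "no reply\n";   # reply to get
```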

    I've simplified the signal handling in the MCE script.

    use strict;
    use warnings;
    use feature 'say';
    use Data::Dumper; ++$Data::Dumper::Sortkeys;
    use Time::HiRes 'sleep';
    use MCE::Loop;
    use MCE::Shared;
    $|++;

    my $pid = $$;
    say "PID $pid";

    my $hash = MCE::Shared->hash();

    $SIG{'INT'} = $SIG{'TERM'} = sub {
        my $signal = shift;
        $SIG{'INT'} = $SIG{'TERM'} = sub {};
        say "Hello from $signal: $$";
        say 'Parent is ready to dump';
        say 'Dumping: ' . Dumper $hash->export;
        MCE::Signal::stop_and_exit('INT');
    };

    MCE::Loop->init(
        max_workers => 2,
        chunk_size  => 1,
        user_begin  => sub {
            $SIG{'INT'} = sub {
                my $signal = shift;
                say "Hello from $signal: $$";
                MCE->exit(0);
            };
            $SIG{'TERM'} = sub {
                my $signal = shift;
                say "Hello from $signal: $$";
                MCE::Signal::stop_and_exit($signal);
            };
        }
    );

    mce_loop {
        my ( $mce, $chunk_ref, $chunk_id ) = @_;
        say sprintf 'Forked worker in slot %s with pid %s for chunk %s',
            MCE->wid, MCE->pid, $chunk_id;
        for ( @{ $chunk_ref } ) {
            $hash->{ sprintf '%.2d %s', $_, $$ } = time;
            say "After $_: " . Dumper $hash->export;
            sleep 2;
        }
    } ( 0 .. 12 );

    MCE::Loop->finish;

    say "Parent is ready to dump";
    say 'Dumping: ' . Dumper $hash->export;

      Ah, upon closer inspection, the reason the END block isn't called is the CORE::kill('KILL', $$) line in that handler:

      # $SIG{HUP} = $SIG{INT} = $SIG{QUIT} = $SIG{TERM} = sub {
      #     $SIG{INT} = $SIG{$_[0]} = sub { };
      #
      #     CORE::kill($_[0], $_is_MSWin32 ? -$$ : -getpgrp);
      #     for my $_i (1..15) { sleep 0.060 }
      #
      #     CORE::kill('KILL', $$);
      #     CORE::exit(255);
      # };
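The effect of that KILL line can be reproduced in core Perl, assuming a POSIX system: END blocks run when a process leaves through exit(), but not when it is terminated by SIGKILL, which cannot be caught. run_child is a hypothetical helper that runs a one-liner in a fresh perl.

```perl
use strict;
use warnings;
use POSIX qw(SIGKILL);

# Prefix shared by both child scripts: an END block that announces itself.
my $prefix = q{$| = 1; END { print "END ran\n" } };

# Run $prefix . $body in a fresh perl; return its STDOUT and wait status.
sub run_child {
    my ($body) = @_;
    open my $fh, '-|', $^X, '-e', $prefix . $body or die "Cannot run perl: $!";
    my $output = do { local $/; <$fh> } // '';
    close $fh;                       # sets $? to the child's wait status
    return ($output, $?);
}

my ($exit_out, $exit_status) = run_child('exit 0;');
my ($kill_out, $kill_status) = run_child(q{kill 'KILL', $$; sleep 5;});

print 'exit():    ', ($exit_out eq '' ? "END skipped\n" : $exit_out);   # END ran
print 'kill KILL: ', ($kill_out eq '' ? "END skipped\n" : $kill_out);   # END skipped
```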

      There is no guarantee which END block Perl calls first. MCE::Shared has an END block that notifies the shared-manager to exit; had Perl called that one first, the script would stall. Therefore, leave the signal-handling bits intact at the application level. The END block is not necessary; it was simply added to see the workers enter it.

      use strict;
      use warnings;
      use feature 'say';
      use Data::Dumper; ++$Data::Dumper::Sortkeys;
      use Time::HiRes 'sleep';
      use MCE::Loop;
      use MCE::Shared;
      $|++;

      my $pid = $$;
      say "PID $pid";

      my $hash = MCE::Shared->hash();

      $SIG{'INT'} = $SIG{'TERM'} = sub {
          my $signal = shift;
          $SIG{'INT'} = $SIG{'TERM'} = sub {};
          say "Hello from $signal: $$";
          say 'Parent is ready to dump';
          say 'Dumping: ' . Dumper $hash->export;
          MCE::Signal::stop_and_exit('INT');
      };

      MCE::Loop->init(
          max_workers => 2,
          chunk_size  => 1,
          user_begin  => sub {
              $SIG{'INT'} = sub {
                  my $signal = shift;
                  say "Hello from $signal: $$";
                  MCE->exit(0);
              };
              $SIG{'TERM'} = sub {
                  my $signal = shift;
                  say "Hello from $signal: $$";
                  MCE::Signal::stop_and_exit($signal);
              };
          }
      );

      mce_loop {
          my ( $mce, $chunk_ref, $chunk_id ) = @_;
          say sprintf 'Forked worker in slot %s with pid %s for chunk %s',
              MCE->wid, MCE->pid, $chunk_id;
          for ( @{ $chunk_ref } ) {
              $hash->{ sprintf '%.2d %s', $_, $$ } = time;
              say "After $_: " . Dumper $hash->export;
              sleep 3;
          }
      } ( 0 .. 12 );

      MCE::Loop->finish;

      END {
          say "Hello from END block: $$";
          if ($$ == $pid) {
              say "Parent is ready to dump";
              say 'Dumping: ' . Dumper $hash->export;
          }
      }

      I will make a new MCE::Shared update after more testing. Thank you, 1nickt.

        Hello Mario,

        (BTW, you mentioned that having the signal handlers as well as END may be overdone... The reason I dump the data in END is so that it prints out upon an uncaught exception.)
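For the uncaught-exception case, a small core-Perl check that END blocks do run when a script dies (perlmod documents that END runs even when the program exits via die). The hash and its contents are illustrative; the die message goes to STDERR and only STDOUT is captured.

```perl
use strict;
use warnings;

# Script run in a fresh perl: it has an END block that dumps %hash,
# then dies with an uncaught exception.
my $script = <<'EOS';
my %hash = (a => 1, b => 2);
END { print "Dumping in END: ", join(',', map { $_ . '=' . $hash{$_} } sort keys %hash), "\n" }
die "uncaught exception\n";
EOS

open my $fh, '-|', $^X, '-e', $script or die "Cannot run perl: $!";
my $output = do { local $/; <$fh> } // '';
close $fh;
my $status = $? >> 8;    # nonzero because the script died
print $output;           # Dumping in END: a=1,b=2
```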

        I made the change to MCE::Shared::Server as instructed, then took your last script (in the post I am replying to) and modified it slightly. First, I removed some of the debug statements and cleaned up others. Second, I only dump the data in the INT handler from the parent process (as you do in END). This gives the following results:

        With Ctrl-C, the END block is never reached by the parent, only by the children. But the parent dumps the data in the INT handler.

        perl mce9.pl
        Parent PID 11581
        worker 2 (11584) processing chunk 1
        worker 1 (11583) processing chunk 2
        worker 1 (11583) processing chunk 4
        worker 2 (11584) processing chunk 3
        ^CHello from INT: 11584
        Hello from INT: 11581
        Hello from END block: 11584
        Parent in INT: $VAR1 = bless( { '00 11584' => '1491592596', '01 11583' => '1491592596', '02 11584' => '1491592598', '03 11583' => '1491592598' }, 'MCE::Shared::Hash' );
        ## mce9.pl: caught signal (INT), exiting
        Hello from INT: 11583
        Hello from END block: 11583
        Killed
        When running to completion, the parent reaches the END block and dumps the data there:
        perl mce9.pl
        Parent PID 11554
        worker 2 (11557) processing chunk 1
        worker 1 (11556) processing chunk 2
        worker 2 (11557) processing chunk 3
        worker 1 (11556) processing chunk 4
        worker 1 (11556) processing chunk 6
        worker 2 (11557) processing chunk 5
        worker 1 (11556) processing chunk 7
        Hello from END block: 11557
        Hello from END block: 11556
        Hello from END block: 11554
        Parent in END: $VAR1 = bless( { '00 11557' => '1491592580', '01 11556' => '1491592580', '02 11557' => '1491592582', '03 11556' => '1491592582', '04 11557' => '1491592584', '05 11556' => '1491592584', '06 11556' => '1491592586' }, 'MCE::Shared::Hash' );
        Regarding END blocks, I believe Perl does guarantee their order: they run Last In, First Out (BEGIN blocks, by contrast, run First In, First Out). So the END block in the script should be executed before the one in a module that is loaded by use.
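This is easy to check with core Perl: write a throwaway module with its own END block (EndOrderDemo is a hypothetical name standing in for a module such as MCE::Shared), load it with use, and declare two END blocks in the script. The blocks then run in reverse order of compilation, so the script's END blocks finish before the module's.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a throwaway module with its own END block.
my $dir = tempdir(CLEANUP => 1);
my $pm  = File::Spec->catfile($dir, 'EndOrderDemo.pm');
open my $out, '>', $pm or die "Cannot write $pm: $!";
print {$out} "package EndOrderDemo;\nEND { print \"module END\\n\" }\n1;\n";
close $out or die "Cannot close $pm: $!";

# The script loads the module with use, then declares two END blocks of its own.
my $script = 'use EndOrderDemo; END { print "script END 1\n" } END { print "script END 2\n" }';
open my $fh, '-|', $^X, "-I$dir", '-e', $script or die "Cannot run perl: $!";
my @order = <$fh>;
close $fh;
chomp @order;
print join(' | ', @order), "\n";   # script END 2 | script END 1 | module END
```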

        Code now:

        use strict;
        use warnings;
        use feature 'say';
        use Data::Dumper; ++$Data::Dumper::Sortkeys;
        use MCE::Loop;
        use MCE::Shared;
        $|++;

        my $pid = $$;
        say "Parent PID $pid";

        my $hash = MCE::Shared->hash();

        $SIG{'INT'} = $SIG{'TERM'} = sub {
            my $signal = shift;
            $SIG{'INT'} = $SIG{'TERM'} = sub {};
            say "Hello from $signal: $$";
            if ( $$ == $pid ) {
                say 'Parent in INT: ' . Dumper $hash->export;
            }
            MCE::Signal::stop_and_exit('INT');
        };

        MCE::Loop->init(
            max_workers => 2,
            chunk_size  => 1,
            user_begin  => sub {
                $SIG{'INT'} = sub {
                    my $signal = shift;
                    say "Hello from $signal: $$";
                    MCE->exit(0);
                };
                $SIG{'TERM'} = sub {
                    my $signal = shift;
                    say "Hello from $signal: $$";
                    MCE::Signal::stop_and_exit($signal);
                };
            }
        );

        mce_loop {
            my ( $mce, $chunk_ref, $chunk_id ) = @_;
            say sprintf 'worker %s (%s) processing chunk %s', MCE->wid, MCE->pid, $chunk_id;
            for ( @{ $chunk_ref } ) {
                $hash->{ sprintf '%.2d %s', $_, $$ } = time;
                sleep 2;
            }
        } ( 0 .. 6 );

        MCE::Loop->finish;

        END {
            say "Hello from END block: $$";
            if ( $$ == $pid ) {
                say 'Parent in END: ' . Dumper $hash->export;
            }
        }

        Thank you again.


Re: Interrupt multi-process program while using MCE::Shared hash: END block code does not (all) run
by Anonymous Monk on Apr 07, 2017 at 20:48 UTC
    Try the Perl 6 concurrency model; it is much more powerful. ;)

Node Type: perlquestion [id://1187394]
Approved by marto
Front-paged by marto