PerlMonks  

Module for transparently forking a sub?

by kyle (Abbot)
on Feb 13, 2009 at 16:36 UTC ( [id://743634]=perlquestion )

kyle has asked for the wisdom of the Perl Monks concerning the following question:

I was thinking this morning that it would be nice to have a way to run some sub in a forked process and get its normal return value back without having to think about it too much. I can think of two ways I might want to do this.

  1. fork and wait because I want to leak memory somewhere else, but I don't want to run in parallel. This could even be a simple wrapper like what Memoize does.
  2. Run a sub in parallel and collect the return value from it when it's done.

I went looking around CPAN, and I found a few things that are similar but not quite what I had in mind.

Is there a module that does what I want? I want to hand it a sub reference for it to call after a fork, and I want to get back whatever the sub would have returned if I'd called it directly (within understandable limits).
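A minimal sketch of the "fork and wait" case described above, assuming a POSIX-like system and a return value that Storable can serialise (no code refs or open handles). The name `run_forked` is hypothetical and does not come from any CPAN module; list context is assumed.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(freeze thaw);

# Hypothetical helper: call $code->(@args) in a forked child, wait for it,
# and return what it returned. List context is assumed.
sub run_forked {
    my ($code, @args) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: run the sub and send the serialised result to the parent.
        close $reader;
        binmode $writer;
        my $ok = eval {
            my @result = $code->(@args);
            print {$writer} freeze(\@result);
            close $writer;
            1;
        };
        # _exit skips END blocks and destructors inherited from the parent.
        POSIX::_exit($ok ? 0 : 1);
    }

    # Parent: read until EOF, then reap the child.
    close $writer;
    binmode $reader;
    my $frozen = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);

    die "child $pid returned no data (wait status $?)\n"
        unless defined $frozen && length $frozen;
    return @{ thaw($frozen) };
}

my ($data) = run_forked(sub { return { args => [@_], pid => $$ } }, 1, 2.3, 'four');
print "sub ran in pid $data->{pid}, caller is pid $$\n";
```

Any memory the sub leaks goes away with the child, which is the point of the first use case.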

Not finding this, I started thinking about the interface I'd want and how to handle special cases.

  • Do I clobber the "$SIG{CHLD}" of my caller? I guess I'd make that optional.
  • What if the sub throws an exception? I guess I'd throw it at the caller when it requests its return value.
  • What if the sub calls exit?
  • I'd have to get the return value back via pipe. Is that going to step on something in the child?
  • Do I need to take care of cleaning up open files the parent had before the fork? How about database handles? Maybe I just need a hook for something to call after the fork but before the target sub.
  • How much of the sub's context should I simulate? If I'm given a sub in one place and the return value is collected much later, I won't know the context when I need to. I guess I have to make the caller tell me the context ahead of time and create a reasonable default.
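One possible shape for such an interface, sketched under several assumptions: the names `fork_async`, `join_forked`, and the `context` and `after_fork` options are all hypothetical; exceptions are passed back as strings only; and the child is reaped with `waitpid` on its own pid, so the caller's `$SIG{CHLD}` is left alone (this will not work if the caller has set it to 'IGNORE').

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(freeze thaw);

# Hypothetical: start $code->(@args) in a child and return a handle at once.
sub fork_async {
    my ($code, $opt, @args) = @_;
    my $context = $opt->{context} || 'list';   # caller declares context up front

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        close $reader;
        binmode $writer;
        my @result;
        my $ok = eval {
            # Hook for cleaning up handles inherited from the parent.
            $opt->{after_fork}->() if $opt->{after_fork};
            @result = $context eq 'scalar' ? scalar $code->(@args)
                                           : $code->(@args);
            1;
        };
        my $reply = $ok ? { result => \@result } : { error => "$@" };
        eval { print {$writer} freeze($reply); close $writer };
        POSIX::_exit(0);
    }

    close $writer;
    binmode $reader;
    return { pid => $pid, reader => $reader, context => $context };
}

# Hypothetical: block until the child is done, then return or rethrow.
sub join_forked {
    my ($handle) = @_;
    my $frozen = do { local $/; readline($handle->{reader}) };
    close $handle->{reader};
    waitpid($handle->{pid}, 0);
    my $status = $?;

    # Nothing on the pipe means the sub called exit or the child was killed.
    die "child $handle->{pid} exited without returning (wait status $status)\n"
        unless defined $frozen && length $frozen;

    my $reply = thaw($frozen);
    die $reply->{error} if exists $reply->{error};
    return $handle->{context} eq 'scalar' ? $reply->{result}[0]
                                          : @{ $reply->{result} };
}

my $job = fork_async(sub { return sqrt $_[0] }, { context => 'scalar' }, 16);
# ... the parent is free to do other work here ...
print "result: ", join_forked($job), "\n";
```

A sub that calls exit simply never writes to the pipe, so the parent sees an empty read and reports that instead of a return value.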

My questions are: does this already exist? And if not, what features would you want from a new implementation?

Update: Thanks to Corion (in Re^5: Module for transparently forking a sub?), I have seen the light of forks, which is just what I was looking for. Thanks!

Replies are listed 'Best First'.
Re: Module for transparently forking a sub?
by BrowserUk (Patriarch) on Feb 13, 2009 at 17:11 UTC
    My questions are does this already exist?

    Yes! It's called threads::async(). Could it be any easier?

    #! perl -slw
    use strict;
    use threads;
    use Data::Dumper;

    ## "fork" the subroutine
    my( $thread ) = async {
        my %hash = (
            A => [ 1 .. 10 ],
            B => { 'a' .. 'z' },
            C => 'Just a big scalar' x 100,
        );
        return \%hash;
    };

    ## Do other stuff
    sleep 10;

    ## Get the complex results
    my( $complexData ) = $thread->join;

    ## Display them
    print Dumper $complexData;

    __END__
    c:\test>junk8
    $VAR1 = {
        'A' => [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ],
        'C' => 'Just a big scalarJust a big scalarJust a ... alarJust a big scalarJust a big scalarJust a big scalarJus ...
        'B' => {
            'w' => 'x', 'e' => 'f', 'a' => 'b', 'm' => 'n', 's' => 't',
            'y' => 'z', 'u' => 'v', 'c' => 'd', 'k' => 'l', 'q' => 'r',
            'g' => 'h', 'i' => 'j', 'o' => 'p'
        }
    };

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Heh. One way it could be easier is if there was a reliable way to know if a given Perl module is thread-safe! Most pure Perl code will be but XS modules often aren't unless someone has gone to the trouble of making them that way.

      The same can be said of forking of course, you could say that DBD::mysql isn't fork-safe and you'd be kind of right. But there's definitely fewer problems with forking and XS code.

      Also, the performance of threads, particularly for smallish tasks, is really quite bad. I know you're going to ask me to quantify that statement but I really don't have the time. I've seen it benchmarked plenty of times before though, so you can probably find a fork versus threads benchmark around.

      -sam

        So, you have time to make the claim, but not the time to substantiate it. There's a name for that: FUD!

        Okay, here's my counter-claim.

        I can start a thread, run a subroutine that returns a complex data structure, and retrieve that data structure to the calling code faster than you can do the same using fork. My timing is: 0.0261 seconds.

        c:\test>junk8 -N=100
        Time taken: 0.0261 seconds
        {
          A => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
          ARGS => [1, "2.3", "four"],
          B => {
            a => "b",
            c => "d",
            e => "f",
            g => "h",
            i => "j",
            k => "l",
            "m" => "n",
            o => "p",
            "q" => "r",
            "s" => "t",
            u => "v",
            w => "x",
            "y" => "z",
          },
          C => "Just a big scalarJust a big scala big scalarJust a big scalarJust a big sc
        }

        And my benchmark code:

        #! perl -slw
        use strict;
        use threads;
        use Time::HiRes qw[ time ];
        use Data::Dump qw[ pp ];

        our $N ||= 10;

        sub stuff {
            my %hash = (
                ARGS => \@_,
                A => [ 1 .. 10 ],
                B => { 'a' .. 'z' },
                C => 'Just a big scalar' x 100,
            );
            return \%hash;
        }

        my $complexData;
        my $start = time;
        for ( 1 .. $N ) {
            ## "fork" the subroutine
            my( $thread ) = async \&stuff, 1, 2.3, 'four' ;

            ## Do other stuff
            sleep 1;

            ## Get the complex results
            $complexData = $thread->join;
        }
        printf "Time taken: %.4f seconds\n", ( time() - $start ) / $N - 1;

        ## Display them
        pp $complexData;

        Care to substantiate your claim and disprove mine?


      That is easy! Based on some of the other comments here, I think there might still be a use for a fork-based solution. If I wind up writing one, maybe I could steal the interface from threads for it.

        If I wind up writing one, maybe I could steal the interface from threads for it.

        That already exists, but it is just as vulnerable to thread-safety problems as ithreads; e.g. you probably won't be able to run concurrent DB queries safely, and even if you're lucky enough to find thread-safe drivers, concurrent queries will likely not run any quicker than they do serially. And it is not a cross-platform solution; threads is!

        And for passing back complex data structures, serialising them through a pipe is far slower than using shared memory.

        You might want to take a look at IPC::Run. It might be just what you're looking for.
Re: Module for transparently forking a sub?
by Corion (Patriarch) on Feb 13, 2009 at 16:46 UTC

    Why don't you just use threads, which give you a convenient way to run code in parallel and still receive the results without serializing them to disk or over some other IPC?

      Basically, I don't know much about threads. My hazy impression of threads is that they offer little over forking and that it's hard to share a complex data structure with them. If that's not true, maybe using threads would make this all trivial, and I should just learn that.

        The sharing problems only arise if you actually try to use the same data structure concurrently from two or more threads. If you simply pass off parameters to a subroutine, or to a worker thread using Thread::Queue, you don't have much of a problem. And returning the information (say, again, via a queue) is easy too.
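A short sketch of the worker-thread approach described above, assuming a perl built with ithreads. The helper name `run_in_worker` and the toy job (a list of squares) are made up for illustration; an `undef` on the job queue is used as the stop marker.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: hand each input to one worker thread through a queue
# and collect a small data structure per input from a second queue.
sub run_in_worker {
    my @inputs = @_;

    my $jobs    = Thread::Queue->new;
    my $results = Thread::Queue->new;

    my $worker = threads->create(sub {
        # dequeue blocks until a job arrives; undef is the stop marker.
        while (defined(my $n = $jobs->dequeue)) {
            $results->enqueue({ input => $n, squares => [ map { $_ * $_ } 1 .. $n ] });
        }
    });

    $jobs->enqueue($_) for @inputs;
    $jobs->enqueue(undef);          # tell the worker to finish
    $worker->join;

    # The worker is done, so drain the results without blocking.
    my @out;
    while (defined(my $r = $results->dequeue_nb)) {
        push @out, $r;
    }
    return @out;
}

for my $r (run_in_worker(2, 3)) {
    print "n=$r->{input}: @{ $r->{squares} }\n";
}
```

Nothing here is declared shared by hand: the parameters and results only travel through the two queues.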

Re: Module for transparently forking a sub?
by ikegami (Patriarch) on Feb 13, 2009 at 16:51 UTC
    An advantage of forking would be the ability to impose a timeout. Another would be the ability to change the security context (run-as, jailing, etc.).
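The timeout point can be sketched as follows, assuming a POSIX-like system where `alarm` and signals are available. The helper name `run_with_timeout` is hypothetical, and only the child's exit code is reported, not a return value.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code in a child for at most $seconds.
# Returns the child's exit code, or undef if it had to be killed.
sub run_with_timeout {
    my ($code, $seconds) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        my $ok = eval { $code->(); 1 };
        POSIX::_exit($ok ? 0 : 1);
    }

    my $timed_out = 0;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        waitpid($pid, 0);           # interrupted when the alarm fires
        alarm 0;
        1;
    } or do {
        alarm 0;
        die $@ unless $@ eq "timeout\n";
        $timed_out = 1;
        kill 'KILL', $pid;          # the child cannot ignore this
        waitpid($pid, 0);           # reap it so no zombie is left behind
    };

    return $timed_out ? undef : $? >> 8;
}

my $code = run_with_timeout(sub { sleep 30 }, 1);
print defined $code ? "finished with exit code $code\n" : "timed out\n";
```

Killing the child leaves the parent untouched, which is much harder to arrange for a thread that has run away.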
Re: Module for transparently forking a sub?
by ruzam (Curate) on Feb 13, 2009 at 17:44 UTC

    How about database handles?

    Unless I'm mistaken, your forked database handles will go out of scope when the forked process ends, at which point your database will disconnect and the parent process will be left with unusable, disconnected database handles.

    Hmm... now that I think about it, maybe it's only the parent process ending that disconnects the child process handles. You'll want to test this to be sure. The child process can clone handles off the parent and set the parent's InactiveDestroy flag, depending on just how much you expect the child to do with the handles.

      You were right the first time - you definitely need to do something to keep the child exiting from messing up your handles, at least with DBD::mysql. Not doing it leads to "server went away" errors in the parent at a random point.

      -sam

Re: Module for transparently forking a sub?
by swares (Monk) on Feb 15, 2009 at 19:51 UTC
    What about using POE? I've been hooked since I discovered it a few years ago. You can even return references to Perl data structures. There are some great examples here: http://poe.perl.org/?POE_Cookbook
