Module for transparently forking a sub?

kyle has asked for the wisdom of the Perl Monks concerning the following question:

I was thinking this morning that it would be nice to have a way to run some sub in a forked process and get its normal return value back without having to think about it too much. I can think of two ways I might want to do this.

Fork and wait, because I want to leak memory somewhere else but don't want to run in parallel. This could even be a simple wrapper like what Memoize does.
Run a sub in parallel and collect the return value from it when it's done.

I went looking around CPAN, and I found a few things that are similar but not quite what I had in mind.

Parallel::SubFork won't give me the return value.
Parallel::Forker also won't give me the return value.
Proc::Forkfunc expects the sub never to return.
Acme::Fork::Lazy can only return scalars, and it won't let me ask it if something's done—just ask for the result and block if it's not ready.

Is there a module that does what I want? I want to hand it a sub reference for it to call after a fork, and I want to get back whatever the sub would have returned if I'd called it directly (within understandable limits).
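A minimal sketch of the kind of interface described above, assuming a pipe plus Storable to carry the return value back. The names fork_sub, is_ready and result are hypothetical and do not come from any CPAN module; the sub is always called in list context, and anything Storable cannot freeze (code refs, file handles) is outside the "understandable limits".

```perl
use strict;
use warnings;
use POSIX ();
use IO::Select;
use Storable qw(freeze thaw);

# Hypothetical helper: start $code in a child process and return a job handle.
sub fork_sub {
    my ($code, @args) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                      # child
        close $reader;
        binmode $writer;
        my @ret = $code->(@args);         # always called in list context
        print {$writer} freeze(\@ret);    # serialise the whole return list
        close $writer;                    # flushes the pipe
        POSIX::_exit(0);                  # skip the parent's END blocks
    }

    close $writer;                        # parent keeps only the read end
    binmode $reader;
    return { pid => $pid, fh => $reader };
}

# True once the child has started writing its result or has closed the pipe.
sub is_ready {
    my ($job) = @_;
    my @ready = IO::Select->new($job->{fh})->can_read(0);
    return scalar @ready;
}

# Block until the child is finished, then return its list of return values.
sub result {
    my ($job) = @_;
    my $frozen = do { local $/; readline $job->{fh} };
    close $job->{fh};
    waitpid $job->{pid}, 0;
    die "child returned no data\n" unless defined $frozen && length $frozen;
    return @{ thaw($frozen) };
}

my $job = fork_sub(sub { return { sum => $_[0] + $_[1] } }, 2, 3);
# ... the parent is free to do other work here, polling is_ready($job) ...
my ($answer) = result($job);
print "sum = $answer->{sum}\n";
```

Because the child writes only once, at the very end, a readable pipe is a reasonable stand-in for "done" in this sketch.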

Not finding this, I started thinking about the interface I'd want and how to handle special cases.

Do I clobber the "$SIG{CHLD}" of my caller? I guess I'd make that optional.
What if the sub throws an exception? I guess I'd throw it at the caller when it requests its return value.
What if the sub calls exit?
I'd have to get the return value back via pipe. Is that going to step on something in the child?
Do I need to take care of cleaning up open files the parent had before the fork? How about database handles? Maybe I just need a hook for something to call after the fork but before the target sub.
How much of the sub's context should I simulate? If I'm given a sub in one place and the return value is collected much later, I won't know the context when I need to. I guess I have to make the caller tell me the context ahead of time and create a reasonable default.
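Some of these special cases can be sketched for the simpler fork-and-wait case. This is only an illustration under stated assumptions: run_forked is a hypothetical name, exceptions are stringified before they cross the pipe (so exception objects lose their class), and the context comes from wantarray because the caller is still waiting. A deferred version would have to be told the context up front, as noted above.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(freeze thaw);

# Hypothetical wrapper: run $code in a child, wait, and return its value.
sub run_forked {
    my ($code, @args) = @_;
    my $want_list = wantarray;            # context of the caller of run_forked

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                      # child
        close $reader;
        binmode $writer;
        my @ret;
        my $ok = eval {
            @ret = $want_list ? $code->(@args) : scalar $code->(@args);
            1;
        };
        # Send back either the return value or the stringified exception.
        my $payload = $ok ? { value => \@ret } : { error => "$@" };
        print {$writer} freeze($payload);
        close $writer;
        POSIX::_exit(0);
    }

    close $writer;                        # parent
    binmode $reader;
    my $frozen = do { local $/; readline $reader };
    close $reader;
    waitpid $pid, 0;
    my $wait_status = $?;

    # If the sub called exit, the pipe closes without any payload.
    die "child ended without returning a value (wait status $wait_status)\n"
        unless defined $frozen && length $frozen;

    my $payload = thaw($frozen);
    die $payload->{error} if exists $payload->{error};   # rethrow in the caller
    return $want_list ? @{ $payload->{value} } : $payload->{value}[0];
}

my @list   = run_forked(sub { return map { $_ * 2 } @_ }, 1, 2, 3);
my $scalar = run_forked(sub { return wantarray ? 'list' : 'scalar' });
print "@list / $scalar\n";

my $survived = eval { run_forked(sub { die "boom\n" }); 1 };
print "caught: $@" unless $survived;
```

The parent's $SIG{CHLD} is left alone here because the wrapper calls waitpid on its own child; a caller that installs its own reaping handler could still interfere with that.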

My questions are: does this already exist? And if not, what features would you want from a new implementation?

Update: Thanks to Corion (in Re^5: Module for transparently forking a sub?), I have seen the light of forks, which is just what I was looking for. Thanks!


Re: Module for transparently forking a sub? by BrowserUk (Patriarch) on Feb 13, 2009 at 17:11 UTC
"My questions are does this already exist?"

Yes! It's called threads::async(). Could it be any easier?

    #! perl -slw
    use strict;
    use threads;
    use Data::Dumper;

    ## "fork" the subroutine
    my( $thread ) = async {
        my %hash = (
            A => [ 1 .. 10 ],
            B => { 'a' .. 'z' },
            C => 'Just a big scalar' x 100,
        );
        return \%hash;
    };

    ## Do other stuff
    sleep 10;

    ## Get the complex results
    my( $complexData ) = $thread->join;

    ## Display them
    print Dumper $complexData;

    __END__
    c:\test>junk8
    $VAR1 = {
        'A' => [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ],
        'C' => 'Just a big scalarJust a big scalarJust a ... alarJust a big scalarJust a big scalarJust a big scalarJus ...
        'B' => {
            'w' => 'x', 'e' => 'f', 'a' => 'b', 'm' => 'n', 's' => 't',
            'y' => 'z', 'u' => 'v', 'c' => 'd', 'k' => 'l', 'q' => 'r',
            'g' => 'h', 'i' => 'j', 'o' => 'p'
        }
    };

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^2: Module for transparently forking a sub? by samtregar (Abbot) on Feb 13, 2009 at 20:50 UTC
Heh. One way it could be easier is if there were a reliable way to know whether a given Perl module is thread-safe! Most pure Perl code will be, but XS modules often aren't unless someone has gone to the trouble of making them that way. The same can be said of forking, of course: you could say that DBD::mysql isn't fork-safe and you'd be kind of right. But there are definitely fewer problems with forking and XS code.

Also, the performance of threads, particularly for smallish tasks, is really quite bad. I know you're going to ask me to quantify that statement, but I really don't have the time. I've seen it benchmarked plenty of times before though, so you can probably find a fork versus threads benchmark around.

-sam
Re^3: Module for transparently forking a sub? by BrowserUk (Patriarch) on Feb 13, 2009 at 22:01 UTC
So, you have time to make the claim, but not the time to substantiate it. There's a name for that: FUD!

Okay, here's my counter claim. I can start a thread, run a subroutine that returns a complex data structure, and retrieve that data structure to the calling code faster than you can do the same using fork. My timing is: 0.0261 seconds.

    c:\test>junk8 -N=100
    Time taken: 0.0261 seconds
    {
      A => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      ARGS => [1, "2.3", "four"],
      B => {
        a => "b", c => "d", e => "f", g => "h", i => "j", k => "l",
        "m" => "n", o => "p", "q" => "r", "s" => "t", u => "v",
        w => "x", "y" => "z",
      },
      C => "Just a big scalarJust a big scala big scalarJust a big scalarJust a big sc
    }

And my benchmark code:

    #! perl -slw
    use strict;
    use threads;
    use Time::HiRes qw[ time ];
    use Data::Dump qw[ pp ];

    our $N ||= 10;

    sub stuff {
        my %hash = (
            ARGS => \@_,
            A => [ 1 .. 10 ],
            B => { 'a' .. 'z' },
            C => 'Just a big scalar' x 100,
        );
        return \%hash;
    }

    my $complexData;
    my $start = time;

    for ( 1 .. $N ) {
        ## "fork" the subroutine
        my( $thread ) = async \&stuff, 1, 2.3, 'four' ;

        ## Do other stuff
        sleep 1;

        ## Get the complex results
        $complexData = $thread->join;
    }

    printf "Time taken: %.4f seconds\n", ( time() - $start ) / $N - 1;

    ## Display them
    pp $complexData;

Care to substantiate your claim and disprove mine?
Re^4: Module for transparently forking a sub? by samtregar (Abbot) on Feb 14, 2009 at 05:33 UTC
Re^5: Module for transparently forking a sub? by BrowserUk (Patriarch) on Feb 14, 2009 at 20:08 UTC
Re^4: Module for transparently forking a sub? by samtregar (Abbot) on Feb 13, 2009 at 22:40 UTC
Re^4: Module for transparently forking a sub? by Anonymous Monk on Feb 14, 2009 at 18:05 UTC
Re^2: Module for transparently forking a sub? by kyle (Abbot) on Feb 13, 2009 at 21:51 UTC
That is easy! Based on some of the other comments here, I think there might still be a use for a fork-based solution. If I wind up writing one, maybe I could steal the interface from threads for it.
Re^3: Module for transparently forking a sub? by BrowserUk (Patriarch) on Feb 13, 2009 at 22:06 UTC
"If I wind up writing one, maybe I could steal the interface from threads for it."

That already exists, but it is just as vulnerable to thread-safety problems as ithreads--e.g. you probably won't be able to run concurrent DB queries safely; and if you're lucky enough to find thread-safe drivers, concurrent queries will likely not run any quicker than they do serially.

And it is not a cross-platform solution; threads is! And for passing back complex data structures, serialising them through a pipe is far slower than using shared memory.
Re^4: Module for transparently forking a sub? by kyle (Abbot) on Feb 13, 2009 at 22:12 UTC
Re^5: Module for transparently forking a sub? by Corion (Patriarch) on Feb 13, 2009 at 22:18 UTC
Re^3: Module for transparently forking a sub? by gloryhack (Deacon) on Feb 16, 2009 at 02:14 UTC
You might want to take a look at IPC::Run. It might be just what you're looking for.
Re: Module for transparently forking a sub? by Corion (Patriarch) on Feb 13, 2009 at 16:46 UTC
Why don't you just use threads, which give you a convenient way to run code in parallel and still receive the results without serializing them to disk or through other IPC?
Re^2: Module for transparently forking a sub? by kyle (Abbot) on Feb 13, 2009 at 16:52 UTC
Basically, I don't know much about threads. My hazy impression of threads is that they offer little over forking and that it's hard to share a complex data structure with them. If that's not true, maybe using threads would make this all trivial, and I should just learn that.
Re^3: Module for transparently forking a sub? by Corion (Patriarch) on Feb 13, 2009 at 16:54 UTC
The sharing problems only arise if you actually try to use the same data structure concurrently from two or more threads. If you simply pass off parameters to a subroutine, or to a worker thread using Thread::Queue, you don't have much of a problem. And returning the information (say, again, via a queue) is easy too.
Re: Module for transparently forking a sub? by ikegami (Patriarch) on Feb 13, 2009 at 16:51 UTC
An advantage of forking would be the ability to impose a timeout. Another would be the ability to change the security context (run-as, jailing, etc.).
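The timeout point can be sketched roughly as follows. This is an illustration only: run_with_timeout is a hypothetical name, the result is passed back with Storable over a pipe, and the child is simply killed with SIGKILL when the deadline passes, so it gets no chance to clean up.

```perl
use strict;
use warnings;
use POSIX ();
use IO::Select;
use Storable qw(freeze thaw);

# Hypothetical helper: run $code in a child, giving up after $timeout seconds.
sub run_with_timeout {
    my ($timeout, $code, @args) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                      # child
        close $reader;
        binmode $writer;
        my @ret = $code->(@args);
        print {$writer} freeze(\@ret);    # written only once the sub is done
        close $writer;
        POSIX::_exit(0);
    }

    close $writer;                        # parent
    binmode $reader;

    # Wait until the child writes its result, closes the pipe, or time runs out.
    my @ready = IO::Select->new($reader)->can_read($timeout);
    if (!@ready) {
        kill 'KILL', $pid;                # the child cannot ignore this
        waitpid $pid, 0;
        close $reader;
        die "sub timed out after $timeout seconds\n";
    }

    my $frozen = do { local $/; readline $reader };
    close $reader;
    waitpid $pid, 0;
    die "child returned no data\n" unless defined $frozen && length $frozen;
    return @{ thaw($frozen) };            # call in list context
}

my ($quick) = run_with_timeout(5, sub { return "done" });
print "quick: $quick\n";

my $finished = eval { run_with_timeout(1, sub { sleep 10; return "late" }); 1 };
print "slow: $@" unless $finished;
```

Killing a thread from outside is much harder to do cleanly, which is part of why a separate process is attractive here.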
Re: Module for transparently forking a sub? by ruzam (Curate) on Feb 13, 2009 at 17:44 UTC
"How about database handles?"

Unless I'm mistaken, your forked database handles will go out of scope when the forked process ends, at which point your database will disconnect and the parent process will be left with unusable, disconnected database handles. Hmm... now that I think about it, maybe it's only the parent process ending that disconnects the child process's handles. You'll want to test this to be sure.

Depending on just how much you expect the child to do with the handles, the child process can clone handles off the parent and set the InactiveDestroy flag on the parent's handles.
Re^2: Module for transparently forking a sub? by samtregar (Abbot) on Feb 13, 2009 at 20:45 UTC
You were right the first time - you definitely need to do something to keep the child's exit from messing up your handles, at least with DBD::mysql. Not doing it leads to "server went away" errors in the parent at a random point.

-sam
Re: Module for transparently forking a sub? by swares (Monk) on Feb 15, 2009 at 19:51 UTC
What about using POE? I've been hooked since I discovered it a few years ago. You can even return references to Perl data structures. There are some great examples here: http://poe.perl.org/?POE_Cookbook

