kyle has asked for the wisdom of the Perl Monks concerning the following question:
I was thinking this morning that it would be nice to have a way to run some sub in a forked process and get its normal return value back without having to think about it too much. I can think of two ways I might want to do this.
- Fork and wait, because I want to leak memory somewhere else but don't want to run in parallel. This could even be a simple wrapper, like what Memoize does.
- Run a sub in parallel and collect the return value from it when it's done.
I went looking around CPAN, and I found a few things that are similar but not quite what I had in mind.
Is there a module that does what I want? I want to hand it a sub reference for it to call after a fork, and I want to get back whatever the sub would have returned if I'd called it directly (within understandable limits).
Not finding this, I started thinking about the interface I'd want and how to handle special cases.
- Do I clobber the "$SIG{CHLD}" of my caller? I guess I'd make that optional.
- What if the sub throws an exception? I guess I'd throw it at the caller when it requests its return value.
- What if the sub calls exit?
- I'd have to get the return value back via pipe. Is that going to step on something in the child?
- Do I need to take care of cleaning up open files the parent had before the fork? How about database handles? Maybe I just need a hook for something to call after the fork but before the target sub.
- How much of the sub's context should I simulate? If I'm given a sub in one place and the return value is collected much later, I won't know the context when I need to. I guess I have to make the caller tell me the context ahead of time and create a reasonable default.
My questions are: does this already exist? And if not, what features would you want from a new implementation?
Update: Thanks to Corion (in Re^5: Module for transparently forking a sub?), I have seen the light of forks, which is just what I was looking for. Thanks!
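For readers who want to see the pipe-based idea from the question in code, here is a minimal sketch of the "fork and wait" case. It is not taken from any CPAN module: the name run_forked is hypothetical, list context is assumed, the return value is assumed to be something Storable can serialize (no code refs or file handles), and exceptions are flattened to strings before being rethrown in the parent. A sub that calls exit shows up as a child that returned nothing.

```perl
use strict;
use warnings;
use POSIX ();
use Storable qw(freeze thaw);

# Hypothetical helper: run $code in a forked child, wait for it, and
# return whatever it returned (list context is assumed).
sub run_forked {
    my ($code, @args) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: run the sub, then freeze either its results or its exception.
        close $reader;
        my @result  = eval { $code->(@args) };
        my $payload = $@ ? { error => "$@" } : { result => \@result };
        binmode $writer;
        print {$writer} freeze($payload);
        close $writer;
        # _exit skips END blocks and destructors inherited from the parent.
        POSIX::_exit(0);
    }

    # Parent: read until EOF, then reap the child.
    close $writer;
    binmode $reader;
    my $frozen = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);

    # An empty read means the child went away (e.g. called exit) before replying.
    die "child exited without returning a value\n"
        unless defined $frozen && length $frozen;

    my $payload = thaw($frozen);
    die $payload->{error} if exists $payload->{error};
    return @{ $payload->{result} };
}

my ($data) = run_forked(sub { return { squares => [ map { $_ * $_ } @_ ] } }, 1 .. 3);
print "@{ $data->{squares} }\n";    # prints "1 4 9"
```

The sketch does not touch $SIG{CHLD} and does nothing about inherited file or database handles, so the other questions in the list above remain open in it.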
Re: Module for transparently forking a sub?
by BrowserUk (Patriarch) on Feb 13, 2009 at 17:11 UTC
|
#! perl -slw
use strict;
use threads;
use Data::Dumper;
## "fork" the subroutine
my( $thread ) = async {
    my %hash = (
        A => [ 1 .. 10 ],
        B => { 'a' .. 'z' },
        C => 'Just a big scalar' x 100,
    );
    return \%hash;
};
## Do other stuff
sleep 10;
## Get the complex results
my( $complexData ) = $thread->join;
## Display them
print Dumper $complexData;
__END__
c:\test>junk8
$VAR1 = {
'A' => [
1,
2,
3,
4,
5,
6,
7,
8,
9,
10
],
'C' => 'Just a big scalarJust a big scalarJust a ...
alarJust a big scalarJust a big scalarJust a big scalarJus ...
'B' => {
'w' => 'x',
'e' => 'f',
'a' => 'b',
'm' => 'n',
's' => 't',
'y' => 'z',
'u' => 'v',
'c' => 'd',
'k' => 'l',
'q' => 'r',
'g' => 'h',
'i' => 'j',
'o' => 'p'
}
};
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
|
Heh. One way it could be easier is if there was a reliable way to know if a given Perl module is thread-safe! Most pure Perl code will be but XS modules often aren't unless someone has gone to the trouble of making them that way.
The same can be said of forking, of course: you could say that DBD::mysql isn't fork-safe and you'd be kind of right. But there are definitely fewer problems with forking and XS code.
Also, the performance of threads, particularly for smallish tasks, is really quite bad. I know you're going to ask me to quantify that statement but I really don't have the time. I've seen it benchmarked plenty of times before though, so you can probably find a fork versus threads benchmark around.
-sam
|
So, you have time to make the claim, but not the time to substantiate it. There's a name for that: FUD!
Okay, here's my counter claim.
I can start a thread, run a subroutine that returns a complex data structure, and retrieve that data structure to the calling code faster than you can do the same using fork. My timing is: 0.0261 seconds.
c:\test>junk8 -N=100
Time taken: 0.0261 seconds
{
A => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
ARGS => [1, "2.3", "four"],
B => {
a => "b",
c => "d",
e => "f",
g => "h",
i => "j",
k => "l",
"m" => "n",
o => "p",
"q" => "r",
"s" => "t",
u => "v",
w => "x",
"y" => "z",
},
C => "Just a big scalarJust a big scala
big scalarJust a big scalarJust a big sc
}
And my benchmark code:
#! perl -slw
use strict;
use threads;
use Time::HiRes qw[ time ];
use Data::Dump qw[ pp ];
our $N ||= 10;
sub stuff {
    my %hash = (
        ARGS => \@_,
        A => [ 1 .. 10 ],
        B => { 'a' .. 'z' },
        C => 'Just a big scalar' x 100,
    );
    return \%hash;
}
my $complexData;
my $start = time;
for ( 1 .. $N ) {
    ## "fork" the subroutine
    my( $thread ) = async \&stuff, 1, 2.3, 'four' ;
    ## Do other stuff
    sleep 1;
    ## Get the complex results
    $complexData = $thread->join;
}
## Subtract the 1 second slept in each iteration
printf "Time taken: %.4f seconds\n",
    ( time() - $start ) / $N - 1;
## Display them
pp $complexData;
Care to substantiate your claim and disprove mine?
|
If I wind up writing one, maybe I could steal the interface from threads for it.
That already exists, but it is just as vulnerable to thread-safety problems as ithreads; e.g. you probably won't be able to run concurrent DB queries safely, and if you're lucky enough to find thread-safe drivers, concurrent queries will likely not run any quicker than they do serially. And it is not a cross-platform solution; threads is!
And for passing back complex data structures, serialising them through a pipe is far slower than using shared memory.
|
You might want to take a look at IPC::Run. It might be just what you're looking for.
Re: Module for transparently forking a sub?
by Corion (Patriarch) on Feb 13, 2009 at 16:46 UTC
|
Why don't you just use threads, which give you a convenient way to run code in parallel and still receive the results without serializing them to disk or passing them through other IPC?
|
Basically, I don't know much about threads. My hazy impression of threads is that they offer little over forking and that it's hard to share a complex data structure with them. If that's not true, maybe using threads would make this all trivial, and I should just learn that.
|
The sharing problems only arise if you actually try to use the same data structure concurrently from two or more threads. If you simply pass off parameters to a subroutine, or to a worker thread using Thread::Queue, you don't have much of a problem. And returning the information (say, again, via a queue) is easy too.
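As a rough illustration of the queue approach just described, the sketch below hands work to one worker thread through a Thread::Queue and gets structured results back through a second queue. It assumes a perl built with ithreads and Thread::Queue 2.00 or later (which clones complex items on enqueue); the variable names and the undef "no more work" sentinel are choices made for this example only.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $jobs    = Thread::Queue->new;
my $results = Thread::Queue->new;

# Worker: pull numbers off the job queue until the undef sentinel arrives,
# pushing a small hash back for each one.
my $worker = threads->create(sub {
    while (defined(my $n = $jobs->dequeue)) {
        $results->enqueue({ n => $n, square => $n * $n });
    }
});

$jobs->enqueue($_) for 1 .. 3;
$jobs->enqueue(undef);    # sentinel: tells the worker to stop
$worker->join;

# The worker has finished, so a non-blocking drain collects everything.
my @squares;
while (defined(my $r = $results->dequeue_nb)) {
    push @squares, $r->{square};
}
print "@squares\n";    # prints "1 4 9"
```

Nothing here is shared explicitly; the only data crossing threads is what goes through the two queues.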
Re: Module for transparently forking a sub?
by ikegami (Patriarch) on Feb 13, 2009 at 16:51 UTC
|
An advantage of forking would be the ability to impose a timeout. Another would be the ability to change the security context (run-as, jailing, etc.).
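One way the timeout idea might look, as a sketch only: the parent waits for the child under an alarm and kills it if the alarm fires first. The helper name run_with_timeout is hypothetical, and it reports only the child's exit status (undef on timeout) rather than a return value. There is a small race if the alarm fires just after the child has been reaped, which a fuller version would need to handle.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code in a child for at most $seconds.
# Returns the child's exit status, or undef if it had to be killed.
sub run_with_timeout {
    my ($seconds, $code) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $code->();
        POSIX::_exit(0);    # skip END blocks inherited from the parent
    }

    my $status;
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        waitpid($pid, 0);
        $status = $? >> 8;
        alarm 0;
        1;
    };
    return $status if $finished;

    die $@ unless $@ eq "timeout\n";    # rethrow anything unexpected
    kill 'KILL', $pid;                  # the child overran: stop it
    waitpid($pid, 0);                   # and reap it
    return undef;
}

my $fast = run_with_timeout(5, sub { 1 });
my $slow = run_with_timeout(1, sub { sleep 10 });
print defined $fast ? "fast: finished\n" : "fast: timed out\n";
print defined $slow ? "slow: finished\n" : "slow: timed out\n";
```

Because the work runs in a separate process, killing it does not disturb the parent, which is the property being pointed to here.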
Re: Module for transparently forking a sub?
by ruzam (Curate) on Feb 13, 2009 at 17:44 UTC
|
How about database handles?
Unless I'm mistaken, your forked database handles will go out of scope when the forked process ends, at which point your database will disconnect and the parent process will be left with unusable, disconnected database handles.
Hmm... now that I think about it, maybe it's only the parent process ending that disconnects the child process handles. You'll want to test this to be sure. Depending on just how much you expect the child to do with the handles, the child process can clone handles off the parent and set the parent's InactiveDestroy flag.
|
You were right the first time - you definitely need to do something to keep the child's exit from messing up your handles, at least with DBD::mysql. Not doing it leads to "server went away" errors in the parent at a random point.
-sam
Re: Module for transparently forking a sub?
by swares (Monk) on Feb 15, 2009 at 19:51 UTC
|
What about using POE? I've been hooked since I discovered it a few years ago. You can even return references to Perl data structures. There are some great examples here: http://poe.perl.org/?POE_Cookbook