Question about Parallel::ForkManager

by vit (Friar)
on Sep 30, 2011 at 21:05 UTC [id://928917]

vit has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
What am I doing wrong, or misunderstanding, in using this module this way:

    use Parallel::ForkManager;

    my @ar = ();
    my $pm = Parallel::ForkManager->new(3);

    foreach my $rec (@urls) {
        # Forks and returns the pid for the child:
        my $pid = $pm->start and next;
        # ... fill out @ar while processing @urls ...
        $pm->finish;    # Terminates the child process
    }

At the end I want @ar to be populated.

Re: Question about Parallel::ForkManager
by ikegami (Patriarch) on Sep 30, 2011 at 21:31 UTC
    There are 1+@urls copies of @ar: one in the parent and one in each child. You want to change @ar in the parent, which isn't something the children can do directly. The easiest way is to use the mechanism described in the "RETRIEVING DATASTRUCTURES from child processes" section of the documentation.
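    A minimal sketch of that mechanism, assuming @urls is already populated and that process_url is a hypothetical stand-in for whatever work fills @ar:

        use strict;
        use warnings;
        use Parallel::ForkManager;

        my @urls = @ARGV;    # hypothetical input
        my @ar;

        my $pm = Parallel::ForkManager->new(3);

        # Runs in the parent each time a child exits; $data_ref is
        # whatever reference that child passed to finish().
        $pm->run_on_finish(sub {
            my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_ref) = @_;
            push @ar, @$data_ref if defined $data_ref;
        });

        foreach my $url (@urls) {
            $pm->start and next;              # parent moves on to the next URL
            my @result = process_url($url);   # hypothetical per-URL work
            $pm->finish(0, \@result);         # serialized back to the parent
        }
        $pm->wait_all_children;

        sub process_url { my ($url) = @_; return $url }    # placeholder

    The reference handed to finish() is serialized by the child and handed to the parent's run_on_finish callback, which is why only the parent's @ar ends up populated.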
      So there is no way to avoid going through the disk?
      That's not good for performance. But also, since I may have many calls from clients on the server, won't they race for the same tmp file?
      Also, if the process is interrupted for some reason partway through, the tmp file will never be removed.

        I hadn't realized it used the disk. Pipes could be used, so I'm curious why it uses the disk. (Ah yes, using pipes would prevent the parent from doing other work while the children are running. That's usually not a problem, but supporting it would break P::FM's interface.)

        There won't be a race condition: it surely defends against that by using the process id in the file name.

Re: Question about Parallel::ForkManager
by Anonymous Monk on Oct 02, 2011 at 12:31 UTC
    If you spawn one process for each URL, you're bound to cause massive thrashing, and that is where your disk usage is coming from. Instead, launch a small number of child processes that consume URLs from a shared work queue. The number of children should have no relation to the size of the workload they must cooperatively accomplish; rather, it should be tied to how many parallel processes you have determined the system can actually handle with maximum sustained throughput. (Do not be surprised if the best answer is "1.")
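    One way to sketch that with Parallel::ForkManager itself, using static batching rather than a true shared queue (process_url is again a hypothetical stand-in, and the worker count is an assumption to be tuned):

        use strict;
        use warnings;
        use Parallel::ForkManager;

        my @urls    = @ARGV;    # hypothetical input
        my $workers = 4;        # tune to what the system actually sustains

        # Deal the URLs round-robin into one batch per worker, so each
        # child handles many URLs instead of forking once per URL.
        my @batches;
        push @{ $batches[ $_ % $workers ] }, $urls[$_] for 0 .. $#urls;

        my @results;
        my $pm = Parallel::ForkManager->new($workers);
        $pm->run_on_finish(sub {
            my (undef, undef, undef, undef, undef, $data_ref) = @_;
            push @results, @$data_ref if defined $data_ref;
        });

        for my $batch (@batches) {
            $pm->start and next;
            my @out = map { process_url($_) } @$batch;    # hypothetical work
            $pm->finish(0, \@out);
        }
        $pm->wait_all_children;

        sub process_url { my ($url) = @_; return $url }    # placeholder

    A real shared queue would balance uneven work better; the batching here only keeps the number of children fixed.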
