Fork or Thread?

by beanscake (Acolyte)
on Jul 29, 2015 at 20:44 UTC

beanscake has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have chunks of data that I am going to upload to a database, and the upload can take days to finish. I think that if I partition the array of data into parts and spawn a process to save each allocated @array chunk, it may save me some time. Should I use threads or fork? Please advise.

use strict;
use warnings;

my @orig          = 1 .. 2500;
my $numberofarray = scalar @orig;
my $arrs          = 100;    # Partition
#print $arrs;

sub Partition_Array_data {
    my @arrs;
    push @{ $arrs[ $_ % $arrs ] }, $orig[$_] for 0 .. $#orig;
    if ( my $pid = fork ) {
        waitpid( $pid, 0 );
    }
    else {
        if (fork) { exit }
        else      { thread_dbSave( \@arrs ) }
    }
}

sub thread_dbSave {
    # this function will handle the saving
    my @arrayofsplit = @{ $_[0] };
    print join ' ', @$_, "\n" for @arrayofsplit;
}

&Partition_Array_data();
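For reference, here is a minimal plain-fork sketch of the same idea: one child per chunk, with the parent waiting for all of them. It is only an illustration, not a final answer; the chunk size is a guess, db_save() is a stand-in for the real database insert, and nothing caps how many children run at once (Parallel::ForkManager, shown in the reply below, handles that limit for you).

use strict;
use warnings;

my @orig       = 1 .. 2500;
my $chunk_size = 100;              # rows per child; a guess, tune as needed

my @pids;
while (@orig) {
    my @chunk = splice @orig, 0, $chunk_size;
    my $pid   = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {             # child: save its own chunk, then exit
        db_save( \@chunk );
        exit 0;
    }
    push @pids, $pid;              # parent: remember the child
}
waitpid( $_, 0 ) for @pids;        # wait for every child to finish

sub db_save {
    my ($rows) = @_;
    print join( ' ', @$rows ), "\n";    # stand-in for the real database insert
}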

Replies are listed 'Best First'.
Re: Fork or Thread?
by stevieb (Canon) on Jul 29, 2015 at 21:43 UTC

    I can't comment on which is better (fork/threads), but here's an example using Parallel::ForkManager that I wrote a while ago for someone else.

    I've modified it a bit to suit your requirements. First, $max_forks specifies the maximum number of forks to run at a time. Next, I create an array of 1000 elements. Then in the while(), I splice out 200 elements of the @array into a new array, and fork out to db_save() with that array slice, simulating your desire to only do a specific number of items per fork().

    #!/usr/bin/perl
    use warnings;
    use strict;

    use Parallel::ForkManager;

    my $max_forks = 3;
    my $fork = new Parallel::ForkManager($max_forks);

    my @array = (1 .. 1000);

    # on start callback
    $fork->run_on_start(
        sub {
            my $pid = shift;
        }
    );

    # on finish callback
    $fork->run_on_finish(
        sub {
            my ($pid, $exit, $ident, $signal, $core) = @_;
            if ($core) {
                print "PID $pid core dumped.\n";
            }
        }
    );

    # forking code
    while (@array) {
        my @chunk = splice @array, 0, 200;
        $fork->start and next;
        sleep(2);
        db_save(@chunk);
        $fork->finish;
    }

    sub db_save {
        my @a = @_;
        print "$a[1]\n";
    }

    $fork->wait_all_children;
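    One thing to keep in mind if db_save() really talks to a database: a handle opened in the parent generally cannot be used safely in forked children, so each child should open its own connection. A minimal sketch of such a db_save(), assuming DBI; the DSN, table and column names are placeholders, not anything from the thread:

    use DBI;

    sub db_save {
        my @rows = @_;

        # Each child opens (and closes) its own handle.
        my $dbh = DBI->connect(
            'dbi:mysql:database=mydb;host=localhost',
            'user', 'password',
            { RaiseError => 1, AutoCommit => 0 },
        );

        my $sth = $dbh->prepare('INSERT INTO my_table (value) VALUES (?)');
        $sth->execute($_) for @rows;

        $dbh->commit;
        $dbh->disconnect;
    }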
      Thank you stevieb, this worked for me.
Re: Fork or Thread?
by Laurent_R (Canon) on Jul 30, 2015 at 08:28 UTC
    I do not know whether parallelizing processes will make your loading significantly faster, but if it really takes days to load the data into your database, you may want to think about some other ways of doing the work.

    You did not specify which database engine you are working with, so it is difficult to give specific advice, but here are a few things that may help you get better performance:

    • Using SQL*Loader (for Oracle) or a similar bulk-loading tool;
    • Better management of transactions (see the sketch after this list);
    • Modifying the commit rate, if applicable;
    • Deactivating the redo logs, statistics collection, etc.;
    • Dropping the indexes before loading the data and rebuilding them at the end.
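    Two of those points, transaction management and commit rate, can be sketched with DBI: turn AutoCommit off and commit every N rows instead of once per row. This is only a sketch under assumed connection details (the Oracle DSN, table and column names, and the sample data are placeholders), not a drop-in script:

    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details; adjust for your engine.
    my $dbh = DBI->connect( 'dbi:Oracle:mydb', 'user', 'password',
        { RaiseError => 1, AutoCommit => 0 } );

    my $batch_size = 10_000;    # commit rate: tune for your engine
    my $sth = $dbh->prepare('INSERT INTO my_table (value) VALUES (?)');

    my @rows  = ( 1 .. 100_000 );    # stand-in for the real data
    my $count = 0;
    for my $row (@rows) {
        $sth->execute($row);
        $dbh->commit unless ++$count % $batch_size;
    }
    $dbh->commit;                    # flush the last partial batch
    $dbh->disconnect;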