Fork or Thread?

by beanscake (Acolyte)
on Jul 29, 2015 at 20:44 UTC

beanscake has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have chunks of data that I am going to upload to a database, and the upload can take days to finish. I think that if I partition the array of data into parts and spawn a process to save each allocated @array chunk, it may save me some time. Should I use threads or fork? Please advise.

use strict;
use warnings;

my @orig          = 1 .. 2500;
my $numberofarray = scalar @orig;
my $arrs          = 100;    # Partition
#print $arrs;

sub Partition_Array_data {
    my @arrs;
    push @{ $arrs[ $_ % $arrs ] }, $orig[$_] for 0 .. $#orig;
    if ( my $pid = fork ) {
        waitpid( $pid, 0 );
    }
    else {
        if (fork) { exit }
        else      { thread_dbSave( \@arrs ) }
    }
}

sub thread_dbSave {
    # this function will handle the saving
    my @arrayofsplit = @{ $_[0] };
    print join ' ', @$_, "\n" for @arrayofsplit;
}

&Partition_Array_data();
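For reference, here is a minimal plain-fork sketch of the same idea: one child per chunk, with the parent waiting for all of them. It is only an illustration, not a final answer; the chunk size is a guess, db_save() is a stand-in for the real database insert, and nothing caps how many children run at once (Parallel::ForkManager, shown in the reply below, handles that limit for you).

use strict;
use warnings;

my @orig       = 1 .. 2500;
my $chunk_size = 100;              # rows per child; a guess, tune as needed

my @pids;
while (@orig) {
    my @chunk = splice @orig, 0, $chunk_size;
    my $pid   = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {             # child: save its own chunk, then exit
        db_save( \@chunk );
        exit 0;
    }
    push @pids, $pid;              # parent: remember the child
}
waitpid( $_, 0 ) for @pids;        # wait for every child to finish

sub db_save {
    my ($rows) = @_;
    print join( ' ', @$rows ), "\n";    # stand-in for the real database insert
}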

Replies are listed 'Best First'.
Re: Fork or Thread?
by stevieb (Canon) on Jul 29, 2015 at 21:43 UTC

    I can't comment on which is better (fork/threads), but here's an example using Parallel::ForkManager that I wrote a while ago for someone else.

    I've modified it a bit to suit your requirements. First, $max_forks specifies the maximum number of forks to run at a time. Next, I create an array of 1000 elements. Then in the while(), I splice out 200 elements of the @array into a new array, and fork out to db_save() with that array slice, simulating your desire to only do a specific number of items per fork().

    #!/usr/bin/perl
    use warnings;
    use strict;

    use Parallel::ForkManager;

    my $max_forks = 3;
    my $fork = new Parallel::ForkManager($max_forks);

    my @array = (1 .. 1000);

    # on start callback
    $fork->run_on_start(
        sub {
            my $pid = shift;
        }
    );

    # on finish callback
    $fork->run_on_finish(
        sub {
            my ($pid, $exit, $ident, $signal, $core) = @_;
            if ($core) {
                print "PID $pid core dumped.\n";
            }
        }
    );

    # forking code
    while (@array) {
        my @chunk = splice @array, 0, 200;
        $fork->start and next;
        sleep(2);
        db_save(@chunk);
        $fork->finish;
    }

    sub db_save {
        my @a = @_;
        print "$a[1]\n";
    }

    $fork->wait_all_children;
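    One thing to keep in mind if db_save() really talks to a database: a handle opened in the parent generally cannot be used safely in forked children, so each child should open its own connection. A minimal sketch of such a db_save(), assuming DBI; the DSN, table and column names are placeholders, not anything from the thread:

    use DBI;

    sub db_save {
        my @rows = @_;

        # Each child opens (and closes) its own handle.
        my $dbh = DBI->connect(
            'dbi:mysql:database=mydb;host=localhost',
            'user', 'password',
            { RaiseError => 1, AutoCommit => 0 },
        );

        my $sth = $dbh->prepare('INSERT INTO my_table (value) VALUES (?)');
        $sth->execute($_) for @rows;

        $dbh->commit;
        $dbh->disconnect;
    }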
      Thank you stevieb, this worked for me.
Re: Fork or Thread?
by Laurent_R (Canon) on Jul 30, 2015 at 08:28 UTC
    I do not know whether parallelizing processes will make your loading significantly faster, but if it really takes days to load the data into your database, you may want to think about some other ways of doing the work.

    You did not specify which database engine you are working with, so it is difficult to give specific advice, but here are a few things that may help you get better performance:

    • Using SQL*Loader (for Oracle) or a similar bulk-loading tool;
    • Better management of transactions (see the sketch after this list);
    • Modifying the commit rate, if applicable;
    • Deactivating the redo logs, statistics collection, etc.;
    • Dropping the indexes before loading the data and rebuilding them at the end.
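    Two of those points, transaction management and commit rate, can be sketched with DBI: turn AutoCommit off and commit every N rows instead of once per row. This is only a sketch under assumed connection details (the Oracle DSN, table and column names, and the sample data are placeholders), not a drop-in script:

    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details; adjust for your engine.
    my $dbh = DBI->connect( 'dbi:Oracle:mydb', 'user', 'password',
        { RaiseError => 1, AutoCommit => 0 } );

    my $batch_size = 10_000;    # commit rate: tune for your engine
    my $sth = $dbh->prepare('INSERT INTO my_table (value) VALUES (?)');

    my @rows  = ( 1 .. 100_000 );    # stand-in for the real data
    my $count = 0;
    for my $row (@rows) {
        $sth->execute($row);
        $dbh->commit unless ++$count % $batch_size;
    }
    $dbh->commit;                    # flush the last partial batch
    $dbh->disconnect;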