PerlMonks  

No Performance gain with Parallel::ForkManager

by walto (Pilgrim)
on Feb 23, 2014 at 06:11 UTC (#1075872=perlquestion)

walto has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I wrote a script to remove unwanted mp3 tags and posted it as "Remove unwanted MP3 tags" in the CUFP section. As suggested in that thread, I used Parallel::ForkManager to loop over the files in parallel. I adjusted the code of the original script, but I can see no speedup. I tested it with 100 mp3 files, and it was actually slower than the script without Parallel::ForkManager.

time remove_unwanted_mp3.pl

real 0m0.311s, user 0m0.279s, sys 0m0.028s

time parallel_remove_unwanted_mp3.pl

real 0m0.877s, user 0m0.511s, sys 0m0.501s


Here is the code
Without Parallel::ForkManager
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use Getopt::Std;
    use File::Find::Rule;
    use MP3::Tag;

    MP3::Tag->config( write_v24 => 1 );

    # -d DIRECTORY: parent directory to scan for mp3 files
    our $opt_d;
    my $opt_string = 'd:';
    getopts($opt_string) or usage();
    usage() unless defined $opt_d;
    my $dir = $opt_d;

    my @unwanted_tags = qw(
        COMM COMM01 COMM02 COMM03 MCDI
        PRIV PRIV01 PRIV02 PRIV03 PRIV04 PRIV05 PRIV06
        TBPM01 TMED TPUB TSSE
        TXXX TXXX01 TXXX02 TXXX03 TXXX04 TXXX05
        TXXX06 TXXX07 TXXX08 TXXX09 TXXX10
        WCOM WXXX
    );
    my %unwanted_tags = map { $unwanted_tags[$_] => 1 } 0 .. $#unwanted_tags;

    my $rule = File::Find::Rule->file->name("*.mp3")->start($dir);
    while ( defined( my $file = $rule->match ) ) {
        my $mp3 = MP3::Tag->new($file);
        $mp3->get_tags();
        if ( exists $mp3->{ID3v2} ) {
            my $id3v2         = $mp3->{ID3v2};
            my $frameIDs_hash = $id3v2->get_frame_ids();
            if ($frameIDs_hash) {
                foreach my $frame ( keys %$frameIDs_hash ) {
                    if ( $unwanted_tags{$frame} ) {
                        print "$file\n";
                        print "Unwanted Frame: $frame found\n";
                        $id3v2->remove_frame($frame);
                        $id3v2->write_tag();
                    }
                }
            }
        }
    }

    sub usage {
        print "Please provide parent directory\n"
            . "e.g. perl remove_unwanted_tags.pl -d DIRECTORY\n";
        exit;
    }
Using Parallel::ForkManager
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use Getopt::Std;
    use File::Find::Rule;
    use MP3::Tag;
    use Parallel::ForkManager;

    MP3::Tag->config( write_v24 => 1 );

    # -d DIRECTORY: parent directory to scan for mp3 files
    our $opt_d;
    my $opt_string = 'd:';
    getopts($opt_string) or usage();
    usage() unless defined $opt_d;
    my $dir = $opt_d;

    my @unwanted_tags = qw(
        COMM COMM01 COMM02 COMM03 MCDI
        PRIV PRIV01 PRIV02 PRIV03 PRIV04 PRIV05 PRIV06
        TBPM01 TMED TPUB TSSE
        TXXX TXXX01 TXXX02 TXXX03 TXXX04 TXXX05
        TXXX06 TXXX07 TXXX08 TXXX09 TXXX10
        WCOM WXXX
    );
    my %unwanted_tags = map { $unwanted_tags[$_] => 1 } 0 .. $#unwanted_tags;

    # Up to 50 children at a time, one fork per file
    my $pm   = Parallel::ForkManager->new(50);
    my $rule = File::Find::Rule->file->name("*.mp3")->start($dir);
    while ( defined( my $file = $rule->match ) ) {
        my $pid = $pm->start and next;    # parent moves on to the next file

        # child: process a single file, then exit via finish
        my $mp3 = MP3::Tag->new($file);
        $mp3->get_tags();
        if ( exists $mp3->{ID3v2} ) {
            my $id3v2         = $mp3->{ID3v2};
            my $frameIDs_hash = $id3v2->get_frame_ids();
            if ($frameIDs_hash) {
                foreach my $frame ( keys %$frameIDs_hash ) {
                    if ( $unwanted_tags{$frame} ) {
                        print "$file\n";
                        print "Unwanted Frame: $frame found\n";
                        $id3v2->remove_frame($frame);
                        $id3v2->write_tag();
                    }
                }
            }
        }
        $pm->finish;
    }
    $pm->wait_all_children;

    sub usage {
        print "Please provide parent directory\n"
            . "e.g. perl remove_unwanted_tags.pl -d DIRECTORY\n";
        exit;
    }

Replies are listed 'Best First'.
Re: No Performance gain with Parallel::ForkManager
by McA (Priest) on Feb 23, 2014 at 07:22 UTC

    Hi,

    the execution time of your script without fork is very good. You should run a simple test to see how much of that time is spent on Perl startup. Forking is a relatively expensive operation, so you have to be sure that it's worth it. In your case it seems it is not. Have a look at Devel::NYTProf to profile your code.

    UPDATE: Your forking code allows up to 50 children of the main process. That seems far too many. The whole task you want to optimize is I/O bound (scanning the directory tree, reading in files, writing the files). You don't gain much when many processes are fighting for the same resource. If you just want to optimize in terms of computing power, I would allow as many subprocesses as you have CPU cores in your computer.

    It would be interesting to test the following approach: scan the whole directory tree and push all mp3 files found onto an array (job queue), then split the queue into N parts (N = number of cores) and start N subprocesses, each working on its own subqueue. That way you fork only once per subqueue and not for every file (which is expensive).

    Best regards
    McA
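
A rough way to check the point about fork cost is to time a batch of forks whose children do nothing. This is only a sketch under assumptions: `average_fork_cost` is a hypothetical helper name, it needs a platform with a real `fork`, and the figure it prints will vary with the machine and the size of the parent process.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: fork $n children that exit immediately and
# return the average wall-clock seconds per fork/wait cycle.
sub average_fork_cost {
    my ($n) = @_;
    my $t0 = [gettimeofday];
    for ( 1 .. $n ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        POSIX::_exit(0) if $pid == 0;    # child: leave at once, no cleanup
        waitpid( $pid, 0 );              # parent: reap the child
    }
    return tv_interval($t0) / $n;
}

printf "average cost per fork: %.6f s\n", average_fork_cost(100);
```

If that number is in the same range as the time the serial script spends per file (about 3 ms in the timings above), forking once per file cannot pay off.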

Re: No Performance gain with Parallel::ForkManager
by karlgoethebier (Abbot) on Feb 23, 2014 at 12:02 UTC

    OK, already mentioned by McA :-( but i would also give the old fashioned way a try:

        my @mp3s = File::Find::Rule->file()
                                   ->name( '*.mp3' )
                                   ->in( $myMP3Library );

        foreach my $mp3 ( @mp3s ) {
            # doTheForkStuff...
        }

    Edit: Fixed tag.

    And start with $MAX_PROCESSES = 10; or so.

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      ++. Starting 50 processes to handle 100 files is most probably a bad idea in terms of performance (and most probably also for more files, but especially so when each process handles only 2 files on average). I would also try with perhaps 10 processes, or maybe even only 5.
Re: No Performance gain with Parallel::ForkManager
by walto (Pilgrim) on Feb 23, 2014 at 14:39 UTC
    Thanks for taking time to work on this.

    McA suggested: scan the whole directory tree and push all mp3 files found onto an array (job queue), then split the queue into N parts (N = number of cores) and start N subprocesses, each working on its own subqueue. That way you fork only once per subqueue and not for every file (which is expensive).

    I could not find a simple way to split the loop into 2 (my number of cores) independent subprocesses, so I did what karlgoethebier suggested.
    I tried with $MAX_PROCESSES=5 and $MAX_PROCESSES=2. The execution time is about the same and interestingly the unforked script does the job in about half the execution time.

      ...and interestingly the unforked script does the job in about half the execution time.

      ...because the unforked process isn't causing your disk heads to seek back and forth repeatedly as each forked process grinds away at the same physical resource.

      Sometimes forked processes work like a bucket brigade where the person filling the buckets does it very quickly, and hands them off to the workers who have to run from the water source to the fire and back before they request another full bucket. Other times, it works more like the person doing the filling is standing next to the fire, and has to keep running back and forth between the workers and the water source. You might be in the second situation.


      Dave

        Yes, since the script basically reads and writes mp3 tags, it is mostly doing I/O. And as you mentioned, the bottleneck is the hard disk. It would be different if the work were more computational.

      Hi,

      probably something like that:

          #!/usr/bin/perl
          use strict;
          use warnings;
          use 5.010;
          use local::lib './lib';
          use File::Find::Rule;
          use Parallel::ForkManager;
          use utf8;

          my $dir  = './';
          my @mp3s = File::Find::Rule->file()
                                     ->name( '*.mp3' )
                                     ->in($dir);
          say "INFO: Number of mp3 files found " . scalar @mp3s;

          my $num_of_cores = 2;
          my $pm = Parallel::ForkManager->new($num_of_cores);

          for my $proc_id (0..$num_of_cores - 1) {
              my $pid = $pm->start and next;
              for(my $i = $proc_id; $i < scalar @mp3s; $i += $num_of_cores) {
                  my $file = $mp3s[$i];
                  say "Proc: $proc_id: Working on '$file'";
              }
              $pm->finish;
          }
          $pm->wait_all_children;
          say "END";

      I'm pretty sure you see the building blocks.

      Regards
      McA

        There's no need to be quite that verbose as Parallel::ForkManager handles the number of processes for you. So you can write that as :-

            ...
            my $pm = Parallel::ForkManager->new($num_of_cores);
            for my $mp3 (@mp3s) {
                $pm->start and next;
                do_work($mp3);
                $pm->finish;
            }
            $pm->wait_all_children;
        Thanks for your example!

        Adding an extra loop over the number of cores brought a speed improvement of about 100% for the parallel execution.

        Time parallel execution
        real 0m0.312s, user 0m0.304s, sys 0m0.052s
        Time serial execution
        real 0m0.324s, user 0m0.272s, sys 0m0.031s

        That's approx 4% overall gain.
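
For reference, the arithmetic behind that figure can be written out as below. The two timings are the "real" values quoted just above; `gain_percent` is a hypothetical helper name used only for this illustration.

```perl
use strict;
use warnings;

# Wall-clock ("real") timings quoted in the post above, in seconds.
my $serial   = 0.324;
my $parallel = 0.312;

# Relative gain of $after over $before, in percent.
sub gain_percent {
    my ( $before, $after ) = @_;
    return ( $before - $after ) / $before * 100;
}

printf "gain: %.1f%%\n", gain_percent( $serial, $parallel );
```

This prints a gain of 3.7%, which matches the "approx 4%" above. With runs this short, a difference of 12 ms may well be within measurement noise.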
Re: No Performance gain with Parallel::ForkManager
by karlgoethebier (Abbot) on Feb 23, 2014 at 18:45 UTC

    Nice recommendation by McA! Try my $cores = qx(sysctl -n hw.ncpu); (works on my Mac) before you consult the manual for every machine where your script should run ;-) Please see also sysctl for more details.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»
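
A minimal sketch of how the core count could be detected on more than one system, assuming a Unix-like shell: it tries `nproc` (Linux, GNU coreutils) and then `sysctl -n hw.ncpu` (macOS and the BSDs), and falls back to 1. `detect_cores` is a hypothetical helper name, not part of any module mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: ask the usual OS tools for the CPU count and
# fall back to 1 when none of them gives a usable answer.
sub detect_cores {
    for my $cmd ( 'nproc', 'sysctl -n hw.ncpu' ) {
        my $out = qx($cmd 2>/dev/null);    # capture stdout, discard errors
        next unless defined $out && $out =~ /^\s*(\d+)\s*$/;
        return $1 if $1 > 0;
    }
    return 1;
}

my $cores = detect_cores();
print "Using $cores worker processes\n";
```

Note that `qx` returns the command's output including the trailing newline, so the value is extracted with a regex here rather than used as is.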

Re: No Performance gain with Parallel::ForkManager
by Laurent_R (Canon) on Feb 23, 2014 at 22:00 UTC
    Although there might be some explanations for it, I am a bit surprised that running at least a few processes in parallel does not bring you some performance improvement. I very often do intensive data extraction from a very large (split) database, which is mostly I/O, and I usually get the best results with a maximum number of processes somewhere between one and two times the number of CPUs or CPU cores. The results might be very different with a different setup. On the other hand, I have come across cases (badly written programs) where one process was locking the data access for the others, preventing any improvement from parallel processing (well, actually leading to poorer performance). I wonder whether you are hitting one of these cases.

      The thing is that the files being read are big enough that two different files very likely sit on different physical tracks on the hard drive. If you read one file, then the next file, then the next one, sequentially, the amount of drive head movement is minimized. If you read two, three, four, or ten files in parallel, the drive head has to shift back and forth a lot. Also, the buffering is less effective, since the drive reading ahead and filling the buffer is probably going to not fill the buffer with data that will be useful to the next request coming in from a different forked child. So the forks are actually working against each other, losing all buffering benefit, and even causing the drive to have to seek to and fro repeatedly.

      When multiple processes are using the same physical resource to grab information that is distributed all over the place on that resource, in what amounts to be unpredictable order, it's no surprise that they degrade performance.


      Dave

Re: No Performance gain with Parallel::ForkManager
by karlgoethebier (Abbot) on Feb 24, 2014 at 14:20 UTC

    I'm a bit late but perhaps this might be still of interest:

    Using splice for building the subqueues:

        my @mp3s  = ( 'a' .. 'z' );
        my $cores = 4;
        my @queue;
        push @queue, [ splice @mp3s, 0, $cores ] while @mp3s;

    Or an iterator with List::MoreUtils:

        use List::MoreUtils qw(natatime);

        my @mp3s  = ( 'a' .. 'z' );
        my $cores = 4;
        my @queue;
        my $iterator = natatime $cores, @mp3s;
        while ( my @buff = $iterator->() ) {
            push @queue, \@buff;
        }

    This yields:

        $VAR1 = [
                  [ 'a', 'b', 'c', 'd' ],
                  [ 'e', 'f', 'g', 'h' ],
                  [ 'i', 'j', 'k', 'l' ],
                  [ 'm', 'n', 'o', 'p' ],
                  [ 'q', 'r', 's', 't' ],
                  [ 'u', 'v', 'w', 'x' ],
                  [ 'y', 'z' ]
                ];

    Then:

        use Parallel::ForkManager;

        my $cores = 4;
        my $pm = Parallel::ForkManager->new($cores);

        foreach my $child (@queue) {
            $pm->start and next;
            process($child);
            $pm->finish;
        }
        $pm->wait_all_children;

        sub process {
            my $mp3s = shift;
            foreach my $mp3 (@$mp3s) {
                # do the stuff
            }
        }

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»
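
Note that the splice and natatime variants above build sublists of $cores items each, so 26 files yield 7 sublists and therefore 7 forks. If the aim is exactly one fork per core, the list can instead be dealt out round-robin into $cores sublists. A minimal sketch in plain Perl, where `split_into` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: deal the items out round-robin into exactly
# $n sublists, so that only $n children need to be forked.
sub split_into {
    my ( $n, @items ) = @_;
    my @parts = map { [] } 1 .. $n;
    push @{ $parts[ $_ % $n ] }, $items[$_] for 0 .. $#items;
    return @parts;
}

my @queue = split_into( 4, 'a' .. 'z' );
print scalar(@queue), " sublists: ",
    join( ' | ', map { "@$_" } @queue ), "\n";
```

The resulting @queue can be fed to the same foreach loop with $pm->start and $pm->finish as above. This is the same distribution that McA's stride loop ($i += $num_of_cores) produces.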

      Hi Karl, On Linux you can get the number of cores with
      chomp( my $no_of_cores = qx(/usr/bin/nproc) );
      I also tested your approach to split the load onto each core. It was slower than McA's suggestion and the serial processing of files, but faster than the original parallel processing.

      real 0m0.531s, user 0m0.449s, sys 0m0.208s

      But thanks for sharing your wisdom. I learnt a lot about making use of a multi core cpu and splitting workloads.
        " ...your approach to split the load...It was slower..."

        Seems like i lost once more ;-)

        "There seems to be something wrong with our bloody ships today." (David Beatty, 1st Earl Beatty)

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»
