http://www.perlmonks.org?node_id=1075872

walto has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I wrote a script to remove unwanted mp3 tags and posted Remove unwanted MP3 tags in the CUFP section. As suggested in that thread, I used Parallel::ForkManager to loop over the files in parallel. I adjusted the code of the original script, but I see no speed-up. I tested it with 100 mp3 files and it was actually slower than the script without Parallel::ForkManager.

time remove_unwanted_mp3.pl

real 0m0.311s, user 0m0.279s, sys 0m0.028s

time parallel_remove_unwanted_mp3.pl

real 0m0.877s, user 0m0.511s, sys 0m0.501s


Here is the code
Without Parallel::ForkManager
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
use File::Find::Rule;
use MP3::Tag;
use utf8;

MP3::Tag->config( write_v24 => 1 );

our $opt_d;
my $opt_string = 'd:';
getopts($opt_string) or usage();
my $dir = $opt_d;
usage() unless defined $dir;

# ID3v2 frame IDs to strip from every file
my @unwanted_tags = qw(COMM COMM01 COMM02 COMM03 MCDI PRIV PRIV01 PRIV02
    PRIV03 PRIV04 PRIV05 PRIV06 TBPM01 TMED TPUB TSSE TXXX TXXX01 TXXX02
    TXXX03 TXXX04 TXXX05 TXXX06 TXXX07 TXXX08 TXXX09 TXXX10 WCOM WXXX);
my %unwanted_tags = map { $_ => 1 } @unwanted_tags;

my $rule = File::Find::Rule->file->name("*.mp3")->start($dir);

while ( defined( my $file = $rule->match ) ) {
    my $mp3 = MP3::Tag->new($file);
    $mp3->get_tags();
    next unless exists $mp3->{ID3v2};

    my $id3v2         = $mp3->{ID3v2};
    my $frameIDs_hash = $id3v2->get_frame_ids();
    next unless $frameIDs_hash;

    foreach my $frame ( keys %$frameIDs_hash ) {
        if ( $unwanted_tags{$frame} ) {
            print "$file\n";
            print "Unwanted Frame: $frame found\n";
            $id3v2->remove_frame($frame);
            $id3v2->write_tag();
        }
    }
}

sub usage {
    print "Please provide parent directory\n"
        . "e.g. perl remove_unwanted_tags.pl -d DIRECTORY\n";
    exit;
}
using Parallel::ForkManager
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
use File::Find::Rule;
use MP3::Tag;
use Parallel::ForkManager;
use utf8;

MP3::Tag->config( write_v24 => 1 );

our $opt_d;
my $opt_string = 'd:';
getopts($opt_string) or usage();
my $dir = $opt_d;
usage() unless defined $dir;

# ID3v2 frame IDs to strip from every file
my @unwanted_tags = qw(COMM COMM01 COMM02 COMM03 MCDI PRIV PRIV01 PRIV02
    PRIV03 PRIV04 PRIV05 PRIV06 TBPM01 TMED TPUB TSSE TXXX TXXX01 TXXX02
    TXXX03 TXXX04 TXXX05 TXXX06 TXXX07 TXXX08 TXXX09 TXXX10 WCOM WXXX);
my %unwanted_tags = map { $_ => 1 } @unwanted_tags;

my $pm = Parallel::ForkManager->new(50);    # up to 50 concurrent children

my $rule = File::Find::Rule->file->name("*.mp3")->start($dir);

while ( defined( my $file = $rule->match ) ) {
    my $pid = $pm->start and next;    # fork; parent continues with the next file

    my $mp3 = MP3::Tag->new($file);
    $mp3->get_tags();
    if ( exists $mp3->{ID3v2} ) {
        my $id3v2         = $mp3->{ID3v2};
        my $frameIDs_hash = $id3v2->get_frame_ids();
        if ($frameIDs_hash) {
            foreach my $frame ( keys %$frameIDs_hash ) {
                if ( $unwanted_tags{$frame} ) {
                    print "$file\n";
                    print "Unwanted Frame: $frame found\n";
                    $id3v2->remove_frame($frame);
                    $id3v2->write_tag();
                }
            }
        }
    }

    $pm->finish;    # child exits here
}
$pm->wait_all_children;

sub usage {
    print "Please provide parent directory\n"
        . "e.g. perl remove_unwanted_tags.pl -d DIRECTORY\n";
    exit;
}

Replies are listed 'Best First'.
Re: No Performance gain with Parallel::ForkManager
by McA (Priest) on Feb 23, 2014 at 07:22 UTC

    Hi,

    The execution time of your script without forking is already very good. You should run a simple test to see how much of that time is just Perl startup. Forking is a relatively expensive operation, so you have to be sure it's worth it; in your case it seems not to be. Have a look at Devel::NYTProf to profile your code.
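
    For example, a rough way to see this (the -M list below is just the modules this particular script loads):

    time perl -e 1
    time perl -MMP3::Tag -MFile::Find::Rule -e 1

    The first line measures bare interpreter startup, the second startup plus module loading; whatever is left of your ~0.3 s is the actual work. To profile, something like:

    perl -d:NYTProf remove_unwanted_mp3.pl -d DIRECTORY
    nytprofhtml

    and then open the generated nytprof/index.html report.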

    UPDATE: With your forking code you allow up to 50 children of the main process. That seems far too many. The whole task you want to optimize is IO bound (scanning the directory tree, reading the files, writing the files), and you don't gain much when many processes are fighting over the same resource. If you just want to optimize in terms of computing power, I would allow as many subprocesses as you have CPU cores in your computer.

    It would be interesting to test the following approach: scan the whole directory tree and push all found mp3 files onto an array (a job queue), then split the queue into N parts (N = number of cores) and start N subprocesses, each working on its own subqueue. That way you only fork once per subqueue and not once per file (which is expensive).
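
    A minimal sketch of that idea (assuming @mp3s already holds the list of files and clean_tags() is a helper of yours wrapping the MP3::Tag logic from the original script; List::MoreUtils is just one convenient way to chunk the list):

    use List::MoreUtils qw(natatime);
    use Parallel::ForkManager;
    use POSIX qw(ceil);

    my $cores = 2;
    my $pm    = Parallel::ForkManager->new($cores);

    # split @mp3s into $cores roughly equal subqueues
    my $iter = natatime ceil( @mp3s / $cores ), @mp3s;
    while ( my @subqueue = $iter->() ) {
        $pm->start and next;             # fork once per subqueue, not per file
        clean_tags($_) for @subqueue;    # hypothetical per-file worker
        $pm->finish;
    }
    $pm->wait_all_children;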

    Best regards
    McA

Re: No Performance gain with Parallel::ForkManager
by karlgoethebier (Abbot) on Feb 23, 2014 at 12:02 UTC

    OK, already mentioned by McA :-( but I would also give the old-fashioned way a try:

    my @mp3s = File::Find::Rule->file()
                               ->name('*.mp3')
                               ->in($myMP3Library);

    foreach my $mp3 (@mp3s) {
        # doTheForkStuff...
    }

    Edit: Fixed tag.

    And start with $MAX_PROCESSES = 10; or so.
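
    Putting the two snippets together might look roughly like this (just a sketch; it assumes the MP3::Tag setup and the %unwanted_tags hash from your original script are in scope):

    use Parallel::ForkManager;

    my $MAX_PROCESSES = 10;
    my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

    foreach my $mp3 (@mp3s) {
        $pm->start and next;    # fork; the parent moves on to the next file

        my $tag = MP3::Tag->new($mp3);
        $tag->get_tags();
        if ( exists $tag->{ID3v2} ) {
            my $id3v2 = $tag->{ID3v2};
            for my $frame ( keys %{ $id3v2->get_frame_ids() || {} } ) {
                next unless $unwanted_tags{$frame};
                $id3v2->remove_frame($frame);
                $id3v2->write_tag();
            }
        }

        $pm->finish;    # child exits here
    }
    $pm->wait_all_children;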

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      ++. Starting 50 processes for processing 100 files is most probably a bad idea in terms of performance (and most probably also for more files, but it is especially so when each process will handle an average of only 2 files). I would also try with perhaps 10 processes, or maybe even only 5.
Re: No Performance gain with Parallel::ForkManager
by walto (Pilgrim) on Feb 23, 2014 at 14:39 UTC
    Thanks for taking time to work on this.

    McA suggested: Scan the whole directory tree and push all found mp3 files onto an array (a job queue), then split the queue into N parts (N = number of cores) and start N subprocesses, each working on its own subqueue. That way you only fork once per subqueue and not for every file (which is expensive).

    I cannot find a simple way to split the loop into 2 (my number of cores) independent subprocesses, so I did what karlgoethebier suggested.
    I tried with $MAX_PROCESSES = 5 and $MAX_PROCESSES = 2. The execution time is about the same in both cases, and interestingly the unforked script does the job in about half the time.

      ...and interestingly the unforked script does the job in about half the execution time.

      ...because the unforked process isn't causing your disk heads to seek back and forth repeatedly as each forked process grinds away at the same physical resource.

      Sometimes forked processes work like a bucket brigade where the person filling the buckets does it very quickly, and hands them off to the workers who have to run from the water source to the fire and back before they request another full bucket. Other times, it works more like the person doing the filling is standing next to the fire, and has to keep running back and forth between the workers and the water source. You might be in the second situation.


      Dave

        Yes, since the script basically just reads and writes mp3 tags, it is mostly I/O. And as you mentioned, the bottleneck is the hard disk. It would be different if there were more computational work.

      Hi,

      probably something like this:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use 5.010;
      use local::lib './lib';
      use File::Find::Rule;
      use Parallel::ForkManager;
      use utf8;

      my $dir  = './';
      my @mp3s = File::Find::Rule->file()
                                 ->name('*.mp3')
                                 ->in($dir);
      say "INFO: Number of mp3 files found " . scalar @mp3s;

      my $num_of_cores = 2;
      my $pm           = Parallel::ForkManager->new($num_of_cores);

      # fork once per core; each child handles every $num_of_cores-th file
      for my $proc_id ( 0 .. $num_of_cores - 1 ) {
          my $pid = $pm->start and next;
          for ( my $i = $proc_id; $i < @mp3s; $i += $num_of_cores ) {
              my $file = $mp3s[$i];
              say "Proc: $proc_id: Working on '$file'";
          }
          $pm->finish;
      }
      $pm->wait_all_children;
      say "END";

      I'm pretty sure you see the building blocks.

      Regards
      McA

        There's no need to be quite that verbose, as Parallel::ForkManager handles the number of processes for you. So you can write that as:

        ...
        my $pm = Parallel::ForkManager->new($num_of_cores);
        for my $mp3 (@mp3s) {
            $pm->start and next;
            do_work($mp3);
            $pm->finish;
        }
        $pm->wait_all_children;
        Thanks for your example!

        Adding an extra loop over the number of cores brought a speed improvement of about 100% for the parallel execution.

        Time parallel execution
        real 0m0.312s, user 0m0.304s, sys 0m0.052s
        Time serial execution
        real 0m0.324s, user 0m0.272s, sys 0m0.031s

        That's approx 4% overall gain.
Re: No Performance gain with Parallel::ForkManager
by karlgoethebier (Abbot) on Feb 23, 2014 at 18:45 UTC

    Nice recommendation by McA! Try my $cores = qx(sysctl -n hw.ncpu); (works on my Mac) instead of consulting the manual for every machine your script should run on ;-) Please see also sysctl for more details.
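
    Or, slightly more portable (the fallback chain and the default of 2 are just my guesses; remember to chomp the trailing newline):

    chomp( my $cores = `nproc 2>/dev/null` );           # Linux (coreutils)
    chomp( $cores = `sysctl -n hw.ncpu 2>/dev/null` )   # macOS / BSD
        unless $cores;
    $cores = 2 unless $cores && $cores =~ /^\d+$/;      # sensible default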

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: No Performance gain with Parallel::ForkManager
by Laurent_R (Canon) on Feb 23, 2014 at 22:00 UTC
    Although there might be explanations for it, I am a bit surprised that at least a few processes in parallel do not bring you some performance improvement. I very often do intensive data extraction from a very large (split) database, which is mostly I/O, and I usually get the best results with a maximum number of processes somewhere between one and two times the number of CPUs or CPU cores. The results might be very different in a different setting. On the other hand, I have come across cases (badly written programs) where one process locked data access for the others, preventing any improvement from parallel processing (and actually leading to poorer performance). I wonder whether you are running into one of those cases.

      The thing is that the files being read are big enough that two different files very likely sit on different physical tracks on the hard drive. If you read one file, then the next file, then the next one, sequentially, the amount of drive head movement is minimized. If you read two, three, four, or ten files in parallel, the drive head has to shift back and forth a lot. Also, the buffering is less effective, since the drive's read-ahead is probably not going to fill the buffer with data that will be useful to the next request, which comes in from a different forked child. So the forks are actually working against each other, losing all buffering benefit, and even causing the drive to seek to and fro repeatedly.

      When multiple processes are using the same physical resource to grab information that is distributed all over the place on that resource, in what amounts to be unpredictable order, it's no surprise that they degrade performance.


      Dave

Re: No Performance gain with Parallel::ForkManager
by karlgoethebier (Abbot) on Feb 24, 2014 at 14:20 UTC

    I'm a bit late, but perhaps this is still of interest:

    Using splice for building the subqueues:

    my @mp3s  = ( 'a' .. 'z' );
    my $cores = 4;
    my @queue;
    push @queue, [ splice @mp3s, 0, $cores ] while @mp3s;

    Or an iterator with List::MoreUtils:

    use List::MoreUtils qw(natatime);

    my @mp3s  = ( 'a' .. 'z' );
    my $cores = 4;
    my @queue;
    my $iterator = natatime $cores, @mp3s;
    while ( my @buff = $iterator->() ) {
        push @queue, \@buff;
    }

    This yields:

    $VAR1 = [
        [ 'a', 'b', 'c', 'd' ],
        [ 'e', 'f', 'g', 'h' ],
        [ 'i', 'j', 'k', 'l' ],
        [ 'm', 'n', 'o', 'p' ],
        [ 'q', 'r', 's', 't' ],
        [ 'u', 'v', 'w', 'x' ],
        [ 'y', 'z' ]
    ];

    Then:

    use Parallel::ForkManager;

    my $cores = 4;
    my $pm    = Parallel::ForkManager->new($cores);

    foreach my $child (@queue) {
        $pm->start and next;
        process($child);
        $pm->finish;
    }
    $pm->wait_all_children;

    sub process {
        my $mp3s = shift;
        foreach my $mp3 (@$mp3s) {
            # do the stuff
        }
    }

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Hi Karl, On Linux you can get the number of cores with
      chomp( my $no_of_cores = qx(/usr/bin/nproc) );
      I also tested your approach of splitting the load across the cores. It was slower than McA's suggestion and the serial processing of the files, but faster than the original parallel version.

      real 0m0.531s, user 0m0.449s, sys 0m0.208s

      But thanks for sharing your wisdom. I learnt a lot about making use of a multi-core CPU and splitting workloads.
        " ...your approach to split the load...It was slower..."

        Seems like I lost once more ;-)

        "There seems to be something wrong with our bloody ships today." (David_Beatty,_1st_Earl_Beatty)

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»