http://www.perlmonks.org?node_id=1221374


in reply to Re: About text file parsing
in thread About text file parsing

Hi again,

One may want to have the manager-process receive and loop through @sample and @good. That keeps an additional CPU core busy for the manager-process itself.

use strict;
use warnings;

use MCE;

open my $sample_fh, ">", "sample.txt" or die "open error: $!";
open my $good_fh,   ">", "good.txt"   or die "open error: $!";

# worker function
sub task {
    my ( $mce, $slurp_ref, $chunk_id ) = @_;
    my ( @sample, @good );

    # open file handle to scalar ref
    open my $input_fh, "<", $slurp_ref;

    # append to arrays inside the loop
    while (<$input_fh>) {
        if (/^sample\s+(\S+)/) {
            push @sample, $1;
        }
        elsif (/^good\s+(\S+)/) {
            push @good, $1;
        }
    }

    close $input_fh;

    # send arrays to the manager-process
    MCE->gather(\@sample, \@good);
}

# manager function
sub gather {
    my ( $sample, $good ) = @_;

    # process sample
    for ( @{ $sample } ) {
        ;
    }

    # process good
    for ( @{ $good } ) {
        ;
    }
}

# spawn workers early, optionally
my $mce = MCE->new(
    chunk_size  => '1m',   # 1 megabyte
    max_workers => 4,
    use_slurpio => 1,
    user_func   => \&task,
    gather      => \&gather,
)->spawn;

# process input file(s)
$mce->process({ input_data => "test.txt" });

# shutdown workers
$mce->shutdown;

# close output handles
close $sample_fh;
close $good_fh;
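The gather function above leaves both loop bodies empty. Below is a minimal sketch of what they might do, assuming the goal is to write one captured value per line to each output handle. It runs without MCE: in-memory file handles stand in for sample.txt and good.txt, and the two direct calls to gather() simulate chunks arriving from workers. Those stand-ins are assumptions made for the demo, not part of the original script.

```perl
use strict;
use warnings;

# In-memory handles stand in for sample.txt and good.txt
# (an assumption, so the sketch is self-contained).
my ( $sample_out, $good_out ) = ( '', '' );
open my $sample_fh, '>', \$sample_out or die "open error: $!";
open my $good_fh,   '>', \$good_out   or die "open error: $!";

# manager function: same arguments as the gather callback above
sub gather {
    my ( $sample, $good ) = @_;

    # write each captured value on its own line
    print {$sample_fh} "$_\n" for @{ $sample };
    print {$good_fh}   "$_\n" for @{ $good };

    return;
}

# simulate two chunks arriving from workers
gather( [ 'a1', 'a2' ], [ 'g1' ] );
gather( [ 'a3' ],       [ 'g2', 'g3' ] );

close $sample_fh;
close $good_fh;

print $sample_out;
print $good_out;
```

With real workers, chunks may reach the manager in a different order than they appear in the input file. If output order matters, the worker would also need to send $chunk_id so the manager can reorder.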

The extra time comes from the workers appending to local arrays and from the manager-process receiving and looping through those arrays. Four workers and the manager-process run simultaneously on a machine with 4 real cores.

$ time perl test_demo.pl
real    0m9.932s
user    0m43.956s
sys     0m0.452s
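As a rough cross-check of how many processes were busy at once, the CPU time (user + sys) can be divided by the wall-clock time (real). A small sketch using the figures above; to_seconds is a hypothetical helper name introduced here:

```perl
use strict;
use warnings;

# convert a `time` value such as "0m43.956s" into seconds
sub to_seconds {
    my ($t) = @_;
    my ( $min, $sec ) = $t =~ /^(\d+)m([\d.]+)s$/
        or die "bad time format: $t";
    return $min * 60 + $sec;
}

# figures copied from the timing output above
my %t = ( real => '0m9.932s', user => '0m43.956s', sys => '0m0.452s' );

# CPU seconds consumed per second of wall-clock time
my $busy = ( to_seconds( $t{user} ) + to_seconds( $t{sys} ) )
         / to_seconds( $t{real} );

printf "average busy processes: %.2f\n", $busy;
```

A ratio above 4 would be consistent with four workers plus a partly busy manager-process, though this says nothing about where the time is spent.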

Update:

Interestingly, Perl v5.20 and higher take about 2x longer to run. I'm not sure why; possibly the regular expressions? Finding the cause is on my TODO list. The timing above was captured with Perl v5.18.2 on the same machine.

$ time /opt/perl-5.20.3/bin/perl test_demo.pl
real    0m20.858s
user    1m20.164s
sys     0m8.488s

Regards, Mario

Re^3: About text file parsing
by marioroy (Prior) on Aug 30, 2018 at 14:17 UTC

    Once again, hi :)

    Using a simplified demonstration, regular expression matching appears to be about 3x slower in Perl v5.20 and higher. I'm not sure why.

    use strict;
    use warnings;

    use MCE;

    sub task {
        my ( $mce, $slurp_ref, $chunk_id ) = @_;

        # open file handle to scalar ref
        open my $input_fh, "<", $slurp_ref;

        while (<$input_fh>) {
            if (/^sample\s+(\S+)/) {
                ;
            }
            elsif (/^good\s+(\S+)/) {
                ;
            }
        }

        close $input_fh;
    }

    MCE->new(
        chunk_size  => '1m',
        max_workers => 4,
        use_slurpio => 1,
        user_func   => \&task
    );

    MCE->process({ input_data => "test.txt" });

    MCE->shutdown;

    Results

    $ time /opt/perl-5.8.9/bin/perl -I. test_demo.pl
    real    0m3.826s
    user    0m14.352s
    sys     0m0.133s

    $ time /opt/perl-5.10.1/bin/perl -I. test_demo.pl
    real    0m4.369s
    user    0m16.935s
    sys     0m0.126s

    $ time /opt/perl-5.12.5/bin/perl -I. test_demo.pl
    real    0m4.889s
    user    0m18.944s
    sys     0m0.134s

    $ time /opt/perl-5.14.4/bin/perl -I. test_demo.pl
    real    0m4.860s
    user    0m18.865s
    sys     0m0.127s

    $ time /opt/perl-5.16.3/bin/perl -I. test_demo.pl
    real    0m4.815s
    user    0m18.724s
    sys     0m0.129s

    $ time /opt/perl-5.18.4/bin/perl -I. test_demo.pl
    real    0m4.668s
    user    0m18.356s
    sys     0m0.116s

    $ time /opt/perl-5.20.3/bin/perl -I. test_demo.pl
    real    0m14.195s
    user    0m49.155s
    sys     0m7.282s

    $ time /opt/perl-5.22.4/bin/perl -I. test_demo.pl
    real    0m14.316s
    user    0m49.586s
    sys     0m7.041s

    $ time /opt/perl-5.24.3/bin/perl -I. test_demo.pl
    real    0m14.612s
    user    0m50.251s
    sys     0m7.531s

    $ time /opt/perl-5.26.1/bin/perl -I. test_demo.pl
    real    0m14.212s
    user    0m49.418s
    sys     0m6.999s

    $ time /opt/perl-5.28.0/bin/perl -I. test_demo.pl
    real    0m14.308s
    user    0m49.476s
    sys     0m7.137s
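The demonstration above still combines three things: MCE, line reads from an in-memory file handle, and the two regular expressions. One way to narrow down which part slows down would be to time the read loop with and without the regexes in a single process, then run the same script under each Perl build. This is a minimal sketch, assuming synthetic input shaped like the lines the regexes expect. The helper names read_only and read_and_match and the line count are hypothetical, and it is a diagnostic aid, not an explanation of the slowdown.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Synthetic input (assumption): alternating "sample" and "good" lines.
my $lines = 200_000;
my $data  = '';
for my $i ( 1 .. $lines ) {
    $data .= ( $i % 2 ? "sample" : "good" ) . " value$i\n";
}

# Case 1: read lines from an in-memory handle, no regex
sub read_only {
    my ($ref) = @_;
    my $count = 0;
    open my $fh, '<', $ref or die "open error: $!";
    while (<$fh>) { $count++ }
    close $fh;
    return $count;
}

# Case 2: the same read loop plus the two regexes from the demonstration
sub read_and_match {
    my ($ref) = @_;
    my ( $sample, $good ) = ( 0, 0 );
    open my $fh, '<', $ref or die "open error: $!";
    while (<$fh>) {
        if    (/^sample\s+(\S+)/) { $sample++ }
        elsif (/^good\s+(\S+)/)   { $good++ }
    }
    close $fh;
    return ( $sample, $good );
}

my $t0     = time;
my $n      = read_only( \$data );
my $t_read = time - $t0;

$t0 = time;
my ( $s, $g ) = read_and_match( \$data );
my $t_match = time - $t0;

printf "perl %vd\n", $^V;
printf "read only      : %d lines in %.3fs\n", $n, $t_read;
printf "read and match : %d sample, %d good in %.3fs\n", $s, $g, $t_match;
```

If the read-only case is already slower on newer builds, the regexes are probably not the main factor. If only the second case slows down, the regexes are the more likely place to look.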

    Regards, Mario