http://www.perlmonks.org?node_id=1221395


in reply to Re: About text file parsing
in thread About text file parsing

That's cool, tybalt89. Each day I learn something new about Perl.

I ran this serially and in parallel with "test.txt" containing 50 million lines. I saw no slowness using Perl v5.20 and higher.

Serial

use strict;
use warnings;

open my $input_fh, '<', 'test.txt' or die "open error: $!";
open my $sample_fh, '>', 'sample.txt' or die "open error: $!";
open my $good_fh, '>', 'good.txt' or die "open error: $!";

# tybalt89's technique running serially
# see https://www.perlmonks.org/?node_id=1221387

local $/ = \2e6;  # or bigger chunk depending on your memory size

while (<$input_fh>) {  # read big chunk
    $_ .= do { local $/ = "\n"; <$input_fh> // '' };  # read any partial line
    print $sample_fh join("\n", /^sample\s+(\S+)/gm), "\n";
    print $good_fh join("\n", /^good\s+(\S+)/gm ), "\n";
}

close $input_fh;
close $sample_fh;
close $good_fh;
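To see why the extra line read matters, here is a small self-contained sketch of the same chunk-then-complete-the-line idea, run against an in-memory filehandle. The sample data, the tiny 32-byte record size, and the variable names are my own assumptions for illustration; they are not part of the original test.

```perl
use strict;
use warnings;

# Hypothetical data standing in for test.txt: alternating sample/good lines.
my $data = join '', map { ($_ % 2 ? "sample" : "good") . " value$_\n" } 1 .. 10;

open my $in_fh, '<', \$data or die "open error: $!";

my (@chunks, @samples);
{
    local $/ = \32;  # tiny fixed-size record, only to force mid-line splits
    while (my $chunk = <$in_fh>) {
        # Append the rest of the current line so no line straddles two chunks.
        $chunk .= do { local $/ = "\n"; <$in_fh> // '' };
        push @chunks, $chunk;
        push @samples, $chunk =~ /^sample\s+(\S+)/gm;
    }
}
close $in_fh;

print scalar(@chunks), " chunks read\n";
print "$_\n" for @samples;
```

Every chunk ends on a line boundary, so the /^.../m matches never see a line cut in half, regardless of the record size chosen.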

Parallel

use strict;
use warnings;

use MCE;

open my $sample_fh, '>', 'sample.txt' or die "open error: $!";
open my $good_fh, '>', 'good.txt' or die "open error: $!";

# tybalt89's technique running parallel
# see https://www.perlmonks.org/?node_id=1221387

MCE->new(
    chunk_size => '1m',
    max_workers => 4,
    use_slurpio => 1,
    input_data => 'test.txt',
    user_func => sub {
        my ( $mce, $slurp_ref, $chunk_id ) = @_;
        local $_ = ${ $slurp_ref };
        MCE->print($sample_fh, join("\n", /^sample\s+(\S+)/gm), "\n");
        MCE->print($good_fh, join("\n", /^good\s+(\S+)/gm ), "\n");
    }
)->run;

close $sample_fh;
close $good_fh;

Demo

$ time /opt/perl-5.26.1/bin/perl demo_serial.pl
real 0m15.662s
user 0m15.025s
sys  0m0.607s

$ time /opt/perl-5.26.1/bin/perl demo_parallel.pl
real 0m4.042s
user 0m15.617s
sys  0m0.345s
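For a rough sense of scaling, the wall-clock ("real") times above can be turned into a speedup and a per-worker efficiency. This is only a back-of-the-envelope sketch; the variable names are mine, and it assumes the 4 workers set via max_workers in the parallel script.

```perl
use strict;
use warnings;

# Wall-clock times taken from the demo output above, in seconds.
my $serial_real   = 15.662;
my $parallel_real = 4.042;
my $workers       = 4;       # max_workers in the parallel script

my $speedup    = $serial_real / $parallel_real;
my $efficiency = $speedup / $workers;

printf "speedup: %.2fx, efficiency: %.0f%%\n", $speedup, 100 * $efficiency;
```

That works out to a little under 4x on 4 workers, while total user time stays about the same in both runs.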

Regards, Mario