PerlMonks  

Re^2: About text file parsing

by marioroy (Priest)
on Aug 30, 2018 at 20:52 UTC (#1221395)


in reply to Re: About text file parsing
in thread About text file parsing

That's cool, tybalt89. Every day I learn something new about Perl.

I ran it serially and in parallel with "test.txt" containing 50 million lines. There is no slowness using Perl v5.20 and higher.

Serial

    use strict;
    use warnings;

    open my $input_fh,  '<', 'test.txt'   or die "open error: $!";
    open my $sample_fh, '>', 'sample.txt' or die "open error: $!";
    open my $good_fh,   '>', 'good.txt'   or die "open error: $!";

    # tybalt89's technique running serially
    # see https://www.perlmonks.org/?node_id=1221387

    local $/ = \2e6;  # or bigger chunk depending on your memory size

    while (<$input_fh>) {  # read big chunk
        $_ .= do { local $/ = "\n"; <$input_fh> // '' };  # read any partial line
        print $sample_fh join("\n", /^sample\s+(\S+)/gm), "\n";
        print $good_fh   join("\n", /^good\s+(\S+)/gm  ), "\n";
    }

    close $input_fh;
    close $sample_fh;
    close $good_fh;
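For anyone who wants to see the chunk-plus-partial-line read in isolation, here is a minimal sketch. It is not the benchmark script: the sample data, the in-memory filehandle, and the 16-byte record size are assumptions chosen so that nearly every fixed-size read cuts a line in half. It also collects the matches in arrays instead of printing them per chunk.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real test.txt is not shown in the post.
my $data = join '',
    map { ( "sample s$_\n", "good g$_\n", "other o$_\n" ) } 1 .. 5;

# Read from an in-memory filehandle so the sketch needs no files on disk.
open my $in, '<', \$data or die "open error: $!";

my ( @sample, @good );
{
    local $/ = \16;    # tiny fixed-size record (16 bytes) to force split lines

    while ( my $chunk = <$in> ) {
        # Finish the line that the fixed-size read may have cut in half.
        $chunk .= do { local $/ = "\n"; <$in> // '' };

        # Every chunk now ends on a line boundary, so ^ anchors are safe.
        push @sample, $chunk =~ /^sample\s+(\S+)/gm;
        push @good,   $chunk =~ /^good\s+(\S+)/gm;
    }
}
close $in;

print "sample: @sample\n";
print "good:   @good\n";
```

Because each chunk is topped up to the next newline, the following fixed-size read always starts at the beginning of a line, which is why no match is lost at a chunk boundary.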

Parallel

    use strict;
    use warnings;

    use MCE;

    open my $sample_fh, '>', 'sample.txt' or die "open error: $!";
    open my $good_fh,   '>', 'good.txt'   or die "open error: $!";

    # tybalt89's technique running parallel
    # see https://www.perlmonks.org/?node_id=1221387

    MCE->new(
        chunk_size  => '1m',
        max_workers => 4,
        use_slurpio => 1,
        input_data  => 'test.txt',
        user_func   => sub {
            my ( $mce, $slurp_ref, $chunk_id ) = @_;
            local $_ = ${ $slurp_ref };
            MCE->print($sample_fh, join("\n", /^sample\s+(\S+)/gm), "\n");
            MCE->print($good_fh,   join("\n", /^good\s+(\S+)/gm  ), "\n");
        }
    )->run;

    close $sample_fh;
    close $good_fh;

Demo

    $ time /opt/perl-5.26.1/bin/perl demo_serial.pl
    real    0m15.662s
    user    0m15.025s
    sys     0m0.607s

    $ time /opt/perl-5.26.1/bin/perl demo_parallel.pl
    real    0m4.042s
    user    0m15.617s
    sys     0m0.345s
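To put a number on the difference, the small sketch below works out the speedup and per-worker efficiency from the "real" times in the demo. The variable names are mine, and the figures are simply copied from the run above, so treat it as back-of-the-envelope arithmetic rather than a benchmark.

```perl
use strict;
use warnings;

# Wall-clock ("real") times copied from the demo run above.
my $serial_real   = 15.662;
my $parallel_real = 4.042;
my $workers       = 4;        # max_workers in the parallel script

# Speedup is serial time over parallel time; efficiency divides by workers.
my $speedup    = $serial_real / $parallel_real;
my $efficiency = $speedup / $workers;

printf "speedup: %.2fx, efficiency: %.0f%%\n", $speedup, 100 * $efficiency;
```

That comes out to roughly 3.9x on 4 workers. The "user" times (15.025s vs 15.617s) are close, which suggests the parallel run adds little extra CPU work.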

Regards, Mario
