
Re: About text file parsing -- MCE

by Discipulus (Abbot)
on Aug 29, 2018 at 07:27 UTC (#1221293)

in reply to About text file parsing

Hello dideod.yang,

If your file is huge, line-by-line processing will be slow with any variation of the algorithm. But you can throw more CPUs at the problem with, hopefully, better results. Parallel programming is not easy to implement correctly in Perl, but a gentle monk, marioroy, has spent a lot of time and energy helping us by producing MCE, and it seems that the second example in its documentation can easily be modified to suit your needs.

The example uses MCE::Loop to work on a file in chunks. Note the two OS-dependent constructions inside the mce_loop_f block below and use the one appropriate for your OS:

    # Adapted from the MCE docs; the OS check below is an addition so that
    # only one of the two constructions runs.
    use strict;
    use warnings;
    use MCE::Loop;

    MCE::Loop::init { max_workers => 8, use_slurpio => 1 };

    my $pattern  = 'something';
    my $hugefile = 'very_huge.file';

    my @result = mce_loop_f {
        my ($mce, $slurp_ref, $chunk_id) = @_;

        # Quickly determine if a match is found.
        # Process the slurped chunk only if true.
        if ($$slurp_ref =~ /$pattern/m) {
            my @matches;

            if ($^O ne 'MSWin32') {
                # The following is fast on Unix, but performance degrades
                # drastically on Windows beyond 4 workers.
                open my $MEM_FH, '<', $slurp_ref;
                binmode $MEM_FH, ':raw';
                while (<$MEM_FH>) { push @matches, $_ if (/$pattern/); }
                close $MEM_FH;
            }
            else {
                # Therefore, use the following construction on Windows.
                while ( $$slurp_ref =~ /([^\n]+\n)/mg ) {
                    my $line = $1;    # save $1 to not lose the value
                    push @matches, $line if ($line =~ /$pattern/);
                }
            }

            # Gather matched lines.
            MCE->gather(@matches);
        }
    } $hugefile;

    print join('', @result);
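One thing worth checking before you pick a construction: the two do not treat every chunk identically. The sketch below uses only core Perl (no MCE) and a made-up sample string; the helper names match_lines_regex and match_lines_fh are hypothetical, introduced just for this illustration. It suggests that the regex construction skips a final line that has no trailing newline, while the in-memory filehandle still returns it. Whether that matters depends on how your file ends.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MCE): collect matching lines from a
# slurped chunk using the regex construction recommended for Windows.
sub match_lines_regex {
    my ($slurp_ref, $pattern) = @_;
    my @matches;
    while ($$slurp_ref =~ /([^\n]+\n)/g) {
        my $line = $1;    # copy $1 before the next match overwrites it
        push @matches, $line if $line =~ /$pattern/;
    }
    return @matches;
}

# Hypothetical helper: the same job with an in-memory filehandle.
sub match_lines_fh {
    my ($slurp_ref, $pattern) = @_;
    my @matches;
    open my $fh, '<', $slurp_ref or die "cannot open in-memory handle: $!";
    while (my $line = <$fh>) {
        push @matches, $line if $line =~ /$pattern/;
    }
    close $fh;
    return @matches;
}

# Made-up sample chunk: the last line has no trailing newline.
my $chunk = "something here\n\nnothing\nsomething last";

my @by_regex = match_lines_regex(\$chunk, 'something');
my @by_fh    = match_lines_fh(\$chunk, 'something');

print scalar(@by_regex), " line(s) via regex\n";         # prints "1 line(s) via regex"
print scalar(@by_fh),    " line(s) via filehandle\n";    # prints "2 line(s) via filehandle"
```

If your data can end without a newline and you need the regex form, one option is to append "\n" to the last chunk before matching; treat that as a suggestion to test, not something taken from the MCE docs.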


UPDATE: you may also be interested in some other techniques you can find in my library.

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.
