
Using Parallel::ForkManager on multiple files using backtick operators for multiple files being processed simultaneously

by symgryph (Sexton)
on Feb 19, 2020 at 18:30 UTC ( #11113169=perlquestion )

symgryph has asked for the wisdom of the Perl Monks concerning the following question:

I have some code that essentially runs a bash script sequentially, and I am trying to run the child processes it manages in parallel. That is, I use Perl as an orchestrator to run program x on y files. When I run it on multiple files, I get two running processes with the same filename instead of two processes working on two different filenames. I am not sure how to make my code multi-process aware and need some help. Here is my code.
    #!/usr/bin/env perl -w
    use Parallel::ForkManager;

    my $filename = 'all.txt';
    my $failuresfilename="failed.tsv";
    open (my $target, "<", $filename) or die "Cannot open < $filename: $!";
    open (my $failures, ">", $failuresfilename) or die "Cannot open > $failuresfilename: $!";

    sub readinFile {
        @lines = <$target>;
    }

    sub execute {
        $multiprocess = Parallel::ForkManager->new(2);
        TARGETS:
        foreach $processme (@lines) {
            $multiprocess->start and next TARGETS;
            chomp $processme;
            $command="cfn_nag_scan -o json --input-path $processme > $processme_.cfnag.json";
            `$command`;
            $multiprocess->finish;
        }
    }

    sub findFailures {
        @files=`find ./ -iname "*cfnag*"`;
        $jqcommand='jq --raw-output \'.[] | select (.file_results.failure_count > 0) |[.filename, .file_results.failure_count] |@tsv\'';
        foreach (@files) {
            chomp;
            s/\/\//\//g;
            @a=`cat $_ |$jqcommand`;
            print $failures @a;
        }
    }

    readinFile();
    execute();
    #findFailures();
    close $failures;
    close $failuresfilename;
The subroutine in question is 'execute'. Any help would be appreciated. My input is a list of filenames produced by the 'find' command (in this case, files I want to scan with cfn_nag). The script shells out to cfn_nag_scan for each filename in the array, which in turn runs cfn_nag and writes a set of 'scan' result files. Perl is more of a dispatcher than a processor of data.
"Two Wheels good, Four wheels bad."

Replies are listed 'Best First'.
Re: Using Parallel::ForkManager on multiple files using backtick operators for multiple files being processed simultaneously
by 1nickt (Abbot) on Feb 19, 2020 at 19:18 UTC

    Hi, you didn't supply sample input or output, or describe what the external program does, so the following is untested, but here's something to try using MCE.

    #!/usr/bin/env perl -w
    use strict; use warnings;
    use MCE::Loop;
    use Capture::Tiny 'capture_stdout';
    use Text::CSV 'csv';
    use Try::Tiny;
    use JSON;

    MCE::Loop::init { max_workers => 8, use_slurpio => 1 };

    my $data_filename = 'all.txt';
    my $fail_filename = 'failed.tsv';
    my $cmd  = 'cfn_nag_scan';
    my @args = qw/ -o json --input-path /;

    my @results = mce_loop_f {
        my ($mce, $slurp_ref, $chunk_id) = @_;
        my @failures;

        while ( $$slurp_ref =~ /([^\n]+\n)/mg ) {
            # Strip the newline so it is not passed as part of the filename
            chomp(my $line = $1);
            my $json = try {
                return decode_json(capture_stdout { system($cmd, @args, $line) });
            }
            catch {
                warn "processing $line failed: $_";
                return;
            };
            next unless $json;
            my $failure_count = $json->{file_results}{failure_count};
            push @failures, [ $json->{filename} => $failure_count ] if $failure_count;
        }

        # Gather results
        MCE->gather(@failures);

    } $data_filename;

    unshift @results, [qw/ Filename Failures /];

    csv (in => \@results, out => $fail_filename, sep_char=> "\t");

    __END__

    update: added json decoding and exception handling; changed return to next (thx marioroy)

    Hope this helps!


    The way forward always starts with a minimal test.
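One thing worth checking before relying on the hash lookups above: the jq filter in the original post starts with `.[]`, which suggests the scan output may be a top-level array of records rather than a single object. The following is only a sketch of that filter in Perl, using core JSON::PP. The sample JSON is made up for illustration, and its shape is an assumption inferred from the jq expression, not from cfn_nag documentation.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical sample data. The shape (a top-level array of objects with
# "filename" and "file_results.failure_count") is inferred from the jq
# filter in the question, not taken from real cfn_nag_scan output.
my $sample = <<'JSON';
[
  { "filename": "a.yaml", "file_results": { "failure_count": 0 } },
  { "filename": "b.yaml", "file_results": { "failure_count": 3 } }
]
JSON

# Rough equivalent of:
#   jq '.[] | select(.file_results.failure_count > 0)
#           | [.filename, .file_results.failure_count] | @tsv'
sub failures_tsv {
    my ($json_text) = @_;
    my $records = decode_json($json_text);    # expects an array reference
    return
        map  { join "\t", $_->{filename}, $_->{file_results}{failure_count} }
        grep { ( $_->{file_results}{failure_count} // 0 ) > 0 }
        @$records;
}

my @rows = failures_tsv($sample);
print "$_\n" for @rows;
```

If the real output turns out to be a single object per file, the `grep`/`map` pair collapses to one hash lookup, as in the reply above.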
Re: Using Parallel::ForkManager on multiple files using backtick operators for multiple files being processed simultaneously
by jo37 (Scribe) on Feb 19, 2020 at 19:25 UTC

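    Always use strict;. It would tell you that $processme_ is not declared: Perl reads "$processme_.cfnag.json" as the variable $processme_ followed by ".cfnag.json", and since that variable is empty, every process writes to the same output file, ".cfnag.json".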

    -jo
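A minimal sketch of the interpolation issue, runnable on its own. The filename is a hypothetical stand-in for one line of all.txt; the braces in `${processme}` are the usual way to mark where a variable name ends inside a string.

```perl
use strict;
use warnings;

my $processme = 'template.yaml';    # hypothetical input filename

# Braces end the variable name, so "_" is literal text in the output name.
my $with_braces = "${processme}_.cfnag.json";

# Reproduce the original behaviour: with strict 'vars' off, "$processme_"
# silently refers to an undeclared (empty) package variable.
my $without_braces = do {
    no strict 'vars';
    no warnings 'uninitialized';    # silence the undefined-value warning
    "$processme_.cfnag.json";
};

print "with braces:    $with_braces\n";
print "without braces: $without_braces\n";
```

With strict left on for the whole file, the second string would not compile at all, which is the point of the advice above.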

      The simpler solution was to just cat the filenames to a file and then process them with GNU parallel.
      "Two Wheels good, Four wheels bad."

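For comparison, what a tool like GNU parallel does here is, roughly, keep a fixed number of jobs running over a list of filenames. Below is a rough sketch of that idea using only core Perl (`fork` and `waitpid`), swapped in for both GNU parallel and Parallel::ForkManager. The `run_pool` helper, the filenames, and the stand-in job are all hypothetical; a real job would run cfn_nag_scan instead of the one-liner shown.

```perl
use strict;
use warnings;

# Run $job->($item) in a child process for every item, keeping at most
# $max children alive at once. Returns a hash of item => exit code.
sub run_pool {
    my ( $max, $job, @items ) = @_;
    my ( %pid_to_item, %status );

    my $reap = sub {
        my $pid = waitpid( -1, 0 );    # block until any child exits
        die "waitpid found no child" if $pid <= 0;
        my $code = $? >> 8;
        my $item = delete $pid_to_item{$pid};
        $status{$item} = $code;
    };

    for my $item (@items) {
        $reap->() while keys(%pid_to_item) >= $max;    # throttle
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        exit $job->($item) if $pid == 0;               # child never returns
        $pid_to_item{$pid} = $item;                    # parent bookkeeping
    }
    $reap->() while %pid_to_item;                      # wait for the rest
    return %status;
}

# Stand-in job: runs a tiny Perl one-liner per file and reports its exit
# code. List-form system() bypasses the shell, so filenames need no quoting.
my $job = sub {
    my ($file) = @_;
    system( $^X, '-e', 'exit( $ARGV[0] =~ /bad/ ? 1 : 0 )', $file );
    return $? >> 8;
};

my %status = run_pool( 2, $job, qw(one.yaml bad.yaml two.yaml) );
print "$_ => $status{$_}\n" for sort keys %status;
```

Each child receives its own copy of `$item`, so two children never end up working on the same filename, which was the symptom in the original question.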