PerlMonks  


I'm struggling mightily to add parallel processing to an otherwise working script using Parallel::ForkManager. Here's the working script with the forking code commented out:

#!perl
#
# CountFilesRecords.pl

use strict;
use warnings;

use Capture::Tiny qw( capture_stdout );
use English qw( -no_match_vars );
use File::Glob qw( bsd_glob );
#use Parallel::ForkManager;
use Text::CSV_XS;

@ARGV or die "Usage: perl $PROGRAM_NAME <export volume folder> ...\n";

# Expand globs...
local @ARGV = map { $ARG =~ tr{\\}{/}; bsd_glob($ARG) } @ARGV;

local $OUTPUT_RECORD_SEPARATOR = "\n";
local $OUTPUT_AUTOFLUSH = 1;

#my $MAXIMUM_BATCH_SIZE = 4;

my @CSV_FIELD_LABELS = qw(
    ExportVolumeFolder
    TotalDATRecords
    TotalTextFiles
    TotalLFPRecords
    TotalImageFiles
);

for my $volume_folder (@ARGV) {
    -d $volume_folder
        or die "Export volume folder $volume_folder doesn't exist\n";
}

my @volume_folders;
my %stuff_by;

VOLUME_FOLDER:
for my $volume_folder (@ARGV) {
    my $volume_name = (split m{/}, $volume_folder)[-1];

    my $text_folder   = "$volume_folder/TEXT";
    my $images_folder = "$volume_folder/IMAGES";
    my $dat_file      = "$volume_folder/$volume_name.dat";
    my $lfp_file      = "$volume_folder/$volume_name.lfp";

    # Check for completed export volumes, report incomplete ones...
    unless (-d $text_folder && -d $images_folder && -f $dat_file && -f $lfp_file) {
        select STDERR;
        print $volume_folder;
        select STDOUT;
        next VOLUME_FOLDER;
    }

    push @volume_folders, $volume_folder;

    $stuff_by{$volume_folder} = {
        FOLDER_NAME => $volume_folder,
        TEXT_FILES  => {
            COMMAND => qq( find "$text_folder" -type f -name "*.txt" | wc -l ),
            COUNT   => 0,
        },
        IMAGE_FILES => {
            COMMAND => qq( find "$images_folder" -type f ! -name Thumbs.db | wc -l ),
            COUNT   => 0,
        },
        DAT_RECORDS => {
            COMMAND => qq( wc -l "$dat_file" ),
            COUNT   => 0,
        },
        LFP_RECORDS => {
            COMMAND => qq( wc -l "$lfp_file" ),
            COUNT   => 0,
        },
    };
}

# Quit if there are no completed export volume folders...
exit 1 unless @volume_folders;

my $csv = Text::CSV_XS->new();

# Print CSV header...
$csv->print(\*STDOUT, \@CSV_FIELD_LABELS);

#my $manager = Parallel::ForkManager->new($MAXIMUM_BATCH_SIZE);

VOLUME_PROBE:
for my $volume_folder (@volume_folders) {
    #$manager->start() and next VOLUME_PROBE;

    probe_volume($stuff_by{$volume_folder});

    #$manager->finish();
}

#$manager->wait_all_children();

exit 0;

sub probe_volume {
    my $vol = shift;

    for my $stuff (qw( TEXT_FILES IMAGE_FILES DAT_RECORDS LFP_RECORDS )) {
        (undef, $vol->{$stuff}{COUNT}) = capture_stdout {
            count_stuff($vol->{$stuff}{COMMAND})
        };
    }

    # The first line of every DAT file is a header
    $vol->{DAT_RECORDS}{COUNT}--;

    my @results = (
        $vol->{FOLDER_NAME},
        $vol->{DAT_RECORDS}{COUNT},
        $vol->{TEXT_FILES}{COUNT},
        $vol->{LFP_RECORDS}{COUNT},
        $vol->{IMAGE_FILES}{COUNT}
    );

    # Print CSV record...
    $csv->print(\*STDOUT, \@results);

    return;
}

sub count_stuff {
    my $command = shift;

    my $output = qx( $command );
    my ($count) = $output =~ m/(\d+)/;

    return $count;
}

I'm hoping some kind PerlMonk with experience using Parallel::ForkManager on Windows can spot the problem at a glance.

Thanks in advance for your gracious help.

UPDATE:  OK, I'm not wedded to Parallel::ForkManager. Is there a better way to manage parallel external processes (i.e., system calls to find and wc, capturing their standard output streams) without suffering this problem? Or is there a simple way to resolve the STDOUT problem using Parallel::ForkManager? I don't want to have to rewrite the whole script, which otherwise works fine, and I don't want to abandon using Capture::Tiny.
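One possible direction for the question in the UPDATE, offered only as a minimal sketch and not as a drop-in fix: because the real work is done by external programs, the commands can be started with piped opens and read afterwards, which uses only core Perl and avoids fork() in the script itself. This swaps in a different technique from Parallel::ForkManager and Capture::Tiny. The helper names run_commands_in_parallel and first_number are hypothetical, and the echo commands are stand-ins for the find and wc pipelines.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): start every
# command before reading any output, so the external programs overlap.
sub run_commands_in_parallel {
    my @commands = @_;

    my @handles;
    for my $command (@commands) {
        # A piped open returns once the child process has been started.
        open my $handle, '-|', $command
            or die "Can't start '$command': $!\n";
        push @handles, $handle;
    }

    # Read each command's standard output to end-of-file, in start order.
    my @outputs;
    for my $handle (@handles) {
        my $output = do { local $/; <$handle> };
        push @outputs, defined $output ? $output : '';
        close $handle;    # waits for that child to exit
    }

    return @outputs;
}

# Hypothetical helper mirroring count_stuff(): first run of digits, or undef.
sub first_number {
    my ($text) = @_;
    my ($number) = $text =~ m/(\d+)/;
    return $number;
}

# Stand-in commands; the real script would pass its find and wc commands.
my @outputs = run_commands_in_parallel('echo 42 example.dat', 'echo 7');
print join(',', map { first_number($_) } @outputs), "\n";
```

This sketch places no cap on the number of simultaneous processes. An assumption worth testing is that passing the commands in slices of $MAXIMUM_BATCH_SIZE would give similar batching to the commented-out manager.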


In reply to Adding parallel processing to a working Perl script by Jim
