http://www.perlmonks.org?node_id=893146

gizmo_mathboy has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to speed up some data collection on a bunch of Windows PCs.

Basically, I need to parse some info out of the Event Log. I then munge some data from a bunch of files and pass the data back in an array of arrays.

My current program takes about a minute per PC to run. I am testing on about 70 PCs at the moment and could potentially need to run this on thousands.

Here's a stripped down version of the code.

    my ($file_listing_pcs) = @ARGV;

    #slurp in file

    my $data_ref;

    while (<$file>) {
        my $pc = $_;
        if ( ping_it($pc) ) {
            ($data_ref) = process_pc($pc,$data_ref);
        }
        else {
            next;
        }
    }

    # post process data

    exit(0);

    sub process_pc {
        my ($pc, $data_ref) = @_;
        my ($event_log_data) = get_event_log_data($pc);
        my $dir_path = qq(//$pc/c\$/temp);
        ($data_ref) = process_dir($event_log_data,$dir_path,$data_ref);
        return($data_ref);
    }

    sub get_event_log_data {
        my ($pc) = @_;
        # play with Win32::EventLog
        return($event_log_data);
    }

    sub process_dir {
        my ($event_log_data,$dir_path,$data_ref) = @_;
        # read directory
        # get list of files to read
        # read files
        # plunder data
        # munge it about
        return($data_ref);
    }

After spending time googling and reading the docs and PerlMonks, I think I just need to use Coro (I don't think I need AnyEvent, but I'm not sure).

The biggest time suck is dealing with the Event Log.

My first naive attempt to use Coro was to just change the while loop that processes the list of PCs to:

    my @pids;

    while (<$file>) {
        my $pc = $_;
        if ( ping_it($pc) ) {
            push @pids, async{ ($data_ref) = process_pc($pc,$data_ref) };
            cede;
        }
        else {
            next;
        }
    }

    $_->join for @pids;

    #data profit?

This didn't change the time it took to process my test horde at all.

I can't recall the last time I even pondered doing anything event-related, let alone thread-related.

Any and all guidance much appreciated.

gizmo

Re: Use Coro for reading EventLog on many pc's?
by ikegami (Patriarch) on Mar 14, 2011 at 20:48 UTC

    Coro offers a cooperative multitasking solution, and your calls to process_pc aren't cooperating (so to speak): they block without ever yielding, so no other coroutine gets a chance to run. Get rid of the call to cede (which would be useless in a functioning Coro version anyway) and switch "use Coro;" to "use threads;".

    Note that both Coro's and threads's async functions return an object (not some id).
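To illustrate the point about async returning an object, here is a minimal sketch using threads. The process_pc below is a hypothetical stand-in (it only sleeps and returns a small hash reference), and the host names are made up. It also assumes results are handed back through join, since each thread works on its own copy of the program's variables and an assignment to $data_ref inside the block would not be visible in the parent.

```perl
use strict;
use warnings;
use threads;

# Hypothetical stand-in for the real process_pc(); it only sleeps briefly
# to mimic a blocking call and returns a small hash reference.
sub process_pc {
    my ($pc) = @_;
    sleep 1;
    return { pc => $pc, events => length $pc };
}

my @pcs = qw(pc-a pc-bb pc-ccc);    # made-up host names

# async returns a threads object, not an id. The push supplies list
# context, so join() will return a list later on.
my @threads;
for my $pc (@pcs) {
    push @threads, async { process_pc($pc) };
}

# join() waits for the thread and hands back what the block returned.
my @results = map { $_->join } @threads;

printf "%s: %d\n", $_->{pc}, $_->{events} for @results;
```

Because the three threads sleep concurrently, the whole run should take roughly as long as one call rather than three.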

      Thank you much. That helped speed things up.

      Now to re-read the docs for threads to figure out how to limit the number of threads I have so I don't saturate my network connection. I initially was looking at Thread::Queue, but I wasn't understanding it properly. Maybe with this bit of helpful knowledge I can grok the docs.

      gizmo

        Thread::Queue is indeed the way to go.

        my $q = Thread::Queue->new();

        my @threads;
        for (1..MAX_WORKERS) {
            push @threads, async {
                while (my $job = $q->dequeue()) {
                    do_job($job);
                }
            };
        }

        # Give work to the workers.
        for my $job (...) {
            $q->enqueue($job);
        }

        # Signal the workers to exit.
        $q->enqueue(undef) for 1..MAX_WORKERS;

        # Wait for the workers to finish.
        $_->join() for @threads;
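For reference, the worker-pool pattern above can be filled out into a small self-contained sketch. Everything not in the original snippet is an assumption: the MAX_WORKERS value of 4, the do_job body (a stand-in that just builds a string), the made-up host names, and the choice to return each worker's results through join. The loop test is also wrapped in defined(), a small variation so that a false but valid job such as "0" would not end a worker early.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

use constant MAX_WORKERS => 4;    # assumed limit; tune it for the network

# Hypothetical stand-in for the per-PC work.
sub do_job {
    my ($pc) = @_;
    return "$pc done";
}

my @pcs = map { "pc$_" } 1 .. 10;    # made-up host names

my $q = Thread::Queue->new();

my @threads;
for (1 .. MAX_WORKERS) {
    push @threads, async {
        my @done;
        # Only the undef sentinel stops the worker.
        while (defined(my $job = $q->dequeue())) {
            push @done, do_job($job);
        }
        return @done;    # handed back to the parent by join()
    };
}

# Give work to the workers.
$q->enqueue($_) for @pcs;

# Signal the workers to exit: one undef per worker.
$q->enqueue(undef) for 1 .. MAX_WORKERS;

# Wait for the workers to finish and gather what they returned.
my @results = map { $_->join } @threads;

print scalar(@results), " PCs processed\n";
```

Only MAX_WORKERS jobs run at any moment, which is what keeps the network connection from being saturated; the order of @results depends on which worker picked up which job.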