http://www.perlmonks.org?node_id=1138699


in reply to Optimizing perl code performance

Update: My next attempt compares running with and without threads on Mac and Linux. Something about strftime causes the script to slow down, either with threads or without them, depending on the OS.

Update: The serial code runs faster on a Linux VM. For some reason, strftime degrades in performance when running with many workers (even with threads on Linux). I'm not sure why.

In my testing, strftime performs poorly when many workers call it simultaneously. Running with threads is fine, but the number of workers must be limited.

On my laptop (running Mac OS X), the serial code completes in 19.131 seconds for a 500 MB file, while the MCE version completes in 6.569 seconds. Most of that time is spent in strftime. I verified this by replacing $A = strftime with $A = $Y; with that change the run completes in 1.842 seconds.
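One way to reproduce that check in isolation is a small timing loop that runs the same strftime call over a list of timestamps and then repeats the loop with the call replaced by a plain assignment. This is only a sketch: the timestamps are synthetic, the helper name time_loop is made up for illustration, and the numbers it prints will not match the figures quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::HiRes qw(time);

# Synthetic epoch timestamps (made up for illustration; not the original data).
my @stamps = map { 1438387200 + $_ } 0 .. 99_999;

# Hypothetical helper: run a callback once per timestamp and
# return the elapsed wall-clock time in seconds.
sub time_loop {
    my ($code) = @_;
    my $start = time;
    my $A;
    $A = $code->($_) for @stamps;
    return time - $start;
}

# Same format string as the script in this post.
my $with = time_loop(sub {
    strftime "%M,%Y,%m,%d,%H,%j,%W,%u,%A", gmtime $_[0];
});

# Stand-in for "$A = $Y": no date formatting at all.
my $without = time_loop(sub { $_[0] });

printf "with strftime:    %.3f s\n", $with;
printf "without strftime: %.3f s\n", $without;
```

This only measures a single process. The slowdown under many workers would have to be measured by running the same loop inside each worker.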

#!/usr/bin/perl

use strict;
use warnings;

use threads;
use threads::shared;

use POSIX qw(strftime);
use MCE::Loop;
use MCE::Candy;

my $infile  = $ARGV[0];
my $outfile = $ARGV[1];

open(DATAOUT, ">", $outfile);

## Workers process chunks in parallel until completed.
## Output order is preserved via MCE::Candy::out_iter_fh

MCE::Loop::init {
   chunk_size => "2m", max_workers => 4, use_slurpio => 1,
   gather => MCE::Candy::out_iter_fh(\*DATAOUT),
   use_threads => 1
};

mce_loop_f {
   my ($mce, $chunkRef, $chunkID) = @_;
   my ($output, @Fields, $X, $Y, $A, $B, $C, $D) = ("");

   open my $CHUNKIN, "<", $chunkRef;

   while( my $line = <$CHUNKIN> ) {
      chomp $line;
      @Fields = split(',', $line, 9);
      $X = $Fields[8];
      $Y = substr $X, 0, 10;
      $A = strftime "%M,%Y,%m,%d,%H,%j,%W,%u,%A", gmtime $Y;
      $B = substr($A, 0, index($A, ','));
      $C = int($B/5);
      $D = int($B/15);
      $output .= $line.",$Y,$A,$C,$D\n";
   }

   close $CHUNKIN;

   MCE->gather($chunkID, $output);

} $infile;

close(DATAOUT);
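To see what the loop body does to one record, the same steps can be pulled into a standalone function and run on a single line. This is a sketch only: the helper name enrich_line and the sample input are made up, and it assumes, as the script does, that the ninth field begins with a 10-digit epoch timestamp in seconds.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper mirroring the loop body: append the derived
# time fields to one CSV line.
sub enrich_line {
    my ($line) = @_;
    my @fields = split(',', $line, 9);

    # Assumed: field 9 starts with epoch seconds (first 10 characters).
    my $epoch = substr $fields[8], 0, 10;

    # Minute first, so it can be cut off the front of the string below.
    my $stamp  = strftime "%M,%Y,%m,%d,%H,%j,%W,%u,%A", gmtime $epoch;
    my $minute = substr($stamp, 0, index($stamp, ','));

    my $five    = int($minute / 5);     # 5-minute bucket within the hour
    my $fifteen = int($minute / 15);    # 15-minute bucket within the hour

    return "$line,$epoch,$stamp,$five,$fifteen";
}

# Made-up input: 1438389420 is 2015-08-01 00:37:00 UTC.
my $out = enrich_line("a,b,c,d,e,f,g,h,1438389420123");
print "$out\n";
```

For this sample the minute is 37, so the last two fields are 7 (the 5-minute bucket) and 2 (the 15-minute bucket).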

Kind regards, Mario.