PerlMonks  

Re: Optimizing perl code performance

by marioroy (Parson)
on Aug 15, 2015 at 14:48 UTC ( #1138699=note )


in reply to Optimizing perl code performance

Update: My next attempt compares running with threads and non-threads on the Mac and Linux. There is something strange about strftime that causes the script to slow down either with threads or non-threads depending on the OS.

Update: The serial code runs faster on a Linux VM. For some reason, the strftime function degrades in performance when running with many workers (even threads on Linux). I'm not sure why.

In my testing, strftime performs poorly when many workers call it simultaneously. Running with threads works fine, but the number of workers must be limited.

On my laptop (running Mac OS X), the serial code completes in 19.131 seconds for a 500 MB file, and the MCE version completes in 6.569 seconds. Most of that time comes from strftime. I verified this by replacing $A = strftime with $A = $Y, which completes in 1.842 seconds.
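A minimal sketch of how that check could be reproduced without the 500 MB file: time the same strftime call against a loop that simply passes the value through. The helper name time_loop, the synthetic epochs, and the iteration count are assumptions made for illustration; the absolute numbers will differ from the timings quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::HiRes qw(time);

# Hypothetical helper: run $code once per epoch in @$epochs and
# return the elapsed wall-clock time in seconds.
sub time_loop {
    my ($code, $epochs) = @_;
    my $start = time();
    my $sink;
    $sink = $code->($_) for @$epochs;
    return time() - $start;
}

# Synthetic input (an assumption): 100,000 consecutive epoch seconds.
my @epochs = map { 1439650080 + $_ } 0 .. 99_999;

# Same format string as the script; the second loop mirrors "$A = $Y".
my $with_strftime = time_loop(
    sub { strftime "%M,%Y,%m,%d,%H,%j,%W,%u,%A", gmtime $_[0] }, \@epochs);
my $without = time_loop(sub { $_[0] }, \@epochs);

printf "with strftime: %.3fs\n", $with_strftime;
printf "pass-through:  %.3fs\n", $without;
```

This only measures a single process. Running several copies at once (or wrapping the loop in workers) would be needed to look at the slowdown under concurrency described in the updates.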

#!/usr/bin/perl

use strict;
use warnings;

use threads;
use threads::shared;

use POSIX qw(strftime);
use MCE::Loop;
use MCE::Candy;

my $infile  = $ARGV[0];
my $outfile = $ARGV[1];

open(DATAOUT, ">", $outfile);

## Workers process chunks in parallel until completed.
## Output order is preserved via MCE::Candy::out_iter_fh

MCE::Loop::init {
   chunk_size => "2m", max_workers => 4, use_slurpio => 1,
   gather => MCE::Candy::out_iter_fh(\*DATAOUT),
   use_threads => 1
};

mce_loop_f {
   my ($mce, $chunkRef, $chunkID) = @_;
   my ($output, @Fields, $X, $Y, $A, $B, $C, $D) = ("");

   open my $CHUNKIN, "<", $chunkRef;

   while( my $line = <$CHUNKIN> ) {
      chomp $line;
      @Fields = split(',', $line, 9);
      $X = $Fields[8];
      $Y = substr $X, 0, 10;
      $A = strftime "%M,%Y,%m,%d,%H,%j,%W,%u,%A", gmtime $Y;
      $B = substr($A, 0, index($A, ','));
      $C = int($B/5);
      $D = int($B/15);
      $output .= $line.",$Y,$A,$C,$D\n";
   }

   close $CHUNKIN;

   MCE->gather($chunkID, $output);

} $infile;

close(DATAOUT);
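To make the per-line work in the loop body above concrete, here is a small standalone sketch that applies the same steps to one value. The helper name derive_fields and the sample timestamp are assumptions for illustration; the real input's ninth field may look different.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper mirroring the loop body: format a 10-digit epoch
# and derive the two minute buckets from the leading %M field.
sub derive_fields {
    my ($epoch) = @_;
    my $A = strftime "%M,%Y,%m,%d,%H,%j,%W,%u,%A", gmtime $epoch;
    my $B = substr($A, 0, index($A, ','));   # minute of the hour, "00".."59"
    my $C = int($B / 5);                     # 5-minute bucket, 0..11
    my $D = int($B / 15);                    # 15-minute bucket, 0..3
    return ($A, $C, $D);
}

# Sample value (an assumption): a millisecond timestamp whose first
# 10 characters are the epoch seconds for 2015-08-15 14:48:00 UTC.
my $Y = substr "1439650080000", 0, 10;
my ($A, $C, $D) = derive_fields($Y);

print "$Y,$A,$C,$D\n";
```

For this sample the minute is 48, so the 5-minute bucket is 9 and the 15-minute bucket is 3. The weekday name produced by %A depends on the locale in effect.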

Kind regards, Mario.

Re^2: Optimizing perl code performance
by marioroy (Parson) on Aug 15, 2015 at 15:40 UTC

    Update: The disparity is coming from strftime.

    Update: One must use threads on the Mac and non-threads on Linux for best performance. This is mind-boggling to me. Replacing the strftime line with $A = $Y completes in a couple seconds for threads or non-threads on the Mac and Linux.

    The same 500 MB input file is used on both operating systems.

    Mac OS X     Serial:    18.185s
    Mac OS X     Parallel:   6.687s  threads
    Mac OS X     Parallel:  42.526s  non-threads

    CentOS 7 VM  Serial:    10.832s
    CentOS 7 VM  Parallel:  23.849s  threads
    CentOS 7 VM  Parallel:   2.993s  non-threads

    #!/usr/bin/perl

    use strict;
    use warnings;

    use threads;  # Comment out threads for child processes

    use POSIX qw(strftime);
    use MCE::Loop;
    use MCE::Candy;

    my $mutex :shared = 0;

    my $infile  = $ARGV[0];
    my $outfile = $ARGV[1];

    open(DATAOUT, ">", $outfile);

    ## Workers process chunks in parallel until completed.
    ## Output order is preserved via MCE::Candy::out_iter_fh

    MCE::Loop::init {
       chunk_size => "2m", max_workers => 4, use_slurpio => 1,
       gather => MCE::Candy::out_iter_fh(\*DATAOUT)
    };

    mce_loop_f {
       my ($mce, $chunkRef, $chunkID) = @_;
       my ($output, @Fields, $X, $Y, $A, $B, $C, $D, @G) = ("");

       open my $CHUNKIN, "<", $chunkRef;

       while( my $line = <$CHUNKIN> ) {
          chomp $line;
          @Fields = split(',', $line, 9);
          $X = $Fields[8];
          $Y = substr $X, 0, 10;
          @G = gmtime $Y;
          $A = strftime "%M,%Y,%m,%d,%H,%j,%W,%u,%A", @G;
          $B = substr($A, 0, index($A, ','));
          $C = int($B/5);
          $D = int($B/15);
          $output .= $line.",$Y,$A,$C,$D\n";
       }

       close $CHUNKIN;

       MCE->gather($chunkID, $output);

    } $infile;

    close(DATAOUT);
