Hi vr,
Wow! Curiosity moved me to try with MCE. But first, I optimized Nick's code. Also, workers now send only the relevant data and not the entire list. This decreases the time needed for the manager process to output results.
Update 1: Slight optimization to Choroba's example (i.e. >> 1). Added collatz_compress.
Update 2: Added collatz_count.
Update 3: Improved collatz_count.
Update 4: Added Inline C times.
Demo choroba, 1nickt, compress and count all in one.
use strict;
use warnings;
use feature 'say';
use MCE::Flow;
sub collatz_choroba {
my ($n) = @_;
my @seq = $n;
push @seq, ( $seq[-1] >> 1, 3 * $seq[-1] + 1 )[ $seq[-1] % 2 ]
while $seq[-1] != 1;
return scalar @seq;
}
sub collatz_1nickt {
my ($n) = @_;
my @seq = $n;
push @seq, $n = $n % 2 ? 3 * $n + 1 : $n >> 1
while $n != 1;
return scalar @seq;
}
sub collatz_compress {
my ($n) = @_;
my @seq = $n;
# Notation and compressed dynamics (2:15 into the video)
# https://www.youtube.com/watch?v=t1I9uHF9X5Y
# consider the map
# C(x) = $n % 2 ? 3 * $n + 1 : $n / 2
# compress the map to the dynamically equivalent
# T(x) = $n % 2 ? (3 * $n + 1) / 2 : $n / 2
# compress loop iteration when odd sequence
push @seq, $n % 2 ? ( $n = 3 * $n + 1, $n = $n >> 1 )
: ( $n = $n >> 1 )
while $n != 1;
return scalar @seq;
}
sub collatz_count {
my ($n) = @_;
my $steps = 1;
# count using the T(x) notation
$n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
: ( $steps += 1, $n = $n >> 1 )
while $n != 1;
return $steps;
}
#*collatz = \&collatz_choroba; # choose collatz here
#*collatz = \&collatz_1nickt;
#*collatz = \&collatz_compress;
*collatz = \&collatz_count;
my @sizes;
MCE::Flow->init(
max_workers => MCE::Util::get_ncpu(),
gather => \@sizes,
bounds_only => 1,
);
mce_flow_s sub {
my ($start, $end) = @{ $_[1] };
my @ret = map { [ $_, collatz($_) ] } $start .. $end;
MCE->gather( ( sort { $b->[1] <=> $a->[1] } @ret )[ 0..19 ] );
}, 1, 1e6;
MCE::Flow->finish;
say "@$_"
for ( sort { $b->[1] <=> $a->[1] } @sizes )[ 0..19 ];
Demo caching by vr
use strict;
use warnings;
use feature 'say';
use MCE::Flow;
my %cache = ( 1 => [ 1, 1, undef ]);
#
# [ seq_length, start_num (same as key), next_num ]
#
sub collatz_vr {
my $n = shift;
return if exists $cache{ $n };
my ( @tmp, $new_n );
while ( !exists $cache{ $n } ) {
$new_n = $n % 2 ? 3 * $n + 1 : $n >> 1;
push @tmp, $cache{ $n } = [ -$#tmp, $n, $new_n ];
$n = $new_n
}
my $d = $#tmp + $cache{ $n }[ 0 ];
$_->[ 0 ] += $d for @tmp;
}
my @sizes;
mce_flow_s {
max_workers => MCE::Util::get_ncpu(),
gather => \@sizes,
bounds_only => 1,
}, sub {
my ($start, $end) = @{ $_[1] };
collatz_vr($_) for $start .. $end;
MCE->gather(
( sort { $b->[0] <=> $a->[0] } @cache{ $start .. $end } )[ 0 .
+. 19 ]
);
}, 1, 1e6;
MCE::Flow->finish;
say "@$_[ 1, 0 ]"
for ( sort { $b->[0] <=> $a->[0] } @sizes )[ 0 .. 19 ];
Run time on a 32 core machine - SMT disabled
The times include launching Perl, loading modules, spawning workers, reaping workers, and output (~ 0.100 seconds). See the next post for the Inline C demonstration.
max_workers => 1
collatz_vr 3.617s 1.0x
collatz_choroba 14.896s 1.0x
collatz_1nickt 12.264s 1.0x
collatz_compress 8.932s 1.0x
collatz_count 7.021s 1.0x
collatz_count_c1 0.741s 1.0x Inline C
collatz_count_c2 0.625s 1.0x Inline C
max_workers => 2
collatz_vr 2.707s 1.3x
collatz_choroba 7.538s 2.0x
collatz_1nickt 6.141s 2.0x
collatz_compress 4.490s 2.0x
collatz_count 3.537s 2.0x
collatz_count_c1 0.398s 1.9x Inline C
collatz_count_c2 0.339s 1.8x Inline C
max_workers => 4
collatz_vr 1.906s 1.9x
collatz_choroba 3.948s 3.8x
collatz_1nickt 3.274s 3.7x
collatz_compress 2.365s 3.8x
collatz_count 1.873s 3.7x
collatz_count_c1 0.235s 3.2x Inline C
collatz_count_c2 0.203s 3.1x Inline C
max_workers => 8
collatz_vr 1.458s 2.5x
collatz_choroba 1.995s 7.5x
collatz_1nickt 1.655s 7.4x
collatz_compress 1.209s 7.4x
collatz_count 0.961s 7.3x
collatz_count_c1 0.149s 5.0x Inline C
collatz_count_c2 0.132s 4.7x Inline C
max_workers => 16
collatz_vr 0.976s 3.7x
collatz_choroba 1.034s 14.4x
collatz_1nickt 0.858s 14.3x
collatz_compress 0.637s 14.0x
collatz_count 0.509s 13.8x
collatz_count_c1 0.114s 6.5x Inline C
collatz_count_c2 0.105s 6.0x Inline C
max_workers => 32
collatz_vr 0.837s 4.3x
collatz_choroba 0.593s 25.1x
collatz_1nickt 0.497s 24.7x
collatz_compress 0.379s 23.6x
collatz_count 0.310s 22.6x
collatz_count_c1 0.112s 6.6x Inline C
collatz_count_c2 0.108s 5.8x Inline C
Interesting. For vr's demo, every worker starts with an empty cache. Meaning that workers do not have cached results from prior chunks. This is the reason not scaling versus the non-cache demonstrations.
collatz_count (2 cores) reaches collatz_vr (1 core).
collatz_count (4 cores) reaches collatz_vr (4 cores).
Kind regards, Mario