Re: Longest Collatz using MCE::Flow, Inline::C, and GCC compiler intrinsics (Caching)

Hi again,

I tried caching using an array and File::Map. All run using one core and are reasonably fast. The main goal here are attempts made at reducing memory consumption.

Update 1: Added PDL example.
Update 2: Improved performance.
Update 3: Added scalar example.

use strict;
use warnings;
use feature 'say';
use File::Map qw/map_anonymous unmap/;
use PDL;

# based on the caching demonstration by iM71
# https://stackoverflow.com/a/55361008
# https://stackoverflow.com/questions/38114205/c-collatz-conjecture-op
+timization

sub collatz_longest_pdl {
    my ( $size ) = @_;
    my ( $length, $number ) = ( 0, 0 );
    my ( $n, $steps, $tmp, $n2 );

    my $cache = zeros( short(), $size + 1 );

    for my $current ( 2 .. $size ) {
        $n = $current, $steps = 0;

        # count using the T(x) notation
        $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
               : ( $steps += 1, $n = $n >> 1 )
            while $n != 1 && $n >= $current;

        $tmp = $steps + at( $cache, $n );
        set( $cache, $current, $tmp );

        $length = $tmp, $number = $current
           if $tmp > $length;
    }

    return ( $number, $length );
}

sub collatz_longest_array {
    my ( $size ) = @_;
    my ( $length, $number ) = ( 0, 0 );
    my ( $n, $steps, $tmp );

    my @cache;

    for my $current ( 2 .. $size ) {
        $n = $current, $steps = 0;

        # count using the T(x) notation
        $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
               : ( $steps += 1, $n = $n >> 1 )
            while $n != 1 && $n >= $current;

        $tmp = $steps + ( $cache[ $n ] // 0 );
        $cache[ $current ] = $tmp;

        $length = $tmp, $number = $current
            if $tmp > $length;
    }

    return ( $number, $length );
}

sub collatz_longest_filemap {
    my ( $size ) = @_;
    my ( $length, $number ) = ( 0, 0 );
    my ( $n, $steps, $tmp );

    map_anonymous my $cache, $size * 2 + 2, 'shared';

    # fill cache with zero's
    substr($cache, 0, $size * 2 + 2, pack('s', 0) x ( $size + 1 ));

    for my $current ( 2 .. $size ) {
        $n = $current, $steps = 0;

        # count using the T(x) notation
        $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
               : ( $steps += 1, $n = $n >> 1 )
            while $n != 1 && $n >= $current;

        $tmp = $steps + unpack('s', substr($cache, $n * 2, 2));
        substr($cache, $current * 2, 2, pack('s', $tmp));

        $length = $tmp, $number = $current
            if $tmp > $length;
    }

    unmap $cache;

    return ( $number, $length );
}

sub collatz_longest_scalar {
    my ( $size ) = @_;
    my ( $length, $number ) = ( 0, 0 );
    my ( $n, $steps, $tmp );

    # fill cache with zero's
    my $cache = pack('s', 0) x ( $size + 1 );

    for my $current ( 2 .. $size ) {
        $n = $current, $steps = 0;

        # count using the T(x) notation
        $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
               : ( $steps += 1, $n = $n >> 1 )
            while $n != 1 && $n >= $current;

        $tmp = $steps + unpack('s', substr($cache, $n * 2, 2));
        substr($cache, $current * 2, 2, pack('s', $tmp));

        $length = $tmp, $number = $current
            if $tmp > $length;
    }

    return ( $number, $length );
}

#*collatz = \&collatz_longest_pdl;      # choose collatz here
#*collatz = \&collatz_longest_array;
#*collatz = \&collatz_longest_filemap;
 *collatz = \&collatz_longest_scalar;

my ( $number, $length ) = collatz( shift || '1e7' );

say "Longest Collatz (index and value)";
say "$number $length";
[download]

Output

1e7 : 8400511 685

  1 core
    collatz_longest          1m16.034s  MCE Perl code
    collatz_longest_pdl      0m13.691s  Consumes 39 MiB
    collatz_longest_filemap  0m06.615s  Consumes 59 MiB
    collatz_longest_scalar   0m06.148s  Consumes 39 MiB
    collatz_longest_array    0m04.986s  Consumes 330 MiB
    collatz_longest_c1       0m01.868s  Inline C code
    collatz_longest_c2       0m00.778s  Inline C code
[download]

I've been wanting to try File::Map and pleasantly surprised.

Regards, Mario

Comment on Re: Longest Collatz using MCE::Flow, Inline::C, and GCC compiler intrinsics (Caching) Select or Download Code

Replies are listed 'Best First'.

Re^2: Longest Collatz using MCE::Flow, Inline::C, and GCC compiler intrinsics (Caching)
by marioroy (Prior) on Apr 19, 2020 at 06:58 UTC

Greetings,

I am pleased to provide the final work. Computing collatz_longest for 1e9 requires ~ 3.8 GiB of available memory. Try 5e8 for lesser memory consumption.

This is a parallel demonstration using MCE::Flow and File::Map for caching. The 2nd example uses Inline::C for counting the number of steps. Unlike the other examples, workers working on chunks 2 and higher require results from prior chunks. This is accounted for. Obviously the worker processing chunk 1 doesn't need prior results. Once the workers being processing (i.e. after the mapped cache creation), it doesn't take long before full CPU utilization kicks in.

Q. Why does this work, especially when subsequent chunks need results from prior chunks?

A. The magic lies with using a smaller chunk size set to 2500. The initial ramp up is a one time occurrence. One cool thing about MCE is that input IO is sequential. This applies to number sequences as well. A worker obtaining chunk 1 begins processing immediately (there's no wasting time). The worker obtaining the next chunk begins processing but may need to pause a little here and there. Eventually, the chunks as far as timing goes (starting) are spread out where workers need to pause left often. This can be seen by looking at the CPU utilization. The Power of Randomness kicks in at some point with CPU utilization near 100% until completion.

Credit to PerlMonks choroba, Laurent_R, 1nickt, rjt, and vr. See this thread. Worthy mention goes to Leon Timmermans, author of File::Map. Wow!

Credit for the caching technique used here is based on the caching demonstration by iM71, a response in that thread. My first attempt at parallelization failed. I tried again by maximizing on MCE's strengths described above. It's surreal :)

Credit for reducing the number of loop iterations was from watching Notation and compressed dynamics, one minute into the video (i.e. the T(x) notation).

Below, the minimum and maximum argument (size) is 1e6 and 1e9 respectively. The two scripts will set to limit quietly if exceeded.

Cache miss update:

Cache miss is less than 1%. Therefore, it is faster to compute for $n than waiting for the result.

Final update:

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;

use File::Map qw/map_anonymous unmap/;
use Time::HiRes qw/usleep/;
use MCE::Flow;

my $size = shift || 1e6;

$size = 1e6 if $size < 1e6;  # minimum
$size = 1e9 if $size > 1e9;  # maximum

map_anonymous my $cache, $size * 2 + 2, 'shared';

# fill cache with zeroes
substr($cache, 0, $size * 2 + 2, pack('s', 0) x ( $size + 1 ));

# local to workers and the manager process
my ( $length, $number ) = ( 0, 0 );

sub collatz_longest {
    my ( $chunk_id, $seq_beg, $seq_end ) = @_;
    my ( $n, $steps, $tmp );

    for my $input ( $seq_beg .. $seq_end ) {
        $n = $input, $steps = 0;

        $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
               : ( $steps += 1, $n = $n >> 1 )
            while $n != 1 && $n >= $input;

        $tmp = unpack('s', substr($cache, $n * 2, 2));

        # another worker with a lesser chunk_id is not yet
        # completed processing $n, so pause a little

        # if ( $tmp == 0 && $chunk_id > 1 ) {
        #     do {
        #         usleep 100;
        #         $tmp = unpack('s', substr($cache, $n * 2, 2));
        #     } while ( $tmp == 0 );
        # }

        # do this instead (faster): compute $n if cache miss
        $tmp = _collatz($n) if $tmp == 0 && $chunk_id > 1;

        substr($cache, $input * 2, 2, pack('s', $steps += $tmp));
        $length = $steps, $number = $input if $steps > $length;
    }
}

sub _collatz {
    my ( $input ) = @_;
    my ( $n, $steps ) = ( $input, 0 );

    $n % 2 ? ( $steps += 2, $n = (3 * $n + 1) >> 1 )
           : ( $steps += 1, $n = $n >> 1 )
        while $n != 1 && $n >= $input;

    my $tmp = unpack('s', substr($cache, $n * 2, 2));

    $tmp = _collatz($n) if $tmp == 0;
    substr($cache, $input * 2, 2, pack('s', $steps += $tmp));

    return $steps
}

my $chunk_size;

$chunk_size  = int( $size / MCE::Util::get_ncpu() / 80 + 1 );
$chunk_size += 1 if $chunk_size % 2;

MCE::Flow->init(
    max_workers => MCE::Util::get_ncpu(),
    chunk_size  => $chunk_size,  # specify 2500 if pausing above
    bounds_only => 1,
    gather      => sub {
        $length = $_[0], $number = $_[1] if $_[0] > $length;
    },
    user_end    => sub {
        MCE->gather($length, $number);
    },
);

mce_flow_s sub {
    my ( $mce, $seq_ref, $chunk_id ) = @_;
    collatz_longest($chunk_id, @{ $seq_ref });
}, 2, $size;

MCE::Flow->finish; unmap $cache;

say "Longest Collatz (index and value)";
say "$number $length";
[download]

Count steps via Inline C:

#!/usr/bin/env perl
use strict;
use warnings;
use feature qw/say/;

use File::Map qw/map_anonymous unmap/;
use Time::HiRes qw/usleep/;
use MCE::Flow;

use Inline C => Config => CCFLAGSEX => '-O2 -fomit-frame-pointer';
use Inline C => <<'END_OF_C_CODE';

#include <stdint.h>

void num_steps_c( SV* _n, SV* _s )
{
    uint64_t n, input;
    int steps = 0;

    n = input = SvUV(_n);

    while ( n != 1 && n >= input ) {
        n % 2 ? ( steps += 2, n = (3 * n + 1) >> 1 )
              : ( steps += 1, n = n >> 1 );
    }

    sv_setuv(_n, n);
    sv_setiv(_s, steps);

    return;
}

END_OF_C_CODE

my $size = shift || 1e6;

$size = 1e6 if $size < 1e6;  # minimum
$size = 1e9 if $size > 1e9;  # maximum

map_anonymous my $cache, $size * 2 + 2, 'shared';

# fill cache with zeroes
substr($cache, 0, $size * 2 + 2, pack('s', 0) x ( $size + 1 ));

# local to workers and the manager process
my ( $length, $number ) = ( 0, 0 );

sub collatz_longest {
    my ( $chunk_id, $seq_beg, $seq_end ) = @_;
    my ( $n, $steps, $tmp );

    for my $input ( $seq_beg .. $seq_end ) {
        num_steps_c($n = $input, $steps);

        $tmp = unpack('s', substr($cache, $n * 2, 2));

        # another worker with a lesser chunk_id is not yet
        # completed processing $n, so pause a little

        # if ( $tmp == 0 && $chunk_id > 1 ) {
        #     do {
        #         usleep 100;
        #         $tmp = unpack('s', substr($cache, $n * 2, 2));
        #     } while ( $tmp == 0 );
        # }

        # do this instead (faster): compute $n if cache miss
        $tmp = _collatz($n) if $tmp == 0 && $chunk_id > 1;

        substr($cache, $input * 2, 2, pack('s', $steps += $tmp));
        $length = $steps, $number = $input if $steps > $length;
    }
}

sub _collatz {
    my ( $input ) = @_;

    num_steps_c( my $n = $input, my $steps );
    my $tmp = unpack('s', substr($cache, $n * 2, 2));

    $tmp = _collatz($n) if $tmp == 0;
    substr($cache, $input * 2, 2, pack('s', $steps += $tmp));

    return $steps
}

my $chunk_size;

$chunk_size  = int( $size / MCE::Util::get_ncpu() / 80 + 1 );
$chunk_size += 1 if $chunk_size % 2;

MCE::Flow->init(
    max_workers => MCE::Util::get_ncpu(),
    chunk_size  => $chunk_size,  # specify 2500 if pausing above
    bounds_only => 1,
    gather      => sub {
        $length = $_[0], $number = $_[1] if $_[0] > $length;
    },
    user_end    => sub {
        MCE->gather($length, $number);
    },
);

mce_flow_s sub {
    my ( $mce, $seq_ref, $chunk_id ) = @_;
    collatz_longest($chunk_id, @{ $seq_ref });
}, 2, $size;

MCE::Flow->finish; unmap $cache;

say "Longest Collatz (index and value)";
say "$number $length";
[download]

Results: Unix time (i.e. time perl script.pl 1e9).

1e9, 32 cores
  collatz_longest_final      0m26.371s
  collatz_longest_inline_c   0m15.634s

Longest Collatz (index and value)
670617279 986
[download]

Regards, Mario

[reply]
[d/l]
[select]


"be consistent"
	PerlMonks