comment on

Hi vr,

Lots of PDL goodness. Wow! I do not know how this will perform unless taking a moment or two and give it a try. So here is a parallel version based on your 2nd example.

use strict;
use warnings;
use feature 'say';
use PDL;
use MCE::Flow;
use MCE::Candy;

use Time::HiRes 'time';
my $t = time;

# PDL Quick Reference
# https://www.perlmonks.org/?node_id=1214437

sub collatz_seq {
    my ( $chunk_id, $seq_beg, $seq_end ) = @_;
    my $max = $seq_end - $seq_beg + 2;
    my $top = $max < 20 ? $max : 20;

    my $seqs = pdl( longlong, 1, $seq_beg..$seq_end );
    $seqs-> setbadat( 0 );
    $seqs-> badvalue( 2 );

    my $lengths = 1 + ones( short, $max );
    $lengths-> set( 0, 1 );

    while ( any my $good_mask = $seqs-> isgood ) {

        my ( $seqs_odd, $lengths_odd_masked )
            = where( $seqs, $lengths, $seqs & 1 );

        $lengths_odd_masked ++;
        $lengths-> where( $good_mask ) ++;
        ( $seqs_odd *= 3 ) ++;
        $seqs >>= 1;
    }

    my $sorted_i = $lengths-> qsorti;
    my $sorted   = $lengths-> index( $sorted_i );
    my $value    = $sorted-> at( $max - $top );
    my $pos      = vsearch_insert_leftmost( $value, $sorted );
    my $top_i    = $sorted_i-> slice([ $max - 1 , $pos ]);

    ( my $result = $lengths
        -> index( $top_i )
        -> longlong
        -> bitnot
        -> cat( $top_i + 1 )
        -> transpose
        -> qsortvec
        -> slice([], [ 0, $top - 1 ])

    )-> slice([ 0 ], [])
     -> inplace
     -> bitnot;

    # From PDL to Perl: [ 0 1 ] becomes [ 1, 0 ],
    my $str = $result->string;
    $str =~ s/(\d+)\s+(\d+)(.*)/$2,$1$3,/g;

    my $ret = eval $str;
    $_->[0] = $_->[0] + $seq_beg - 2 for @$ret;

    MCE->gather( $chunk_id, @$ret );
}

my $size = shift || 1e6;
$size = 1e6 if $size < 1e6;  # minimum
$size = 1e9 if $size > 1e9;  # maximum

my $chunk_size = $size >> 5;
my @seqs;

mce_flow_s {
    max_workers => MCE::Util::get_ncpu(),
    chunk_size  => $chunk_size > 100000 ? 100000 : $chunk_size,
    bounds_only => 1,
    gather      => MCE::Candy::out_iter_array(\@seqs),
}, sub {
    my ( $mce, $chunk_ref, $chunk_id ) = @_;
    collatz_seq( $chunk_id, @{ $chunk_ref } );
}, 2, $size;

MCE::Flow->finish;

@seqs = ( sort { $b->[1] <=> $a->[1]} @seqs )[ 0..19 ];

printf "Collatz(%5d) has sequence length of %3d steps\n", @$_
    for @seqs;

say {*STDERR} time - $t;
[download]

Results:

I was expecting for mce_pdl2 using 1 core to be closer to vr_pdl2 than vr_pdl3. Maybe benefitting from CPU L2/L3 cache. Chunking seems to be the reason why mce_pdl2 (non-caching) ran nearly as fast as vr_pdl3 (caching). Below, includes 1 core testing for 1e7 with various chunk sizes.

1e7 testing
     vr_pdl3:    46.328s  cache
     vr_pdl2:  1m16.058s  non-cache
    mce_pdl2:  1m15.583s  chunk_size => $size
    mce_pdl2:  1m03.709s  chunk_size => $size >> 1
    mce_pdl2:    52.352s  chunk_size => $size >> 2
    mce_pdl2:    51.323s  chunk_size => $size >> 3
    mce_pdl2:    49.396s  chunk_size => $size >> 4
    mce_pdl2:    48.195s  chunk_size => $size >> 5
    mce_pdl2:    48.369s  chunk_size => $size >> 6
    mce_pdl2:    48.501s  chunk_size => $size >> 7

chunk_size => 300000
    mce_pdl2:    48.195s   1 core
    mce_pdl2:    25.311s   2 cores
    mce_pdl2:    14.085s   4 cores
    mce_pdl2:     7.650s   8 cores
    mce_pdl2:     4.517s  16 cores
    mce_pdl2:     3.721s  32 cores

chunk_size => 100000
    mce_pdl2:    48.395s   1 core
    mce_pdl2:    25.402s   2 cores
    mce_pdl2:    13.163s   4 cores
    mce_pdl2:     6.860s   8 cores
    mce_pdl2:     3.850s  16 cores
    mce_pdl2:     2.347s  32 cores

Output
    Collatz(8400511) has sequence length of 686 steps
    Collatz(8865705) has sequence length of 668 steps
    Collatz(6649279) has sequence length of 665 steps
    Collatz(9973919) has sequence length of 663 steps
    Collatz(6674175) has sequence length of 621 steps
    Collatz(7332399) has sequence length of 616 steps
    Collatz(7532665) has sequence length of 616 steps
    Collatz(5649499) has sequence length of 613 steps
    Collatz(8474249) has sequence length of 611 steps
    Collatz(6355687) has sequence length of 608 steps
    Collatz(8847225) has sequence length of 606 steps
    Collatz(9533531) has sequence length of 606 steps
    Collatz(6635419) has sequence length of 603 steps
    Collatz(9953129) has sequence length of 601 steps
    Collatz(7464846) has sequence length of 598 steps
    Collatz(7464847) has sequence length of 598 steps
    Collatz(3732423) has sequence length of 597 steps
    Collatz(5598635) has sequence length of 595 steps
    Collatz(8397953) has sequence length of 593 steps
    Collatz(6298465) has sequence length of 590 steps
[download]

1e8 testing
     vr_pdl3:  9m44.667s  cache
     vr_pdl2: 16m12.467s  non-cache

chunk_size => 300000
    mce_pdl2:  9m06.078s   1 core
    mce_pdl2:  4m43.529s   2 cores
    mce_pdl2:  2m33.136s   4 cores
    mce_pdl2:  1m21.434s   8 cores
    mce_pdl2:    45.266s  16 cores
    mce_pdl2:    36.925s  32 cores

chunk_size => 100000
    mce_pdl2:  9m09.950s   1 core
    mce_pdl2:  4m39.677s   2 cores
    mce_pdl2:  2m24.230s   4 cores
    mce_pdl2:  1m13.353s   8 cores
    mce_pdl2:    37.923s  16 cores
    mce_pdl2:    20.099s  32 cores

Output
    Collatz(63728127) has sequence length of 950 steps
    Collatz(95592191) has sequence length of 948 steps
    Collatz(96883183) has sequence length of 811 steps
    Collatz(86010015) has sequence length of 798 steps
    Collatz(98110761) has sequence length of 749 steps
    Collatz(73583070) has sequence length of 746 steps
    Collatz(73583071) has sequence length of 746 steps
    Collatz(36791535) has sequence length of 745 steps
    Collatz(55187303) has sequence length of 743 steps
    Collatz(56924955) has sequence length of 743 steps
    Collatz(82780955) has sequence length of 741 steps
    Collatz(85387433) has sequence length of 741 steps
    Collatz(63101607) has sequence length of 738 steps
    Collatz(64040575) has sequence length of 738 steps
    Collatz(93128574) has sequence length of 736 steps
    Collatz(93128575) has sequence length of 736 steps
    Collatz(94652411) has sequence length of 736 steps
    Collatz(96060863) has sequence length of 736 steps
    Collatz(46564287) has sequence length of 735 steps
    Collatz(69846431) has sequence length of 733 steps
[download]

Specifying 50000 for chunk size may run faster on 4/6/8 core machines.

8 cores
    mce_pdl2 1e8  16.660s  chunk_size => 300000
    mce_pdl2 1e8  10.471s  chunk_size => 100000
    mce_pdl2 1e8   9.773s  chunk_size =>  50000
[download]

Regards, Mario

In reply to Re^2: Optimizing with Caching vs. Parallelizing (MCE::Map) (PDL fun) by marioroy
in thread Optimizing with Caching vs. Parallelizing (MCE::Map) by 1nickt

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Clear questions and runnable code get the best and fastest answer
	PerlMonks