http://www.perlmonks.org?node_id=1197347


in reply to Re^9: High Performance Game of Life
in thread High Performance Game of Life

Hi eyepopslikeamosquito. Yes, I've been running tgol2.t found here and it's been running fine. I can comfirm that bit-manipulation is failing with the new tgot2.t. The initial test script did not test for negative $y. Thus, assumed that $y was always positive. The bit-manipulation code will no longer work.

Okay, will comment readers to your post and strike out the bit-manipulation sections. Thank you for posting Extra Test Program tgol.t.

Update: For closure, I tested mapping supporting negative $x and $y. Pack('i2') is faster unless running cperl.

use strict; use warnings; use Time::HiRes qw(time); my ( $x , $y , $iters ) = ( -890394, 100, 5_000_000 ); my ( $xx, $yy, $n ); ## # sub _pack { # my ( $x, $y ) = @_; # return # $x < 0 ? -( abs($x) << 16 | $y ) : $x << 16 | $y; # } # # sub _unpack { # my ( $n ) = @_; # return $n < 0 # ? ( -( abs($n) >> 16 ), abs($n) & 0xFFFF ) # : ( $n >> 16 , $n & 0xFFFF ); # } ## bench( "bitops ", sub { # map two integers $x and $y into $n # support negative $x only for ( 1 .. $iters ) { $n = $x < 0 ? -( abs($x) << 16 | $y ) : $x << 16 | $y; ( $xx, $yy ) = $n < 0 ? ( -( abs($n) >> 16 ), abs($n) & 0xFFFF ) : ( $n >> 16 , $n & 0xFFFF ); } }); ## # sub _pack { # my ( $x, $y ) = @_; # # bits 0,1 indicate neg flag for $x,$y respectively # return # ( abs($x) << 18 ) + ( $x < 0 ? 1 : 0 ) + # ( abs($y) << 2 ) + ( $y < 0 ? 2 : 0 ); # } # # sub _unpack { # my ( $n ) = @_; # # bits 0,1 indicate neg flag for $x,$y respectively # return ( # $n & 0x1 ? -($n >> 18 ) : $n >> 18, # $n & 0x2 ? -($n >> 2 & 0xFFFF) : $n >> 2 & 0xFFFF # ); # } ## bench( "bitops neg ", sub { # map two integers $x and $y into $n # support negative $x and $y for ( 1 .. $iters ) { $n = ( abs($x) << 18 ) + ( $x < 0 ? 1 : 0 ) + ( abs($y) << 2 ) + ( $y < 0 ? 2 : 0 ); ( $xx, $yy ) = ( $n & 0x1 ? -($n >> 18 ) : $n >> 18, $n & 0x2 ? -($n >> 2 & 0xFFFF) : $n >> 2 & 0xFFFF ); } }); bench( "(un)pack ii", sub { for ( 1 .. $iters ) { $n = pack 'ii', $x, $y; ( $xx, $yy ) = unpack 'ii', $n; } }); bench( "(un)pack i2", sub { for ( 1 .. $iters ) { $n = pack 'i2', $x, $y; ( $xx, $yy ) = unpack 'i2', $n; } }); exit; sub bench { my ( $start, $desc, $fcn ) = ( scalar time, @_ ); $fcn->(); printf "duration $desc %0.03f\n", time - $start; }

Regards, Mario

Replies are listed 'Best First'.
Re^11: High Performance Game of Life (updated)
by marioroy (Prior) on Aug 14, 2017 at 14:39 UTC

    Update: Replaced bit-OR with addition to have $x and $y line up for better readability.

    This post is a fun study, comparing pack 'i2' against the mapping of two integers into one via bit manipulation. A use case for doing this is wanting readable keys for storing into a database using one field. This laptop runs an i7 Haswell at 2.6 GHz. Unfortunately, I do not have anything slower to run on.

    bin/perl v5.26.0

    $ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 # pack i2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 37 secs time taken: 58 secs <- 32-bit Windows VM $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 # mapping cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 39 secs time taken: 64 secs <- 32-bit Windows VM, applied 32-bit tip below

    bin/cperl v5.24.3c

    $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 # pack i2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 37 secs $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 # mapping cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 38 secs

    I used tybalt89's update and applied the mapping logic. All tests pass, thanks to new test script by eyepopslikeamosquito. Please ensure Perl is compiled with 64-bit support. 16 bits hold the value for $y and 2 bits for whether $x,$y are less than 0. $x is stored in the remaining bits. This results in minimum key lenght as $y isn't big. Though, adjust accordingly the number of bits to shift and bitmask if necessary.

    bits 63-18 contains the $x value bits 17-2 contains the $y value bit 1 set when $y < 0 bit 0 set when $x < 0

    On 32-bit hardware, replace 18 and 0xFFFF with 10 and 0xFF throughout the module.

    bits 31-10 contains the $x value bits 9-2 contains the $y value bit 1 set when $y < 0 bit 0 set when $x < 0

    Both _unpack and _pack are inlined inside tick for maximum performance.

    package Organism; use strict; # use warnings; sub _pack { my ( $x, $y ) = @_; # bits 0,1 negative flag for $x,$y respectively return ( abs($x) << 18 ) + ( $x < 0 ? 1 : 0 ) + ( abs($y) << 2 ) + ( $y < 0 ? 2 : 0 ); } sub _unpack { my ( $n ) = @_; # bits 0,1 negative flag for $x,$y respectively return ( $n & 0x1 ? -($n >> 18 ) : $n >> 18, $n & 0x2 ? -($n >> 2 & 0xFFFF) : $n >> 2 & 0xFFFF ); } sub count { return scalar keys %{ shift->{Cells} }; } # Input a list of [ x, y ] coords sub insert_cells { my $cells = shift->{Cells}; for my $r (@_) { $cells->{ _pack @{$r} } = undef } } # Return sorted list of cells in the Organism. # Used for verification and testing the state of the organism. sub get_live_cells { sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } map { [ _unpack $_ ] } keys %{ shift->{Cells} }; } sub tick { my $self = shift; my $cells = $self->{Cells}; my ( $k1, $k2, $k3, $k4, $k5, $k6, $k7, $k8, $x0, $x1, $x2, $y0, $y1, $y2, %new_cells, %dead_cells ); for my $c (keys %{ $cells }) { # Get the (up to 8) dead cells surrounding the cell ( $x0, $y0 ) = ( $c & 0x1 ? -($c >> 18 ) : $c >> 18, $c & 0x2 ? -($c >> 2 & 0xFFFF) : $c >> 2 & 0xFFFF ); ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 ); $dead_cells{$_}++ for my @zcells = ( ( $k1 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k1}), ( $k2 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y0) << 2 ) + ( $y0 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k2}), ( $k3 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k3}), ( $k4 = ( abs($x0) << 18 ) + ( $x0 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k4}), ( $k5 = ( abs($x0) << 18 ) + ( $x0 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k5}), ( $k6 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k6}), ( $k7 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y0) << 2 ) + ( $y0 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k7}), ( $k8 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k8}) ); # Check the live cell # Note: next line equivalent to nlive == 2 || nlive == 3 @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef; } $dead_cells{$_} == 3 and $new_cells{$_} = undef for keys %dead_cell +s; $self->{Cells} = \%new_cells; } sub new { my $class = shift; my %init_self = ( Cells => {} ); bless \%init_self, $class; } 1;

    Pack is faster, of course. However, mapping two integers into one is not far behind.

    Regards, Mario

      I don't have access to a 64-bit perl. However, this is a 32-bit version of combining two 16 bit integers into a 32 bit number. While this runs as long as the coordinates are within range, I can't test it in the 64-bit version.

      I like it because all offsets are found with a simple addition inside tick.
      All the encode/decode mess is in the input and output routines.

      In theory (completely untested), all that's needed is to change the line

      my $half = 16; # make 32 for 64-bit perls

      to

      my $half = 32; # make 32 for 64-bit perls

      to make it use the full range of two 32 bit numbers.

      So here's the code

      package Organism; use strict; use warnings; sub count { return scalar keys %{ shift->{Cells} }; } # Input a list of [ x, y ] coords sub insert_cells { my $self = shift; my $cells = $self->{Cells}; for my $r (@_) { $cells->{ (($r->[0] + $self->{fudge}) << $self->{half}) | ($r->[1] + $self->{fudge}) } = undef } } # Return sorted list of cells in the Organism. # Used for verification and testing the state of the organism. sub get_live_cells { my $self = shift; sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } map { [ ($_ >> $self->{half}) - $self->{fudge}, ($_ & (1 << $self->{half}) - 1) - $self->{fudge} ] } keys %{ $self->{Cells} }; } sub tick { my $self = shift; my $cells = $self->{Cells}; my @deltas = @{ $self->{deltas} }; my ( %new_cells, %dead_cells ); for my $c (keys %{ $cells }) { # Get the (up to 8) dead cells surrounding the cell $dead_cells{$_}++ for my @zcells = grep !exists $cells->{$_}, map $c + $_, @deltas; # Check the live cell # Note: next line equivalent to nlive == 2 || nlive == 3 @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef; } $dead_cells{$_} == 3 and $new_cells{$_} = undef for keys %dead_cell +s; $self->{Cells} = \%new_cells; } sub new { my $class = shift; my $half = 16; # make 32 for 64-bit perls my $base = 1 << $half; my $fudge = $base >> 1; my @deltas = ($base-1, $base, $base+1, -1, 1, -$base-1, -$base, -$base+1); my %init_self = ( Cells => {}, fudge => $fudge, half => $half, deltas => \@deltas ); bless \%init_self, $class; } 1;

      Preliminary testing with 16 bit numbers seemed to show it's about 10% slower than the pack version :(

        Hi tybalt89. The following was captured from a 64-bit laptop with $half = 32. Unfortunately, cperl is failing for some reason. I also tried $half = 16, same thing. The cell count at end isn't correct. To be sure, I pulled down the latest maint release and tried again.

        $ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp

        bin/perl

        $ /opt/perl-5.24.2/bin/perl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 42 secs $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 42 secs

        bin/cperl

        $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 675003 <-- incorrect time taken: 34 secs $ /opt/cperl-5.26.1c/bin/cperl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 675003 <-- incorrect time taken: 34 secs

        Regarding cperl, I built it using the following configure options. The source for cperl-maint-5.24c and cperl-maint-5.26c can be found on Github.

        ./Configure -Dprefix=/opt/cperl-5.24.3c -sder -Dusethreads -Dusecperl +-Accflags=-msse4.2 ./Configure -Dprefix=/opt/cperl-5.26.1c -sder -Dusethreads -Dusecperl +-Accflags=-msse4.2

        Regards, Mario