http://www.perlmonks.org?node_id=1197363


in reply to Re^10: High Performance Game of Life
in thread High Performance Game of Life

Update: Replaced bit-OR with addition to have $x and $y line up for better readability.

This post is a fun study, comparing pack 'i2' against the mapping of two integers into one via bit manipulation. A use case for doing this is wanting readable keys for storing into a database using one field. This laptop runs an i7 Haswell at 2.6 GHz. Unfortunately, I do not have anything slower to run on.

bin/perl v5.26.0

$ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 # pack i2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 37 secs time taken: 58 secs <- 32-bit Windows VM $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 # mapping cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 39 secs time taken: 64 secs <- 32-bit Windows VM, applied 32-bit tip below

bin/cperl v5.24.3c

$ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 # pack i2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 37 secs $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 # mapping cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 38 secs

I used tybalt89's update and applied the mapping logic. All tests pass, thanks to new test script by eyepopslikeamosquito. Please ensure Perl is compiled with 64-bit support. 16 bits hold the value for $y and 2 bits for whether $x,$y are less than 0. $x is stored in the remaining bits. This results in minimum key lenght as $y isn't big. Though, adjust accordingly the number of bits to shift and bitmask if necessary.

bits 63-18 contains the $x value bits 17-2 contains the $y value bit 1 set when $y < 0 bit 0 set when $x < 0

On 32-bit hardware, replace 18 and 0xFFFF with 10 and 0xFF throughout the module.

bits 31-10 contains the $x value bits 9-2 contains the $y value bit 1 set when $y < 0 bit 0 set when $x < 0

Both _unpack and _pack are inlined inside tick for maximum performance.

package Organism; use strict; # use warnings; sub _pack { my ( $x, $y ) = @_; # bits 0,1 negative flag for $x,$y respectively return ( abs($x) << 18 ) + ( $x < 0 ? 1 : 0 ) + ( abs($y) << 2 ) + ( $y < 0 ? 2 : 0 ); } sub _unpack { my ( $n ) = @_; # bits 0,1 negative flag for $x,$y respectively return ( $n & 0x1 ? -($n >> 18 ) : $n >> 18, $n & 0x2 ? -($n >> 2 & 0xFFFF) : $n >> 2 & 0xFFFF ); } sub count { return scalar keys %{ shift->{Cells} }; } # Input a list of [ x, y ] coords sub insert_cells { my $cells = shift->{Cells}; for my $r (@_) { $cells->{ _pack @{$r} } = undef } } # Return sorted list of cells in the Organism. # Used for verification and testing the state of the organism. sub get_live_cells { sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } map { [ _unpack $_ ] } keys %{ shift->{Cells} }; } sub tick { my $self = shift; my $cells = $self->{Cells}; my ( $k1, $k2, $k3, $k4, $k5, $k6, $k7, $k8, $x0, $x1, $x2, $y0, $y1, $y2, %new_cells, %dead_cells ); for my $c (keys %{ $cells }) { # Get the (up to 8) dead cells surrounding the cell ( $x0, $y0 ) = ( $c & 0x1 ? -($c >> 18 ) : $c >> 18, $c & 0x2 ? -($c >> 2 & 0xFFFF) : $c >> 2 & 0xFFFF ); ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 ); $dead_cells{$_}++ for my @zcells = ( ( $k1 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k1}), ( $k2 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y0) << 2 ) + ( $y0 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k2}), ( $k3 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k3}), ( $k4 = ( abs($x0) << 18 ) + ( $x0 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k4}), ( $k5 = ( abs($x0) << 18 ) + ( $x0 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k5}), ( $k6 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k6}), ( $k7 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y0) << 2 ) + ( $y0 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k7}), ( $k8 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k8}) ); # Check the live cell # Note: next line equivalent to nlive == 2 || nlive == 3 @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef; } $dead_cells{$_} == 3 and $new_cells{$_} = undef for keys %dead_cell +s; $self->{Cells} = \%new_cells; } sub new { my $class = shift; my %init_self = ( Cells => {} ); bless \%init_self, $class; } 1;

Pack is faster, of course. However, mapping two integers into one is not far behind.

Regards, Mario

Replies are listed 'Best First'.
Re^12: High Performance Game of Life (updated)
by tybalt89 (Monsignor) on Aug 15, 2017 at 00:13 UTC

    I don't have access to a 64-bit perl. However, this is a 32-bit version of combining two 16 bit integers into a 32 bit number. While this runs as long as the coordinates are within range, I can't test it in the 64-bit version.

    I like it because all offsets are found with a simple addition inside tick.
    All the encode/decode mess is in the input and output routines.

    In theory (completely untested), all that's needed is to change the line

    my $half = 16; # make 32 for 64-bit perls

    to

    my $half = 32; # make 32 for 64-bit perls

    to make it use the full range of two 32 bit numbers.

    So here's the code

    package Organism; use strict; use warnings; sub count { return scalar keys %{ shift->{Cells} }; } # Input a list of [ x, y ] coords sub insert_cells { my $self = shift; my $cells = $self->{Cells}; for my $r (@_) { $cells->{ (($r->[0] + $self->{fudge}) << $self->{half}) | ($r->[1] + $self->{fudge}) } = undef } } # Return sorted list of cells in the Organism. # Used for verification and testing the state of the organism. sub get_live_cells { my $self = shift; sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } map { [ ($_ >> $self->{half}) - $self->{fudge}, ($_ & (1 << $self->{half}) - 1) - $self->{fudge} ] } keys %{ $self->{Cells} }; } sub tick { my $self = shift; my $cells = $self->{Cells}; my @deltas = @{ $self->{deltas} }; my ( %new_cells, %dead_cells ); for my $c (keys %{ $cells }) { # Get the (up to 8) dead cells surrounding the cell $dead_cells{$_}++ for my @zcells = grep !exists $cells->{$_}, map $c + $_, @deltas; # Check the live cell # Note: next line equivalent to nlive == 2 || nlive == 3 @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef; } $dead_cells{$_} == 3 and $new_cells{$_} = undef for keys %dead_cell +s; $self->{Cells} = \%new_cells; } sub new { my $class = shift; my $half = 16; # make 32 for 64-bit perls my $base = 1 << $half; my $fudge = $base >> 1; my @deltas = ($base-1, $base, $base+1, -1, 1, -$base-1, -$base, -$base+1); my %init_self = ( Cells => {}, fudge => $fudge, half => $half, deltas => \@deltas ); bless \%init_self, $class; } 1;

    Preliminary testing with 16 bit numbers seemed to show it's about 10% slower than the pack version :(

      Hi tybalt89. The following was captured from a 64-bit laptop with $half = 32. Unfortunately, cperl is failing for some reason. I also tried $half = 16, same thing. The cell count at end isn't correct. To be sure, I pulled down the latest maint release and tried again.

      $ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp

      bin/perl

      $ /opt/perl-5.24.2/bin/perl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 42 secs $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 42 secs

      bin/cperl

      $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 675003 <-- incorrect time taken: 34 secs $ /opt/cperl-5.26.1c/bin/cperl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 675003 <-- incorrect time taken: 34 secs

      Regarding cperl, I built it using the following configure options. The source for cperl-maint-5.24c and cperl-maint-5.26c can be found on Github.

      ./Configure -Dprefix=/opt/cperl-5.24.3c -sder -Dusethreads -Dusecperl +-Accflags=-msse4.2 ./Configure -Dprefix=/opt/cperl-5.26.1c -sder -Dusethreads -Dusecperl +-Accflags=-msse4.2

      Regards, Mario

        When testing with the 32 bit version, the range of coordinates should be restricted to what a 16 bit number can hold, from roughly -32000 to 32000 (to be safe :)

        The 32 bit version does pass the tests, and runs OK on my system with a much smaller createblinker.pl range.

        That -900000 is way out of range.

        As for problems with the 64 bit version, sorry I can't help :(