Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re^11: High Performance Game of Life (updated)

by marioroy (Prior)
on Aug 14, 2017 at 14:39 UTC ( [id://1197363]=note: print w/replies, xml ) Need Help??


in reply to Re^10: High Performance Game of Life
in thread High Performance Game of Life

Update: Replaced bit-OR with addition to have $x and $y line up for better readability.

This post is a fun study, comparing pack 'i2' against the mapping of two integers into one via bit manipulation. A use case for doing this is wanting readable keys for storing into a database using one field. This laptop runs an i7 Haswell at 2.6 GHz. Unfortunately, I do not have anything slower to run on.

bin/perl v5.26.0

$ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 # pack i2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 37 secs time taken: 58 secs <- 32-bit Windows VM $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 # mapping cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 39 secs time taken: 64 secs <- 32-bit Windows VM, applied 32-bit tip below

bin/cperl v5.24.3c

$ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 # pack i2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 37 secs $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 # mapping cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 38 secs

I used tybalt89's update and applied the mapping logic. All tests pass, thanks to new test script by eyepopslikeamosquito. Please ensure Perl is compiled with 64-bit support. 16 bits hold the value for $y and 2 bits for whether $x,$y are less than 0. $x is stored in the remaining bits. This results in minimum key lenght as $y isn't big. Though, adjust accordingly the number of bits to shift and bitmask if necessary.

bits 63-18 contains the $x value bits 17-2 contains the $y value bit 1 set when $y < 0 bit 0 set when $x < 0

On 32-bit hardware, replace 18 and 0xFFFF with 10 and 0xFF throughout the module.

bits 31-10 contains the $x value bits 9-2 contains the $y value bit 1 set when $y < 0 bit 0 set when $x < 0

Both _unpack and _pack are inlined inside tick for maximum performance.

package Organism; use strict; # use warnings; sub _pack { my ( $x, $y ) = @_; # bits 0,1 negative flag for $x,$y respectively return ( abs($x) << 18 ) + ( $x < 0 ? 1 : 0 ) + ( abs($y) << 2 ) + ( $y < 0 ? 2 : 0 ); } sub _unpack { my ( $n ) = @_; # bits 0,1 negative flag for $x,$y respectively return ( $n & 0x1 ? -($n >> 18 ) : $n >> 18, $n & 0x2 ? -($n >> 2 & 0xFFFF) : $n >> 2 & 0xFFFF ); } sub count { return scalar keys %{ shift->{Cells} }; } # Input a list of [ x, y ] coords sub insert_cells { my $cells = shift->{Cells}; for my $r (@_) { $cells->{ _pack @{$r} } = undef } } # Return sorted list of cells in the Organism. # Used for verification and testing the state of the organism. sub get_live_cells { sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } map { [ _unpack $_ ] } keys %{ shift->{Cells} }; } sub tick { my $self = shift; my $cells = $self->{Cells}; my ( $k1, $k2, $k3, $k4, $k5, $k6, $k7, $k8, $x0, $x1, $x2, $y0, $y1, $y2, %new_cells, %dead_cells ); for my $c (keys %{ $cells }) { # Get the (up to 8) dead cells surrounding the cell ( $x0, $y0 ) = ( $c & 0x1 ? -($c >> 18 ) : $c >> 18, $c & 0x2 ? -($c >> 2 & 0xFFFF) : $c >> 2 & 0xFFFF ); ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 ); $dead_cells{$_}++ for my @zcells = ( ( $k1 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k1}), ( $k2 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y0) << 2 ) + ( $y0 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k2}), ( $k3 = ( abs($x1) << 18 ) + ( $x1 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k3}), ( $k4 = ( abs($x0) << 18 ) + ( $x0 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k4}), ( $k5 = ( abs($x0) << 18 ) + ( $x0 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k5}), ( $k6 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y1) << 2 ) + ( $y1 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k6}), ( $k7 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y0) << 2 ) + ( $y0 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k7}), ( $k8 = ( abs($x2) << 18 ) + ( $x2 < 0 ? 1 : 0 ) + ( abs($y2) << 2 ) + ( $y2 < 0 ? 2 : 0 ) ) x !(exists $cells->{$k8}) ); # Check the live cell # Note: next line equivalent to nlive == 2 || nlive == 3 @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef; } $dead_cells{$_} == 3 and $new_cells{$_} = undef for keys %dead_cell +s; $self->{Cells} = \%new_cells; } sub new { my $class = shift; my %init_self = ( Cells => {} ); bless \%init_self, $class; } 1;

Pack is faster, of course. However, mapping two integers into one is not far behind.

Regards, Mario

Replies are listed 'Best First'.
Re^12: High Performance Game of Life (updated)
by tybalt89 (Monsignor) on Aug 15, 2017 at 00:13 UTC

    I don't have access to a 64-bit perl. However, this is a 32-bit version of combining two 16 bit integers into a 32 bit number. While this runs as long as the coordinates are within range, I can't test it in the 64-bit version.

    I like it because all offsets are found with a simple addition inside tick.
    All the encode/decode mess is in the input and output routines.

    In theory (completely untested), all that's needed is to change the line

    my $half = 16; # make 32 for 64-bit perls

    to

    my $half = 32; # make 32 for 64-bit perls

    to make it use the full range of two 32 bit numbers.

    So here's the code

    package Organism; use strict; use warnings; sub count { return scalar keys %{ shift->{Cells} }; } # Input a list of [ x, y ] coords sub insert_cells { my $self = shift; my $cells = $self->{Cells}; for my $r (@_) { $cells->{ (($r->[0] + $self->{fudge}) << $self->{half}) | ($r->[1] + $self->{fudge}) } = undef } } # Return sorted list of cells in the Organism. # Used for verification and testing the state of the organism. sub get_live_cells { my $self = shift; sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } map { [ ($_ >> $self->{half}) - $self->{fudge}, ($_ & (1 << $self->{half}) - 1) - $self->{fudge} ] } keys %{ $self->{Cells} }; } sub tick { my $self = shift; my $cells = $self->{Cells}; my @deltas = @{ $self->{deltas} }; my ( %new_cells, %dead_cells ); for my $c (keys %{ $cells }) { # Get the (up to 8) dead cells surrounding the cell $dead_cells{$_}++ for my @zcells = grep !exists $cells->{$_}, map $c + $_, @deltas; # Check the live cell # Note: next line equivalent to nlive == 2 || nlive == 3 @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef; } $dead_cells{$_} == 3 and $new_cells{$_} = undef for keys %dead_cell +s; $self->{Cells} = \%new_cells; } sub new { my $class = shift; my $half = 16; # make 32 for 64-bit perls my $base = 1 << $half; my $fudge = $base >> 1; my @deltas = ($base-1, $base, $base+1, -1, 1, -$base-1, -$base, -$base+1); my %init_self = ( Cells => {}, fudge => $fudge, half => $half, deltas => \@deltas ); bless \%init_self, $class; } 1;

    Preliminary testing with 16 bit numbers seemed to show it's about 10% slower than the pack version :(

      Hi tybalt89. The following was captured from a 64-bit laptop with $half = 32. Unfortunately, cperl is failing for some reason. I also tried $half = 16, same thing. The cell count at end isn't correct. To be sure, I pulled down the latest maint release and tried again.

      $ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp

      bin/perl

      $ /opt/perl-5.24.2/bin/perl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 42 secs $ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 1500000 time taken: 42 secs

      bin/cperl

      $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 675003 <-- incorrect time taken: 34 secs $ /opt/cperl-5.26.1c/bin/cperl -I. tbench1.pl x.tmp 2 cell count at start = 1500000 run benchmark for 2 ticks cell count at end = 675003 <-- incorrect time taken: 34 secs

      Regarding cperl, I built it using the following configure options. The source for cperl-maint-5.24c and cperl-maint-5.26c can be found on Github.

      ./Configure -Dprefix=/opt/cperl-5.24.3c -sder -Dusethreads -Dusecperl +-Accflags=-msse4.2 ./Configure -Dprefix=/opt/cperl-5.26.1c -sder -Dusethreads -Dusecperl +-Accflags=-msse4.2

      Regards, Mario

        When testing with the 32 bit version, the range of coordinates should be restricted to what a 16 bit number can hold, from roughly -32000 to 32000 (to be safe :)

        The 32 bit version does pass the tests, and runs OK on my system with a much smaller createblinker.pl range.

        That -900000 is way out of range.

        As for problems with the 64 bit version, sorry I can't help :(

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1197363]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (2)
As of 2025-07-15 04:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.