http://www.perlmonks.org?node_id=1199589

Without good design, good algorithms, and complete understanding of the program's operation, your carefully optimized code will amount to one of mankind's least fruitful creations - a fast slow program.

-- Michael Abrash

In High Performance Game of Life, I chose a very simple design, storing all live cells in a single set. Though pleasing for its simplicity and unboundedness, its drawback is that counting live neighbours becomes a hash lookup, a chronic performance bottleneck. What to do?

Rather than spending more time optimizing my original design -- thus creating a "fast slow program" -- I researched the domain, learning of many different ways to do it. From the many possible approaches, I chose the simplest one I could find that looked interesting and enjoyable, and implemented it in pure Perl.

To try to keep my initial attempt short and understandable, I started with a simplified version based on the the brilliant works of Adam P. Goucher (apg), tiling the universe with 64 x 64 tiles in a conventional way, each tile having eight neighbours. Note that this was chosen for simplicity; more efficient schemes are available, such as the "brick wall" tiling used by Goucher in later versions. For background on the concept of breaking the game of life universe into overlapping tiles, see this description of Life128 and vlife.

My code is loosely based on apgnano (version 2) but advances one tick at a time (rather than two at a time, as apg did) and does not attempt to use universe history. Fair warning though. Despite striving to keep the code simple and short, it's way more complex than my original, Organism.pm swelling from 66 lines of code to 414.

Benchmark Results

I've updated the benchmark results given in my original node. As you can see, even this simplified version, with no attempts made at code optimization, is already an order of magnitude faster than the optimized version of the original.

Version375K cells750K cells1.5 million cells3 million cells
new Organism.pm (see below)1 secs1 secs3 secs5 secs
Organism.pm (Mario improvements)13 secs26 secs52 secs108 secs
Organism.pm (Original)35 secs70 secs141 secs284 secs
Game::Life::Infinite:Board37 secs96 secs273 secs905 secs

As for memory use, the maximum Windows Private Bytes used for the three million cell case by each process was:

Benchmark timings running AppleFritter's Lidka test for 30,000 ticks were:
VersionLidka 30,000 ticks
new Organism.pm (see below)58 secs
Organism.pm (Mario improvements)450 secs
Organism.pm (Original)1635 secs
Game::Life::Infinite:Board640 secs

Improving My Initial Attempt

There is certainly plenty of scope for improving my initial attempt. After all, I have not attempted any optimizations at all, just tried to implement ideas from apg's C++/assembler programs in a pure Perl form in a simple and clear way. While all feedback is welcome, I'm especially eager to see:

As a minimum, any code refactorings should be tested by running tgol.t and tgol3.t from my original node. Note that this new version of Organism.pm is (or should be) 100% interface compatible with my original.

New Organism.pm

Finally, here is my new and improved Organism.pm (update: the latest and best Organism.pm can be found here):

package Organism; use strict; # Note: for this module, perl must be built with 64-bit integers # use Config; # $Config{ivsize} < 8 and die "perl ivsize=$Config{ivsize} is too smal +l"; # ---------------------------------------------------------------- # The Universe is modelled as a set of overlapping tiles. # For background, see http://conwaylife.com/wiki/Life128_and_vlife # We use a simple scheme of 64 x 64 tiles (60 x 60 core) with # conventional tiling (each tile has eight neighbours). # Note that this was chosen for simplicity; more efficient schemes # are available, such as the "brick wall" tiling used by Goucher # in later versions (apgmera, version 3) # # This code is loosely based on apgnano (version 2) but advances # one tick at a time (rather than advancing two at a time) # and does not attempt to use universe history. # This was to keep the implementation short. # # ---------------------------------------------------------------- # SQUARE TILE my $BORDER_WIDTH = 2; my $BORDER_WIDTH_P1 = $BORDER_WIDTH + 1; my $TILE_SIZE_FULL = 64; my $TILE_SIZE_FULL_M1 = $TILE_SIZE_FULL - 1; my $TILE_SIZE_FULL_MB = $TILE_SIZE_FULL - $BORDER_WIDTH; my $TILE_SIZE_CORE = $TILE_SIZE_FULL - 2 * $BORDER_WIDTH; my $TILE_SIZE_CORE_P1 = $TILE_SIZE_CORE + 1; my $MIDDLE60 = 0x3ffffffffffffffc; my $LEFT62 = 0xfffffffffffffffc; my $RIGHT62 = 0x3fffffffffffffff; my $OUTER4 = 0xc000000000000003; my $LEFTMIDDLE = 0x3000000000000000; my $RIGHTMIDDLE = 0x000000000000000c; # Neighbours are numbered clockwise starting with the one directly abo +ve my $NUM_NEIGH = 8; my $NEIGH_TOP = 0; my $NEIGH_TOP_RIGHT = 1; my $NEIGH_RIGHT = 2; my $NEIGH_BOTTOM_RIGHT = 3; my $NEIGH_BOTTOM = 4; my $NEIGH_BOTTOM_LEFT = 5; my $NEIGH_LEFT = 6; my $NEIGH_TOP_LEFT = 7; # Note that the i ^ 4 trick sets i to the opposite one, that is: # 0 > 4 (TOP > BOTTOM) # 1 > 5 (TOP RIGHT > BOTTOM LEFT) # 2 > 6 (RIGHT > LEFT) # 3 > 7 (BOTTOM RIGHT > TOP LEFT) # 4 > 0 (BOTTOM > TOP) # 5 > 1 (BOTTOM LEFT > TOP RIGHT) # 6 > 2 (LEFT > RIGHT) # 7 > 3 (TOP LEFT > BOTTOM RIGHT) # The functions starting with st64_ manipulate # a simple 64 x 64 square tile bitmap. # Note that x and y must be in 0..63 range. # $row is a ref to an array of 64 unsigned 64-bit ints. # The value in row[] bitmap is 0 (dead) or 1 (alive). sub st64_getcellval { my ($row, $x, $y) = @_; my $mk = 1 << (63 - $x); return $row->[$y] & $mk ? 1 : 0; } sub st64_setcellval { my ($row, $x, $y, $v) = @_; my $mk = 1 << (63 - $x); if ($v) { $row->[$y] |= $mk; } else { $row->[$y] &= ~$mk; } } sub st64_insertcells { my $row = shift; for my $r (@_) { st64_setcellval($row, $r->[0], $r->[1], 1) } } # Used for verification and unit testing of st64_tiletick sub st64_getlivecells { my $row = shift; my @cells; for my $y (0 .. 63) { next unless $row->[$y]; for my $x (0 .. 63) { st64_getcellval($row, $x, $y) and push @cells, [ $x, $y ]; } } sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells; } # Advance the interior of square tile by one tick. # Return a two element list: # [0] : 1 if square tile changed, else 0. # [1] : neighbour flags (see NEIGH flags above) # indicates which neighbours need to be updated sub st64_tiletick { my $row = shift; my $neigh = 0; my $bigdiff = 0; my @carry = (0) x 64; my @parity = (0) x 64; my @diff = (0) x 64; my ( $aa, $bb, $p, $q, $r, $s, $bit0, $bit1, $bit2 ); my $top = 0; my $bottom = $TILE_SIZE_FULL_M1; while ($top < $TILE_SIZE_FULL_M1 && $row->[$top] == 0) { ++$top } while ($bottom > 0 && $row->[$bottom] == 0) { --$bottom } if ($top > $bottom) { return ( 0, $neigh ) } for my $i ($top .. $bottom) { $aa = $row->[$i] >> 1; $bb = $row->[$i] << 1; $q = $aa ^ $bb; $parity[$i] = $q ^ $row->[$i]; $carry[$i] = ($q & $row->[$i]) | ($aa & $bb); } --$top; ++$bottom; if ($top < 1) { $top = 1 } if ($bottom > $TILE_SIZE_FULL_MB) { $bottom = $TILE_SIZE_FULL_MB } for my $i ($top .. $bottom) { $aa = $parity[$i-1]; $bb = $parity[$i+1]; $q = $aa ^ $bb; $bit0 = $q ^ $parity[$i]; $r = ($q & $parity[$i]) | ($aa & $bb); $aa = $carry[$i-1]; $bb = $carry[$i+1]; $q = $aa ^ $bb; $p = $q ^ $carry[$i]; $s = ($q & $carry[$i]) | ($aa & $bb); $bit1 = $p ^ $r; $bit2 = $s ^ ($p & $r); $p = ($bit0 & $bit1 & ~$bit2) | ($bit2 & ~$bit1 & ~$bit0 & $row- +>[$i]); $diff[$i] = ($row->[$i] ^ $p) & $MIDDLE60; $bigdiff |= $diff[$i]; $row->[$i] = ($p & $MIDDLE60) | ($row->[$i] & ~$MIDDLE60); } $aa = $diff[$BORDER_WIDTH] | $diff[$BORDER_WIDTH_P1]; $bb = $diff[$TILE_SIZE_CORE] | $diff[$TILE_SIZE_CORE_P1]; if ($bigdiff) { if ($bigdiff & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_LEFT } if ($bigdiff & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_RIGHT } } if ($aa) { $neigh |= 1 << $NEIGH_TOP; if ($aa & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_TOP_LEFT } if ($aa & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_TOP_RIGHT } } if ($bb) { $neigh |= 1 << $NEIGH_BOTTOM; if ($bb & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_BOTTOM_LEFT } if ($bb & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_BOTTOM_RIGHT } } my $changed = ($bigdiff != 0) ? 1 : 0; return ( $changed, $neigh ); } # Population count (https://en.wikipedia.org/wiki/Hamming_weight) # See also GCC built-in: __builtin_popcount sub popcount { my $x = shift; my $count; for ($count = 0; $x; ++$count) { $x &= $x - 1 } return $count; } # ---------------------------------------------------------------- # ORGANISM sub count { my $self = shift; my $tiles = $self->{Tiles}; my $cnt = 0; for my $k (keys %{$tiles}) { my $row = $tiles->{$k}->{Row}; for my $y ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) { next unless $row->[$y]; $cnt += popcount($row->[$y] & $MIDDLE60); } } return $cnt; } # Input a list of [ x, y ] coords sub insert_cells { my $self = shift; for my $r (@_) { $self->setcell($r->[0], $r->[1], 1) } } # Used for verification and testing the state of the organism sub get_live_cells { my $self = shift; my $tiles = $self->{Tiles}; my @cells; for my $k (keys %{$tiles}) { my $sqt = $tiles->{$k}; for my $y ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) { next unless $sqt->{Row}->[$y]; for my $x ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) { if (st64_getcellval($sqt->{Row}, $x, $y)) { push @cells, [$TILE_SIZE_CORE * $sqt->{Tx} + $x - $BORDER_WIDTH, $TILE_SIZE_CORE * $sqt->{Ty} + $y - $BORDER_WIDTH]; } } } } sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells; } sub get_neighbour { my $self = shift; my $sqt = shift; my $i = shift; unless ($sqt->{Neighbours}->[$i]) { my $x = $sqt->{Tx}; my $y = $sqt->{Ty}; if ($i >= $NEIGH_TOP_RIGHT && $i <= $NEIGH_BOTTOM_RIGHT) { ++ +$x } if ($i >= $NEIGH_BOTTOM_RIGHT && $i <= $NEIGH_BOTTOM_LEFT) { ++ +$y } if ($i >= $NEIGH_BOTTOM_LEFT && $i <= $NEIGH_TOP_LEFT) { -- +$x } if ($i == $NEIGH_TOP_LEFT || $i <= $NEIGH_TOP_RIGHT) { -- +$y } my $tiles = $self->{Tiles}; my $k = pack 'i2', $x, $y; unless (exists $tiles->{$k}) { $tiles->{$k} = { Row => [ (0) x 64 ], Tx => $x, Ty => $y, Updateflags => 0, Neighbours => [], }; } $sqt->{Neighbours}->[$i] = $tiles->{$k}; $sqt->{Neighbours}->[$i]->{Tx} = $x; $sqt->{Neighbours}->[$i]->{Ty} = $y; } return $sqt->{Neighbours}->[$i]; } # Alert the neighbour that its neighbour (the original tile) has chang +ed sub update_neighbour { my $self = shift; my $sqt = shift; my $i = shift; if ($self->get_neighbour($sqt, $i)->{Updateflags} == 0) { push @{$self->{Modified}}, $self->get_neighbour($sqt, $i); } $self->get_neighbour($sqt, $i)->{Updateflags} |= 1 << ($i ^ 4); } # Update the relevant portions of the boundary (a 64-by-64 square # with the central 60-by-60 square removed) by copying data from # the interiors (the 60-by-60 central squares) of the neighbours. # Only perform this copying when necessary. sub update_boundary { my $self = shift; my $sqt = shift; my $temp_modified = $self->{TempModified}; if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP) ) { my $n = $self->get_neighbour($sqt, $NEIGH_TOP); $sqt->{Row}->[0] = ($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) | +($sqt->{Row}->[0] & $OUTER4); $sqt->{Row}->[1] = ($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60) + | ($sqt->{Row}->[1] & $OUTER4); } if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP_LEFT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_TOP_LEFT); $sqt->{Row}->[0] = (($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) < +< $TILE_SIZE_CORE) | ($sqt->{Row}->[0] & $RIGHT62); $sqt->{Row}->[1] = (($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60 +) << $TILE_SIZE_CORE) | ($sqt->{Row}->[1] & $RIGHT62); } if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP_RIGHT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_TOP_RIGHT); $sqt->{Row}->[0] = (($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) > +> $TILE_SIZE_CORE) | ($sqt->{Row}->[0] & $LEFT62); $sqt->{Row}->[1] = (($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60 +) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[1] & $LEFT62); } if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM) ) { my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM); $sqt->{Row}->[$TILE_SIZE_FULL_MB] = ($n->{Row}->[$BORDER_WIDTH] +& $MIDDLE60) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB] & $OUTER4); $sqt->{Row}->[$TILE_SIZE_FULL_M1] = ($n->{Row}->[3] & $MIDDLE60) + | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $OUTER4); } if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM_LEFT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM_LEFT); $sqt->{Row}->[$TILE_SIZE_FULL_MB] = (($n->{Row}->[$BORDER_WIDTH] + & $MIDDLE60) << $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB +] & $RIGHT62); $sqt->{Row}->[$TILE_SIZE_FULL_M1] = (($n->{Row}->[3] & $MIDDLE60 +) << $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $RIGHT62 +); } if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM_RIGHT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM_RIGHT); $sqt->{Row}->[$TILE_SIZE_FULL_MB] = (($n->{Row}->[$BORDER_WIDTH] + & $MIDDLE60) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB +] & $LEFT62); $sqt->{Row}->[$TILE_SIZE_FULL_M1] = (($n->{Row}->[3] & $MIDDLE60 +) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $LEFT62) +; } if ( $sqt->{Updateflags} & (1 << $NEIGH_LEFT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_LEFT); for my $i ($BORDER_WIDTH .. $TILE_SIZE_FULL_MB - 1) { $sqt->{Row}->[$i] = (($n->{Row}->[$i] & $MIDDLE60) << $TILE_S +IZE_CORE) | ($sqt->{Row}->[$i] & $RIGHT62); } } if ( $sqt->{Updateflags} & (1 << $NEIGH_RIGHT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_RIGHT); for my $i ($BORDER_WIDTH .. $TILE_SIZE_FULL_MB - 1) { $sqt->{Row}->[$i] = (($n->{Row}->[$i] & $MIDDLE60) >> $TILE_S +IZE_CORE) | ($sqt->{Row}->[$i] & $LEFT62); } } $sqt->{Updateflags} = 0; push @{$temp_modified}, $sqt; } # Advance the interior of the tile by one generation. sub update_tile { my $self = shift; my $modified = $self->{Modified}; my $sqt = shift; my ($update_flag, $neigh) = st64_tiletick($sqt->{Row}); if ($update_flag) { if ($sqt->{Updateflags} == 0) { push @{$modified}, $sqt } $sqt->{Updateflags} |= 1 << $NUM_NEIGH; } for my $i (0 .. $NUM_NEIGH - 1) { if ($neigh & (1 << $i)) { $self->update_neighbour($sqt, $i) } } } sub tick { my $self = shift; my $modified = $self->{Modified}; my $temp_modified = $self->{TempModified}; while (@{$modified}) { $self->update_boundary(pop @{$modified}); } while (@{$temp_modified}) { $self->update_tile(pop @{$temp_modified}); } } sub updatecell { my $self = shift; my $sqt = shift; my $x = shift; my $y = shift; if ($sqt->{Updateflags} == 0) { push @{$self->{Modified}}, $sqt } $sqt->{Updateflags} |= 1 << $NUM_NEIGH; if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEIGH_ +TOP) } if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIGH_B +OTTOM) } if ($x <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEIGH_LEFT); if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEI +GH_TOP_LEFT) } if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIG +H_BOTTOM_LEFT) } } if ($x >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIGH_RIGHT); if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEI +GH_TOP_RIGHT) } if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIG +H_BOTTOM_RIGHT) } } } sub setcell { my $self = shift; my $x = shift; my $y = shift; my $state = shift; my $tiles = $self->{Tiles}; my $ox = $x % $TILE_SIZE_CORE; my $oy = $y % $TILE_SIZE_CORE; if ($ox < 0) { $ox += $TILE_SIZE_CORE } if ($oy < 0) { $oy += $TILE_SIZE_CORE } my $tx = ($x - $ox) / $TILE_SIZE_CORE; my $ty = ($y - $oy) / $TILE_SIZE_CORE; my $k = pack 'i2', $tx, $ty; unless (exists $tiles->{$k}) { $tiles->{$k} = { Row => [ (0) x 64 ], Tx => $tx, Ty => $ty, Updateflags => 0, Neighbours => [], }; } my $xx = $ox + $BORDER_WIDTH; my $yy = $oy + $BORDER_WIDTH; st64_setcellval($tiles->{$k}->{Row}, $xx, $yy, $state); $self->updatecell($tiles->{$k}, $xx, $yy); } sub getcellval { my $self = shift; my $x = shift; my $y = shift; my $tiles = $self->{Tiles}; my $ox = $x % $TILE_SIZE_CORE; my $oy = $y % $TILE_SIZE_CORE; if ($ox < 0) { $ox += $TILE_SIZE_CORE } if ($oy < 0) { $oy += $TILE_SIZE_CORE } my $tx = ($x - $ox) / $TILE_SIZE_CORE; my $ty = ($y - $oy) / $TILE_SIZE_CORE; my $k = pack 'i2', $tx, $ty; exists $tiles->{$k} or return 0; return st64_getcellval( $tiles->{$k}->{Row}, $ox + $BORDER_WIDTH, $oy + $BORDER_WIDTH ); } sub new { my $class = shift; my %init_self = ( Tiles => {}, Modified => [], TempModified => [] ) +; bless \%init_self, $class; } 1;

Note that this new implementation passes all the same tests (tgol.t, tgol2.t, tgol3.t) described in High Performance Game of Life.

References

Updated: Added more references.