http://www.perlmonks.org?node_id=1199589

Without good design, good algorithms, and complete understanding of the program's operation, your carefully optimized code will amount to one of mankind's least fruitful creations - a fast slow program.

-- Michael Abrash

In High Performance Game of Life, I chose a very simple design, storing all live cells in a single set. Though pleasing for its simplicity and unboundedness, its drawback is that counting live neighbours becomes a hash lookup, a chronic performance bottleneck. What to do?

Rather than spending more time optimizing my original design -- thus creating a "fast slow program" -- I researched the domain, learning of many different ways to do it. From the many possible approaches, I chose the simplest one I could find that looked interesting and enjoyable, and implemented it in pure Perl.

To try to keep my initial attempt short and understandable, I started with a simplified version based on the the brilliant works of Adam P. Goucher (apg), tiling the universe with 64 x 64 tiles in a conventional way, each tile having eight neighbours. Note that this was chosen for simplicity; more efficient schemes are available, such as the "brick wall" tiling used by Goucher in later versions. For background on the concept of breaking the game of life universe into overlapping tiles, see this description of Life128 and vlife.

My code is loosely based on apgnano (version 2) but advances one tick at a time (rather than two at a time, as apg did) and does not attempt to use universe history. Fair warning though. Despite striving to keep the code simple and short, it's way more complex than my original, Organism.pm swelling from 66 lines of code to 414.

Benchmark Results

I've updated the benchmark results given in my original node. As you can see, even this simplified version, with no attempts made at code optimization, is already an order of magnitude faster than the optimized version of the original.

Version375K cells750K cells1.5 million cells3 million cells
new Organism.pm (see below)1 secs1 secs3 secs5 secs
Organism.pm (Mario improvements)13 secs26 secs52 secs108 secs
Organism.pm (Original)35 secs70 secs141 secs284 secs
Game::Life::Infinite:Board37 secs96 secs273 secs905 secs

As for memory use, the maximum Windows Private Bytes used for the three million cell case by each process was:

Benchmark timings running AppleFritter's Lidka test for 30,000 ticks were:
VersionLidka 30,000 ticks
new Organism.pm (see below)58 secs
Organism.pm (Mario improvements)450 secs
Organism.pm (Original)1635 secs
Game::Life::Infinite:Board640 secs

Update: The latest and best benchmark timings can be found in Re^2: More Betterer Game of Life.

Improving My Initial Attempt

There is certainly plenty of scope for improving my initial attempt. After all, I have not attempted any optimizations at all, just tried to implement ideas from apg's C++/assembler programs in a pure Perl form in a simple and clear way. While all feedback is welcome, I'm especially eager to see:

As a minimum, any code refactorings should be tested by running tgol.t and tgol3.t from my original node. Note that this new version of Organism.pm is (or should be) 100% interface compatible with my original.

New Organism.pm

Finally, here is my new and improved Organism.pm (update: the latest and best Organism.pm can be found here):

package Organism; use strict; # Note: for this module, perl must be built with 64-bit integers # use Config; # $Config{ivsize} < 8 and die "perl ivsize=$Config{ivsize} is too smal +l"; # ---------------------------------------------------------------- # The Universe is modelled as a set of overlapping tiles. # For background, see http://conwaylife.com/wiki/Life128_and_vlife # We use a simple scheme of 64 x 64 tiles (60 x 60 core) with # conventional tiling (each tile has eight neighbours). # Note that this was chosen for simplicity; more efficient schemes # are available, such as the "brick wall" tiling used by Goucher # in later versions (apgmera, version 3) # # This code is loosely based on apgnano (version 2) but advances # one tick at a time (rather than advancing two at a time) # and does not attempt to use universe history. # This was to keep the implementation short. # # ---------------------------------------------------------------- # SQUARE TILE my $BORDER_WIDTH = 2; my $BORDER_WIDTH_P1 = $BORDER_WIDTH + 1; my $TILE_SIZE_FULL = 64; my $TILE_SIZE_FULL_M1 = $TILE_SIZE_FULL - 1; my $TILE_SIZE_FULL_MB = $TILE_SIZE_FULL - $BORDER_WIDTH; my $TILE_SIZE_CORE = $TILE_SIZE_FULL - 2 * $BORDER_WIDTH; my $TILE_SIZE_CORE_P1 = $TILE_SIZE_CORE + 1; my $MIDDLE60 = 0x3ffffffffffffffc; my $LEFT62 = 0xfffffffffffffffc; my $RIGHT62 = 0x3fffffffffffffff; my $OUTER4 = 0xc000000000000003; my $LEFTMIDDLE = 0x3000000000000000; my $RIGHTMIDDLE = 0x000000000000000c; # Neighbours are numbered clockwise starting with the one directly abo +ve my $NUM_NEIGH = 8; my $NEIGH_TOP = 0; my $NEIGH_TOP_RIGHT = 1; my $NEIGH_RIGHT = 2; my $NEIGH_BOTTOM_RIGHT = 3; my $NEIGH_BOTTOM = 4; my $NEIGH_BOTTOM_LEFT = 5; my $NEIGH_LEFT = 6; my $NEIGH_TOP_LEFT = 7; # Note that the i ^ 4 trick sets i to the opposite one, that is: # 0 > 4 (TOP > BOTTOM) # 1 > 5 (TOP RIGHT > BOTTOM LEFT) # 2 > 6 (RIGHT > LEFT) # 3 > 7 (BOTTOM RIGHT > TOP LEFT) # 4 > 0 (BOTTOM > TOP) # 5 > 1 (BOTTOM LEFT > TOP RIGHT) # 6 > 2 (LEFT > RIGHT) # 7 > 3 (TOP LEFT > BOTTOM RIGHT) # The functions starting with st64_ manipulate # a simple 64 x 64 square tile bitmap. # Note that x and y must be in 0..63 range. # $row is a ref to an array of 64 unsigned 64-bit ints. # The value in row[] bitmap is 0 (dead) or 1 (alive). sub st64_getcellval { my ($row, $x, $y) = @_; my $mk = 1 << (63 - $x); return $row->[$y] & $mk ? 1 : 0; } sub st64_setcellval { my ($row, $x, $y, $v) = @_; my $mk = 1 << (63 - $x); if ($v) { $row->[$y] |= $mk; } else { $row->[$y] &= ~$mk; } } sub st64_insertcells { my $row = shift; for my $r (@_) { st64_setcellval($row, $r->[0], $r->[1], 1) } } # Used for verification and unit testing of st64_tiletick sub st64_getlivecells { my $row = shift; my @cells; for my $y (0 .. 63) { next unless $row->[$y]; for my $x (0 .. 63) { st64_getcellval($row, $x, $y) and push @cells, [ $x, $y ]; } } sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells; } # Advance the interior of square tile by one tick. # Return a two element list: # [0] : 1 if square tile changed, else 0. # [1] : neighbour flags (see NEIGH flags above) # indicates which neighbours need to be updated sub st64_tiletick { my $row = shift; my $neigh = 0; my $bigdiff = 0; my @carry = (0) x 64; my @parity = (0) x 64; my @diff = (0) x 64; my ( $aa, $bb, $p, $q, $r, $s, $bit0, $bit1, $bit2 ); my $top = 0; my $bottom = $TILE_SIZE_FULL_M1; while ($top < $TILE_SIZE_FULL_M1 && $row->[$top] == 0) { ++$top } while ($bottom > 0 && $row->[$bottom] == 0) { --$bottom } if ($top > $bottom) { return ( 0, $neigh ) } for my $i ($top .. $bottom) { $aa = $row->[$i] >> 1; $bb = $row->[$i] << 1; $q = $aa ^ $bb; $parity[$i] = $q ^ $row->[$i]; $carry[$i] = ($q & $row->[$i]) | ($aa & $bb); } --$top; ++$bottom; if ($top < 1) { $top = 1 } if ($bottom > $TILE_SIZE_FULL_MB) { $bottom = $TILE_SIZE_FULL_MB } for my $i ($top .. $bottom) { $aa = $parity[$i-1]; $bb = $parity[$i+1]; $q = $aa ^ $bb; $bit0 = $q ^ $parity[$i]; $r = ($q & $parity[$i]) | ($aa & $bb); $aa = $carry[$i-1]; $bb = $carry[$i+1]; $q = $aa ^ $bb; $p = $q ^ $carry[$i]; $s = ($q & $carry[$i]) | ($aa & $bb); $bit1 = $p ^ $r; $bit2 = $s ^ ($p & $r); $p = ($bit0 & $bit1 & ~$bit2) | ($bit2 & ~$bit1 & ~$bit0 & $row- +>[$i]); $diff[$i] = ($row->[$i] ^ $p) & $MIDDLE60; $bigdiff |= $diff[$i]; $row->[$i] = ($p & $MIDDLE60) | ($row->[$i] & ~$MIDDLE60); } $aa = $diff[$BORDER_WIDTH] | $diff[$BORDER_WIDTH_P1]; $bb = $diff[$TILE_SIZE_CORE] | $diff[$TILE_SIZE_CORE_P1]; if ($bigdiff) { if ($bigdiff & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_LEFT } if ($bigdiff & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_RIGHT } } if ($aa) { $neigh |= 1 << $NEIGH_TOP; if ($aa & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_TOP_LEFT } if ($aa & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_TOP_RIGHT } } if ($bb) { $neigh |= 1 << $NEIGH_BOTTOM; if ($bb & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_BOTTOM_LEFT } if ($bb & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_BOTTOM_RIGHT } } my $changed = ($bigdiff != 0) ? 1 : 0; return ( $changed, $neigh ); } # Population count (https://en.wikipedia.org/wiki/Hamming_weight) # See also GCC built-in: __builtin_popcount sub popcount { my $x = shift; my $count; for ($count = 0; $x; ++$count) { $x &= $x - 1 } return $count; } # ---------------------------------------------------------------- # ORGANISM sub count { my $self = shift; my $tiles = $self->{Tiles}; my $cnt = 0; for my $k (keys %{$tiles}) { my $row = $tiles->{$k}->{Row}; for my $y ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) { next unless $row->[$y]; $cnt += popcount($row->[$y] & $MIDDLE60); } } return $cnt; } # Input a list of [ x, y ] coords sub insert_cells { my $self = shift; for my $r (@_) { $self->setcell($r->[0], $r->[1], 1) } } # Used for verification and testing the state of the organism sub get_live_cells { my $self = shift; my $tiles = $self->{Tiles}; my @cells; for my $k (keys %{$tiles}) { my $sqt = $tiles->{$k}; for my $y ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) { next unless $sqt->{Row}->[$y]; for my $x ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) { if (st64_getcellval($sqt->{Row}, $x, $y)) { push @cells, [$TILE_SIZE_CORE * $sqt->{Tx} + $x - $BORDER_WIDTH, $TILE_SIZE_CORE * $sqt->{Ty} + $y - $BORDER_WIDTH]; } } } } sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells; } sub get_neighbour { my $self = shift; my $sqt = shift; my $i = shift; unless ($sqt->{Neighbours}->[$i]) { my $x = $sqt->{Tx}; my $y = $sqt->{Ty}; if ($i >= $NEIGH_TOP_RIGHT && $i <= $NEIGH_BOTTOM_RIGHT) { ++ +$x } if ($i >= $NEIGH_BOTTOM_RIGHT && $i <= $NEIGH_BOTTOM_LEFT) { ++ +$y } if ($i >= $NEIGH_BOTTOM_LEFT && $i <= $NEIGH_TOP_LEFT) { -- +$x } if ($i == $NEIGH_TOP_LEFT || $i <= $NEIGH_TOP_RIGHT) { -- +$y } my $tiles = $self->{Tiles}; my $k = pack 'i2', $x, $y; unless (exists $tiles->{$k}) { $tiles->{$k} = { Row => [ (0) x 64 ], Tx => $x, Ty => $y, Updateflags => 0, Neighbours => [], }; } $sqt->{Neighbours}->[$i] = $tiles->{$k}; $sqt->{Neighbours}->[$i]->{Tx} = $x; $sqt->{Neighbours}->[$i]->{Ty} = $y; } return $sqt->{Neighbours}->[$i]; } # Alert the neighbour that its neighbour (the original tile) has chang +ed sub update_neighbour { my $self = shift; my $sqt = shift; my $i = shift; if ($self->get_neighbour($sqt, $i)->{Updateflags} == 0) { push @{$self->{Modified}}, $self->get_neighbour($sqt, $i); } $self->get_neighbour($sqt, $i)->{Updateflags} |= 1 << ($i ^ 4); } # Update the relevant portions of the boundary (a 64-by-64 square # with the central 60-by-60 square removed) by copying data from # the interiors (the 60-by-60 central squares) of the neighbours. # Only perform this copying when necessary. sub update_boundary { my $self = shift; my $sqt = shift; my $temp_modified = $self->{TempModified}; if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP) ) { my $n = $self->get_neighbour($sqt, $NEIGH_TOP); $sqt->{Row}->[0] = ($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) | +($sqt->{Row}->[0] & $OUTER4); $sqt->{Row}->[1] = ($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60) + | ($sqt->{Row}->[1] & $OUTER4); } if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP_LEFT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_TOP_LEFT); $sqt->{Row}->[0] = (($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) < +< $TILE_SIZE_CORE) | ($sqt->{Row}->[0] & $RIGHT62); $sqt->{Row}->[1] = (($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60 +) << $TILE_SIZE_CORE) | ($sqt->{Row}->[1] & $RIGHT62); } if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP_RIGHT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_TOP_RIGHT); $sqt->{Row}->[0] = (($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) > +> $TILE_SIZE_CORE) | ($sqt->{Row}->[0] & $LEFT62); $sqt->{Row}->[1] = (($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60 +) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[1] & $LEFT62); } if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM) ) { my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM); $sqt->{Row}->[$TILE_SIZE_FULL_MB] = ($n->{Row}->[$BORDER_WIDTH] +& $MIDDLE60) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB] & $OUTER4); $sqt->{Row}->[$TILE_SIZE_FULL_M1] = ($n->{Row}->[3] & $MIDDLE60) + | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $OUTER4); } if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM_LEFT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM_LEFT); $sqt->{Row}->[$TILE_SIZE_FULL_MB] = (($n->{Row}->[$BORDER_WIDTH] + & $MIDDLE60) << $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB +] & $RIGHT62); $sqt->{Row}->[$TILE_SIZE_FULL_M1] = (($n->{Row}->[3] & $MIDDLE60 +) << $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $RIGHT62 +); } if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM_RIGHT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM_RIGHT); $sqt->{Row}->[$TILE_SIZE_FULL_MB] = (($n->{Row}->[$BORDER_WIDTH] + & $MIDDLE60) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB +] & $LEFT62); $sqt->{Row}->[$TILE_SIZE_FULL_M1] = (($n->{Row}->[3] & $MIDDLE60 +) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $LEFT62) +; } if ( $sqt->{Updateflags} & (1 << $NEIGH_LEFT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_LEFT); for my $i ($BORDER_WIDTH .. $TILE_SIZE_FULL_MB - 1) { $sqt->{Row}->[$i] = (($n->{Row}->[$i] & $MIDDLE60) << $TILE_S +IZE_CORE) | ($sqt->{Row}->[$i] & $RIGHT62); } } if ( $sqt->{Updateflags} & (1 << $NEIGH_RIGHT) ) { my $n = $self->get_neighbour($sqt, $NEIGH_RIGHT); for my $i ($BORDER_WIDTH .. $TILE_SIZE_FULL_MB - 1) { $sqt->{Row}->[$i] = (($n->{Row}->[$i] & $MIDDLE60) >> $TILE_S +IZE_CORE) | ($sqt->{Row}->[$i] & $LEFT62); } } $sqt->{Updateflags} = 0; push @{$temp_modified}, $sqt; } # Advance the interior of the tile by one generation. sub update_tile { my $self = shift; my $modified = $self->{Modified}; my $sqt = shift; my ($update_flag, $neigh) = st64_tiletick($sqt->{Row}); if ($update_flag) { if ($sqt->{Updateflags} == 0) { push @{$modified}, $sqt } $sqt->{Updateflags} |= 1 << $NUM_NEIGH; } for my $i (0 .. $NUM_NEIGH - 1) { if ($neigh & (1 << $i)) { $self->update_neighbour($sqt, $i) } } } sub tick { my $self = shift; my $modified = $self->{Modified}; my $temp_modified = $self->{TempModified}; while (@{$modified}) { $self->update_boundary(pop @{$modified}); } while (@{$temp_modified}) { $self->update_tile(pop @{$temp_modified}); } } sub updatecell { my $self = shift; my $sqt = shift; my $x = shift; my $y = shift; if ($sqt->{Updateflags} == 0) { push @{$self->{Modified}}, $sqt } $sqt->{Updateflags} |= 1 << $NUM_NEIGH; if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEIGH_ +TOP) } if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIGH_B +OTTOM) } if ($x <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEIGH_LEFT); if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEI +GH_TOP_LEFT) } if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIG +H_BOTTOM_LEFT) } } if ($x >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIGH_RIGHT); if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEI +GH_TOP_RIGHT) } if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIG +H_BOTTOM_RIGHT) } } } sub setcell { my $self = shift; my $x = shift; my $y = shift; my $state = shift; my $tiles = $self->{Tiles}; my $ox = $x % $TILE_SIZE_CORE; my $oy = $y % $TILE_SIZE_CORE; if ($ox < 0) { $ox += $TILE_SIZE_CORE } if ($oy < 0) { $oy += $TILE_SIZE_CORE } my $tx = ($x - $ox) / $TILE_SIZE_CORE; my $ty = ($y - $oy) / $TILE_SIZE_CORE; my $k = pack 'i2', $tx, $ty; unless (exists $tiles->{$k}) { $tiles->{$k} = { Row => [ (0) x 64 ], Tx => $tx, Ty => $ty, Updateflags => 0, Neighbours => [], }; } my $xx = $ox + $BORDER_WIDTH; my $yy = $oy + $BORDER_WIDTH; st64_setcellval($tiles->{$k}->{Row}, $xx, $yy, $state); $self->updatecell($tiles->{$k}, $xx, $yy); } sub getcellval { my $self = shift; my $x = shift; my $y = shift; my $tiles = $self->{Tiles}; my $ox = $x % $TILE_SIZE_CORE; my $oy = $y % $TILE_SIZE_CORE; if ($ox < 0) { $ox += $TILE_SIZE_CORE } if ($oy < 0) { $oy += $TILE_SIZE_CORE } my $tx = ($x - $ox) / $TILE_SIZE_CORE; my $ty = ($y - $oy) / $TILE_SIZE_CORE; my $k = pack 'i2', $tx, $ty; exists $tiles->{$k} or return 0; return st64_getcellval( $tiles->{$k}->{Row}, $ox + $BORDER_WIDTH, $oy + $BORDER_WIDTH ); } sub new { my $class = shift; my %init_self = ( Tiles => {}, Modified => [], TempModified => [] ) +; bless \%init_self, $class; } 1;

Note that this new implementation passes all the same tests (tgol.t, tgol2.t, tgol3.t) described in High Performance Game of Life.

References

Updated: Added more references.

Replies are listed 'Best First'.
Re: More Betterer Game of Life
by AppleFritter (Vicar) on Sep 20, 2017 at 10:22 UTC

    That's really cool, excellent work. I'm almost surprised it became that much faster even in unoptimized form.

    For optimum performance, standing on the shoulders of giants and creating an XS wrapper around lifelib would obviously be best, this would have the added advantage of supporting all (classes of) CAs that lifelib supports. That said, having a pure-Perl implementation that will run anywhere that Perl will is obviously valuable as well: lifelib is tied to a specific architecture and a handful of instruction sets. And to say you're beating the existing CPAN solutions would be an understatement.

    (Speaking of CPAN: you do intend to eventually bundle this up and release it as a module, right?)

      For optimum performance, standing on the shoulders of giants and creating an XS wrapper around lifelib would obviously be best, this would have the added advantage of supporting all (classes of) CAs that lifelib supports. ... (Speaking of CPAN: you do intend to eventually bundle this up and release it as a module, right?)
      Thanks, I really like your idea of a XS wrapper around LifeLib. And yes, I'd like to get something onto CPAN later this year.

        Thanks, I really like your idea of a XS wrapper around LifeLib.

        Better yet, how about a Life module with different pluggable backends? Math::CellularAutomaton, say, using Math::CellularAutomaton::LifeLib, Math::CellularAutomaton::Organism or Math::CellularAutomaton::Organism_PP under the hood, depending on what's installed, what's supported by the current CPU architecture, and what is able to handle the CA the user wants to run.

Re: More Betterer Game of Life
by eyepopslikeamosquito (Chancellor) on Sep 22, 2017 at 22:55 UTC

    I've updated Organism.pm to automatically detect the integer size perl was built with and use 32 x 32 tiles for a 32-bit perl and 64 x 64 tiles for a 64-bit perl. You can override this by editing Organism.pm and manually setting $TILE_SIZE_FULL to 32 or 64. I've tested with an old 32-bit Perl 5.8.6 and it works for me.

    Updated Benchmark Results

    As you might expect, Organism.pm runs a bit slower (and uses more memory) with 32-bit ints -- but not by much.

    Version375K cells750K cells1.5 million cells3 million cells
    new Organism.pm (64 x 64 tiles)1 secs1 secs3 secs5 secs
    new Organism.pm (32 x 32 tiles)1 secs1 secs4 secs7 secs
    Organism.pm (Mario improvements)13 secs26 secs52 secs108 secs
    Organism.pm (Original)35 secs70 secs141 secs284 secs
    Game::Life::Infinite:Board37 secs96 secs273 secs905 secs

    As for memory use, the maximum Windows Private Bytes used for the three million cell case by each process was:

    • New Organism.pm (64 x 64 tiles): 700,000K - 1,100,000K (update: seems to vary)
    • New Organism.pm (32 x 32 tiles): 1,400,000K
    • Organism.pm (Original): 1,455,004K
    • Organism.pm (Mario improvements): 1,596,368K
    • Game::Life::Infinite:Board: 18,138,504K

    Benchmark timings running AppleFritter's Lidka test for 30,000 ticks were:
    VersionLidka 30,000 ticks
    new Organism.pm (64 x 64 tiles)58 secs
    new Organism.pm (32 x 32 tiles)86 secs
    Organism.pm (Mario improvements)450 secs
    Organism.pm (Original)1635 secs
    Game::Life::Infinite:Board640 secs

    Updated Organism.pm follows.

      Porting this new Organism.pm back to C++ (as grid.h) proved interesting. I had to add a new third argument (niter) to the benchmark main get accurate timings! :)

      Updated Benchmark Results

      Update: After applying the two-at-a-time tick trick described here, the program was more than twice as fast, as shown below.

      Version375K cells750K cells1.5 million cells3 million cells
      new C++ grid.h (64 x 64 tiles) - two at a timetoo small to measure
      new C++ grid.h (64 x 64 tiles) - one at a time0.04 secs
      new Organism.pm (64 x 64 tiles)1 secs1 secs3 secs5 secs
      new Organism.pm (32 x 32 tiles)1 secs1 secs4 secs7 secs
      Organism.pm (Mario improvements)13 secs26 secs52 secs108 secs
      Organism.pm (Original)35 secs70 secs141 secs284 secs
      Game::Life::Infinite:Board37 secs96 secs273 secs905 secs

      As for memory use, the maximum Windows Private Bytes used for the three million cell case by each process was:

      • New C++ grid.h (64 x 64 tiles): 69,632K
      • New Organism.pm (64 x 64 tiles): 700,000K
      • New Organism.pm (32 x 32 tiles): 1,400,000K
      • Organism.pm (Original): 1,455,004K
      • Organism.pm (Mario improvements): 1,596,368K
      • Game::Life::Infinite:Board: 18,138,504K

      Benchmark timings running AppleFritter's Lidka test for 30,000 ticks were:
      VersionLidka 30,000 ticks
      new C++ grid.h (64 x 64 tiles) - two at a time0.08 secs
      new C++ grid.h (64 x 64 tiles) - one at a time0.21 secs
      new Organism.pm (64 x 64 tiles) - two at a time19 secs
      new Organism.pm (32 x 32 tiles) - two at a time21 secs
      new Organism.pm (64 x 64 tiles) - one at a time55 secs
      new Organism.pm (32 x 32 tiles) - one at a time81 secs
      Organism.pm (Mario improvements)450 secs
      Organism.pm (Original)1635 secs
      Game::Life::Infinite:Board640 secs

      Updated grid.h and tbench1.cpp follow.

      I've added a new twoticks method to advance the universe two ticks at a time. Advancing the universe by two generations is easy to implement and offers significant performance advantages (as pointed out by apg) because 2-periodic "ash" (e.g. blinker) is common in game of life, and stepping two ticks at a time automatically detects it.

      Further improvements may be possible by adding more sophisticated history detection.

      Updated Benchmark Results

      Benchmark timings running the 3 million cell blinker for both two ticks and one hundred ticks:
      Version3 million cell blinker, 2 ticks3 million cell blinker, 100 ticks
      new Organism.pm (64 x 64 tiles) - one at a time5 secs256 secs
      new Organism.pm (64 x 64 tiles) - two at a time3 secs3 secs!!!

      When running the admittedly artificial blinker test two at a time, notice that the pattern does not change at all! ... so all tiles are marked as unchanged and no further calculations are performed!

      Benchmark timings running AppleFritter's Lidka test for 30,000 ticks:
      VersionLidka 30,000 ticks
      new Organism.pm (64 x 64 tiles) - two at a time17 secs
      new Organism.pm (32 x 32 tiles) - two at a time18 secs
      new Organism.pm (64 x 64 tiles) - one at a time49 secs
      new Organism.pm (32 x 32 tiles) - one at a time72 secs
      Organism.pm (Mario improvements)450 secs
      Organism.pm (Original)1635 secs
      Game::Life::Infinite:Board640 secs
      Update: these timings were improved by a second or two after making some minor Organism.pm code improvements below (based on Devel::NYTProf profiling). Also added "use integer". Also tested with ancient 32-bit Perl 5.8.4.

      Updated Organism.pm follows.