Without good design, good algorithms, and complete understanding of the
program's operation, your carefully optimized code will amount to one of
mankind's least fruitful creations - a fast slow program.
-- Michael Abrash
Don't diddle code to make it faster -- find a better algorithm
-- The Elements of Programming Style
In High Performance Game of Life, I chose a very simple design, storing all live cells
in a single set.
Though pleasing for its simplicity and unboundedness,
its drawback is that counting live neighbours becomes a hash lookup,
a chronic performance bottleneck. What to do?
Rather than spending more time optimizing my original design --
thus creating a "fast slow program" --
I researched the domain, learning of many different ways to do it.
From the many possible approaches, I chose the simplest one I could find
that looked interesting and enjoyable, and implemented it in pure Perl.
To try to keep my initial attempt short and understandable, I started with a
simplified version based on the the brilliant works of Adam P. Goucher (apg),
tiling the universe with 64 x 64 tiles
in a conventional way, each tile having eight neighbours.
Note that this was chosen for simplicity; more efficient schemes
are available, such as the "brick wall" tiling used by Goucher
in later versions.
For background on the concept of breaking the game of life universe
into overlapping tiles, see this description of Life128 and vlife.
My code is loosely based on apgnano (version 2) but advances
one tick at a time (rather than two at a time, as apg did)
and does not attempt to use universe history.
Fair warning though. Despite striving to keep the code simple and short,
it's way more complex than my original, Organism.pm swelling
from 66 lines of code to 414.
Benchmark Results
I've updated the benchmark results given in my original node.
As you can see, even this simplified version, with no attempts made at code
optimization, is already an order of magnitude faster than the
optimized version of the original.
Version | 375K cells | 750K cells | 1.5 million cells | 3 million cells |
new Organism.pm (see below) | 1 secs | 1 secs | 3
secs | 5 secs |
Organism.pm (Mario improvements) | 13 secs | 26 secs | 52 secs | 108 secs |
Organism.pm (Original) | 35 secs | 70 secs | 141 secs | 284 secs |
Game::Life::Infinite:Board | 37 secs | 96 secs | 273 secs | 905 secs |
As for memory use, the maximum Windows Private Bytes used for the three million cell case by each process was:
- New Organism.pm (see below): 700,000K - 1,100,000K (update: seems to vary)
- Organism.pm (Original): 1,455,004K
- Organism.pm (Mario improvements): 1,596,368K
- Game::Life::Infinite:Board: 18,138,504K
Benchmark timings running AppleFritter's Lidka test for 30,000 ticks were:
Version | Lidka 30,000 ticks |
new Organism.pm (see below) | 58 secs |
Organism.pm (Mario improvements) | 450 secs |
Organism.pm (Original) | 1635 secs |
Game::Life::Infinite:Board | 640 secs |
Update
Improving My Initial Attempt
There is certainly plenty of scope for improving my initial attempt.
After all, I have not attempted any optimizations at all, just tried to implement ideas from
apg's C++/assembler programs in a pure Perl form in a simple and clear way.
While all feedback is welcome, I'm especially eager to see:
- Refactorings that make the Perl code shorter, clearer, more idiomatic.
- Performance optimizations.
- Explanations of (and alternatives to) the bit-twiddling code below, specifically the bit operations in st64_tiletick() below I find especially hard to follow.
- Bug fixes. I was shocked when my code worked the second time I ran it - just one coding blunder was corrected before my new Organism.pm passed tgol.t, tgol2.t, tgol3.t and the 30,000 lidka test! So I suspect there may be more bugs lurking in this brand new implementation.
As a minimum, any code refactorings should be tested by running
tgol.t and tgol3.t from my original node.
Note that this new version of Organism.pm is (or should be) 100% interface compatible with my original.
New Organism.pm
Finally, here is my new and improved Organism.pm (update: the latest and best Organism.pm can be found here):
package Organism;
use strict;
# Note: for this module, perl must be built with 64-bit integers
# use Config;
# $Config{ivsize} < 8 and die "perl ivsize=$Config{ivsize} is too smal
+l";
# ----------------------------------------------------------------
# The Universe is modelled as a set of overlapping tiles.
# For background, see http://conwaylife.com/wiki/Life128_and_vlife
# We use a simple scheme of 64 x 64 tiles (60 x 60 core) with
# conventional tiling (each tile has eight neighbours).
# Note that this was chosen for simplicity; more efficient schemes
# are available, such as the "brick wall" tiling used by Goucher
# in later versions (apgmera, version 3)
#
# This code is loosely based on apgnano (version 2) but advances
# one tick at a time (rather than advancing two at a time)
# and does not attempt to use universe history.
# This was to keep the implementation short.
#
# ----------------------------------------------------------------
# SQUARE TILE
my $BORDER_WIDTH = 2;
my $BORDER_WIDTH_P1 = $BORDER_WIDTH + 1;
my $TILE_SIZE_FULL = 64;
my $TILE_SIZE_FULL_M1 = $TILE_SIZE_FULL - 1;
my $TILE_SIZE_FULL_MB = $TILE_SIZE_FULL - $BORDER_WIDTH;
my $TILE_SIZE_CORE = $TILE_SIZE_FULL - 2 * $BORDER_WIDTH;
my $TILE_SIZE_CORE_P1 = $TILE_SIZE_CORE + 1;
my $MIDDLE60 = 0x3ffffffffffffffc;
my $LEFT62 = 0xfffffffffffffffc;
my $RIGHT62 = 0x3fffffffffffffff;
my $OUTER4 = 0xc000000000000003;
my $LEFTMIDDLE = 0x3000000000000000;
my $RIGHTMIDDLE = 0x000000000000000c;
# Neighbours are numbered clockwise starting with the one directly abo
+ve
my $NUM_NEIGH = 8;
my $NEIGH_TOP = 0;
my $NEIGH_TOP_RIGHT = 1;
my $NEIGH_RIGHT = 2;
my $NEIGH_BOTTOM_RIGHT = 3;
my $NEIGH_BOTTOM = 4;
my $NEIGH_BOTTOM_LEFT = 5;
my $NEIGH_LEFT = 6;
my $NEIGH_TOP_LEFT = 7;
# Note that the i ^ 4 trick sets i to the opposite one, that is:
# 0 > 4 (TOP > BOTTOM)
# 1 > 5 (TOP RIGHT > BOTTOM LEFT)
# 2 > 6 (RIGHT > LEFT)
# 3 > 7 (BOTTOM RIGHT > TOP LEFT)
# 4 > 0 (BOTTOM > TOP)
# 5 > 1 (BOTTOM LEFT > TOP RIGHT)
# 6 > 2 (LEFT > RIGHT)
# 7 > 3 (TOP LEFT > BOTTOM RIGHT)
# The functions starting with st64_ manipulate
# a simple 64 x 64 square tile bitmap.
# Note that x and y must be in 0..63 range.
# $row is a ref to an array of 64 unsigned 64-bit ints.
# The value in row[] bitmap is 0 (dead) or 1 (alive).
sub st64_getcellval {
my ($row, $x, $y) = @_;
my $mk = 1 << (63 - $x);
return $row->[$y] & $mk ? 1 : 0;
}
sub st64_setcellval {
my ($row, $x, $y, $v) = @_;
my $mk = 1 << (63 - $x);
if ($v) { $row->[$y] |= $mk; } else { $row->[$y] &= ~$mk; }
}
sub st64_insertcells {
my $row = shift;
for my $r (@_) { st64_setcellval($row, $r->[0], $r->[1], 1) }
}
# Used for verification and unit testing of st64_tiletick
sub st64_getlivecells {
my $row = shift;
my @cells;
for my $y (0 .. 63) {
next unless $row->[$y];
for my $x (0 .. 63) {
st64_getcellval($row, $x, $y) and push @cells, [ $x, $y ];
}
}
sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells;
}
# Advance the interior of square tile by one tick.
# Return a two element list:
# [0] : 1 if square tile changed, else 0.
# [1] : neighbour flags (see NEIGH flags above)
# indicates which neighbours need to be updated
sub st64_tiletick {
my $row = shift;
my $neigh = 0;
my $bigdiff = 0;
my @carry = (0) x 64;
my @parity = (0) x 64;
my @diff = (0) x 64;
my ( $aa, $bb, $p, $q, $r, $s, $bit0, $bit1, $bit2 );
my $top = 0;
my $bottom = $TILE_SIZE_FULL_M1;
while ($top < $TILE_SIZE_FULL_M1 && $row->[$top] == 0) { ++$top }
while ($bottom > 0 && $row->[$bottom] == 0) { --$bottom }
if ($top > $bottom) { return ( 0, $neigh ) }
for my $i ($top .. $bottom) {
$aa = $row->[$i] >> 1;
$bb = $row->[$i] << 1;
$q = $aa ^ $bb;
$parity[$i] = $q ^ $row->[$i];
$carry[$i] = ($q & $row->[$i]) | ($aa & $bb);
}
--$top; ++$bottom;
if ($top < 1) { $top = 1 }
if ($bottom > $TILE_SIZE_FULL_MB) { $bottom = $TILE_SIZE_FULL_MB }
for my $i ($top .. $bottom) {
$aa = $parity[$i-1];
$bb = $parity[$i+1];
$q = $aa ^ $bb;
$bit0 = $q ^ $parity[$i];
$r = ($q & $parity[$i]) | ($aa & $bb);
$aa = $carry[$i-1];
$bb = $carry[$i+1];
$q = $aa ^ $bb;
$p = $q ^ $carry[$i];
$s = ($q & $carry[$i]) | ($aa & $bb);
$bit1 = $p ^ $r;
$bit2 = $s ^ ($p & $r);
$p = ($bit0 & $bit1 & ~$bit2) | ($bit2 & ~$bit1 & ~$bit0 & $row-
+>[$i]);
$diff[$i] = ($row->[$i] ^ $p) & $MIDDLE60;
$bigdiff |= $diff[$i];
$row->[$i] = ($p & $MIDDLE60) | ($row->[$i] & ~$MIDDLE60);
}
$aa = $diff[$BORDER_WIDTH] | $diff[$BORDER_WIDTH_P1];
$bb = $diff[$TILE_SIZE_CORE] | $diff[$TILE_SIZE_CORE_P1];
if ($bigdiff) {
if ($bigdiff & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_LEFT }
if ($bigdiff & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_RIGHT }
}
if ($aa) {
$neigh |= 1 << $NEIGH_TOP;
if ($aa & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_TOP_LEFT }
if ($aa & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_TOP_RIGHT }
}
if ($bb) {
$neigh |= 1 << $NEIGH_BOTTOM;
if ($bb & $LEFTMIDDLE) { $neigh |= 1 << $NEIGH_BOTTOM_LEFT }
if ($bb & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_BOTTOM_RIGHT }
}
my $changed = ($bigdiff != 0) ? 1 : 0;
return ( $changed, $neigh );
}
# Population count (https://en.wikipedia.org/wiki/Hamming_weight)
# See also GCC built-in: __builtin_popcount
sub popcount {
my $x = shift;
my $count;
for ($count = 0; $x; ++$count) { $x &= $x - 1 }
return $count;
}
# ----------------------------------------------------------------
# ORGANISM
sub count {
my $self = shift;
my $tiles = $self->{Tiles};
my $cnt = 0;
for my $k (keys %{$tiles}) {
my $row = $tiles->{$k}->{Row};
for my $y ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) {
next unless $row->[$y];
$cnt += popcount($row->[$y] & $MIDDLE60);
}
}
return $cnt;
}
# Input a list of [ x, y ] coords
sub insert_cells {
my $self = shift;
for my $r (@_) { $self->setcell($r->[0], $r->[1], 1) }
}
# Used for verification and testing the state of the organism
sub get_live_cells {
my $self = shift;
my $tiles = $self->{Tiles};
my @cells;
for my $k (keys %{$tiles}) {
my $sqt = $tiles->{$k};
for my $y ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) {
next unless $sqt->{Row}->[$y];
for my $x ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) {
if (st64_getcellval($sqt->{Row}, $x, $y)) {
push @cells,
[$TILE_SIZE_CORE * $sqt->{Tx} + $x - $BORDER_WIDTH,
$TILE_SIZE_CORE * $sqt->{Ty} + $y - $BORDER_WIDTH];
}
}
}
}
sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells;
}
sub get_neighbour {
my $self = shift;
my $sqt = shift;
my $i = shift;
unless ($sqt->{Neighbours}->[$i]) {
my $x = $sqt->{Tx};
my $y = $sqt->{Ty};
if ($i >= $NEIGH_TOP_RIGHT && $i <= $NEIGH_BOTTOM_RIGHT) { ++
+$x }
if ($i >= $NEIGH_BOTTOM_RIGHT && $i <= $NEIGH_BOTTOM_LEFT) { ++
+$y }
if ($i >= $NEIGH_BOTTOM_LEFT && $i <= $NEIGH_TOP_LEFT) { --
+$x }
if ($i == $NEIGH_TOP_LEFT || $i <= $NEIGH_TOP_RIGHT) { --
+$y }
my $tiles = $self->{Tiles};
my $k = pack 'i2', $x, $y;
unless (exists $tiles->{$k}) {
$tiles->{$k} = {
Row => [ (0) x 64 ],
Tx => $x,
Ty => $y,
Updateflags => 0,
Neighbours => [],
};
}
$sqt->{Neighbours}->[$i] = $tiles->{$k};
$sqt->{Neighbours}->[$i]->{Tx} = $x;
$sqt->{Neighbours}->[$i]->{Ty} = $y;
}
return $sqt->{Neighbours}->[$i];
}
# Alert the neighbour that its neighbour (the original tile) has chang
+ed
sub update_neighbour {
my $self = shift;
my $sqt = shift;
my $i = shift;
if ($self->get_neighbour($sqt, $i)->{Updateflags} == 0) {
push @{$self->{Modified}}, $self->get_neighbour($sqt, $i);
}
$self->get_neighbour($sqt, $i)->{Updateflags} |= 1 << ($i ^ 4);
}
# Update the relevant portions of the boundary (a 64-by-64 square
# with the central 60-by-60 square removed) by copying data from
# the interiors (the 60-by-60 central squares) of the neighbours.
# Only perform this copying when necessary.
sub update_boundary {
my $self = shift;
my $sqt = shift;
my $temp_modified = $self->{TempModified};
if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP) ) {
my $n = $self->get_neighbour($sqt, $NEIGH_TOP);
$sqt->{Row}->[0] = ($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) |
+($sqt->{Row}->[0] & $OUTER4);
$sqt->{Row}->[1] = ($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60)
+ | ($sqt->{Row}->[1] & $OUTER4);
}
if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP_LEFT) ) {
my $n = $self->get_neighbour($sqt, $NEIGH_TOP_LEFT);
$sqt->{Row}->[0] = (($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) <
+< $TILE_SIZE_CORE) | ($sqt->{Row}->[0] & $RIGHT62);
$sqt->{Row}->[1] = (($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60
+) << $TILE_SIZE_CORE) | ($sqt->{Row}->[1] & $RIGHT62);
}
if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP_RIGHT) ) {
my $n = $self->get_neighbour($sqt, $NEIGH_TOP_RIGHT);
$sqt->{Row}->[0] = (($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) >
+> $TILE_SIZE_CORE) | ($sqt->{Row}->[0] & $LEFT62);
$sqt->{Row}->[1] = (($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60
+) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[1] & $LEFT62);
}
if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM) ) {
my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM);
$sqt->{Row}->[$TILE_SIZE_FULL_MB] = ($n->{Row}->[$BORDER_WIDTH]
+& $MIDDLE60) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB] & $OUTER4);
$sqt->{Row}->[$TILE_SIZE_FULL_M1] = ($n->{Row}->[3] & $MIDDLE60)
+ | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $OUTER4);
}
if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM_LEFT) ) {
my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM_LEFT);
$sqt->{Row}->[$TILE_SIZE_FULL_MB] = (($n->{Row}->[$BORDER_WIDTH]
+ & $MIDDLE60) << $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB
+] & $RIGHT62);
$sqt->{Row}->[$TILE_SIZE_FULL_M1] = (($n->{Row}->[3] & $MIDDLE60
+) << $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $RIGHT62
+);
}
if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM_RIGHT) ) {
my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM_RIGHT);
$sqt->{Row}->[$TILE_SIZE_FULL_MB] = (($n->{Row}->[$BORDER_WIDTH]
+ & $MIDDLE60) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB
+] & $LEFT62);
$sqt->{Row}->[$TILE_SIZE_FULL_M1] = (($n->{Row}->[3] & $MIDDLE60
+) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $LEFT62)
+;
}
if ( $sqt->{Updateflags} & (1 << $NEIGH_LEFT) ) {
my $n = $self->get_neighbour($sqt, $NEIGH_LEFT);
for my $i ($BORDER_WIDTH .. $TILE_SIZE_FULL_MB - 1) {
$sqt->{Row}->[$i] = (($n->{Row}->[$i] & $MIDDLE60) << $TILE_S
+IZE_CORE) | ($sqt->{Row}->[$i] & $RIGHT62);
}
}
if ( $sqt->{Updateflags} & (1 << $NEIGH_RIGHT) ) {
my $n = $self->get_neighbour($sqt, $NEIGH_RIGHT);
for my $i ($BORDER_WIDTH .. $TILE_SIZE_FULL_MB - 1) {
$sqt->{Row}->[$i] = (($n->{Row}->[$i] & $MIDDLE60) >> $TILE_S
+IZE_CORE) | ($sqt->{Row}->[$i] & $LEFT62);
}
}
$sqt->{Updateflags} = 0;
push @{$temp_modified}, $sqt;
}
# Advance the interior of the tile by one generation.
sub update_tile {
my $self = shift;
my $modified = $self->{Modified};
my $sqt = shift;
my ($update_flag, $neigh) = st64_tiletick($sqt->{Row});
if ($update_flag) {
if ($sqt->{Updateflags} == 0) { push @{$modified}, $sqt }
$sqt->{Updateflags} |= 1 << $NUM_NEIGH;
}
for my $i (0 .. $NUM_NEIGH - 1) {
if ($neigh & (1 << $i)) { $self->update_neighbour($sqt, $i) }
}
}
sub tick {
my $self = shift;
my $modified = $self->{Modified};
my $temp_modified = $self->{TempModified};
while (@{$modified}) {
$self->update_boundary(pop @{$modified});
}
while (@{$temp_modified}) {
$self->update_tile(pop @{$temp_modified});
}
}
sub updatecell {
my $self = shift;
my $sqt = shift;
my $x = shift;
my $y = shift;
if ($sqt->{Updateflags} == 0) { push @{$self->{Modified}}, $sqt }
$sqt->{Updateflags} |= 1 << $NUM_NEIGH;
if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEIGH_
+TOP) }
if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIGH_B
+OTTOM) }
if ($x <= $BORDER_WIDTH_P1) {
$self->update_neighbour($sqt, $NEIGH_LEFT);
if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEI
+GH_TOP_LEFT) }
if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIG
+H_BOTTOM_LEFT) }
}
if ($x >= $TILE_SIZE_CORE) {
$self->update_neighbour($sqt, $NEIGH_RIGHT);
if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEI
+GH_TOP_RIGHT) }
if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIG
+H_BOTTOM_RIGHT) }
}
}
sub setcell {
my $self = shift;
my $x = shift;
my $y = shift;
my $state = shift;
my $tiles = $self->{Tiles};
my $ox = $x % $TILE_SIZE_CORE;
my $oy = $y % $TILE_SIZE_CORE;
if ($ox < 0) { $ox += $TILE_SIZE_CORE }
if ($oy < 0) { $oy += $TILE_SIZE_CORE }
my $tx = ($x - $ox) / $TILE_SIZE_CORE;
my $ty = ($y - $oy) / $TILE_SIZE_CORE;
my $k = pack 'i2', $tx, $ty;
unless (exists $tiles->{$k}) {
$tiles->{$k} = {
Row => [ (0) x 64 ],
Tx => $tx,
Ty => $ty,
Updateflags => 0,
Neighbours => [],
};
}
my $xx = $ox + $BORDER_WIDTH;
my $yy = $oy + $BORDER_WIDTH;
st64_setcellval($tiles->{$k}->{Row}, $xx, $yy, $state);
$self->updatecell($tiles->{$k}, $xx, $yy);
}
sub getcellval {
my $self = shift;
my $x = shift;
my $y = shift;
my $tiles = $self->{Tiles};
my $ox = $x % $TILE_SIZE_CORE;
my $oy = $y % $TILE_SIZE_CORE;
if ($ox < 0) { $ox += $TILE_SIZE_CORE }
if ($oy < 0) { $oy += $TILE_SIZE_CORE }
my $tx = ($x - $ox) / $TILE_SIZE_CORE;
my $ty = ($y - $oy) / $TILE_SIZE_CORE;
my $k = pack 'i2', $tx, $ty;
exists $tiles->{$k} or return 0;
return st64_getcellval( $tiles->{$k}->{Row},
$ox + $BORDER_WIDTH,
$oy + $BORDER_WIDTH );
}
sub new {
my $class = shift;
my %init_self = ( Tiles => {}, Modified => [], TempModified => [] )
+;
bless \%init_self, $class;
}
1;
Note that this new implementation passes all the same tests
(tgol.t, tgol2.t, tgol3.t) described in High Performance Game of Life.
References
Updated: Added more references.
Re: More Betterer Game of Life
by AppleFritter (Vicar) on Sep 20, 2017 at 10:22 UTC
|
That's really cool, excellent work. I'm almost surprised it became that much faster even in unoptimized form.
For optimum performance, standing on the shoulders of giants and creating an XS wrapper around lifelib would obviously be best, this would have the added advantage of supporting all (classes of) CAs that lifelib supports. That said, having a pure-Perl implementation that will run anywhere that Perl will is obviously valuable as well: lifelib is tied to a specific architecture and a handful of instruction sets. And to say you're beating the existing CPAN solutions would be an understatement.
(Speaking of CPAN: you do intend to eventually bundle this up and release it as a module, right?)
| [reply] |
|
For optimum performance, standing on the shoulders of giants and creating an XS wrapper around lifelib would obviously be best,
this would have the added advantage of supporting all (classes of) CAs that lifelib supports. ...
(Speaking of CPAN: you do intend to eventually bundle this up and release it as a module, right?)
Thanks, I really like your idea of a XS wrapper around LifeLib.
And yes, I'd like to get something onto CPAN later this year.
| [reply] |
|
Thanks, I really like your idea of a XS wrapper around LifeLib.
Better yet, how about a Life module with different pluggable backends? Math::CellularAutomaton, say, using Math::CellularAutomaton::LifeLib, Math::CellularAutomaton::Organism or Math::CellularAutomaton::Organism_PP under the hood, depending on what's installed, what's supported by the current CPU architecture, and what is able to handle the CA the user wants to run.
| [reply] |
Re: More Betterer Game of Life
by eyepopslikeamosquito (Archbishop) on Sep 22, 2017 at 22:55 UTC
|
I've updated Organism.pm to automatically detect the integer
size perl was built with and use 32 x 32 tiles for a 32-bit
perl and 64 x 64 tiles for a 64-bit perl.
You can override this by editing Organism.pm
and manually setting $TILE_SIZE_FULL to 32 or 64.
I've tested with an old 32-bit Perl 5.8.6 and it works for me.
Updated Benchmark Results
As you might expect, Organism.pm runs a bit slower (and uses more memory)
with 32-bit ints -- but not by much.
Version | 375K cells | 750K cells | 1.5 million cells | 3 million cells |
new Organism.pm (64 x 64 tiles) | 1 secs | 1 secs | 3 secs | 5 secs |
new Organism.pm (32 x 32 tiles) | 1 secs | 1 secs | 4 secs | 7 secs |
Organism.pm (Mario improvements) | 13 secs | 26 secs | 52 secs | 108 secs |
Organism.pm (Original) | 35 secs | 70 secs | 141 secs | 284 secs |
Game::Life::Infinite:Board | 37 secs | 96 secs | 273 secs | 905 secs |
As for memory use, the maximum Windows Private Bytes used for the three
million cell case by each process was:
- New Organism.pm (64 x 64 tiles): 700,000K - 1,100,000K (update: seems to vary)
- New Organism.pm (32 x 32 tiles): 1,400,000K
- Organism.pm (Original): 1,455,004K
- Organism.pm (Mario improvements): 1,596,368K
- Game::Life::Infinite:Board: 18,138,504K
Benchmark timings running AppleFritter's Lidka test for 30,000
ticks were:
Version | Lidka 30,000 ticks |
new Organism.pm (64 x 64 tiles) | 58 secs |
new Organism.pm (32 x 32 tiles) | 86 secs |
Organism.pm (Mario improvements) | 450 secs |
Organism.pm (Original) | 1635 secs |
Game::Life::Infinite:Board | 640 secs |
Updated Organism.pm follows.
| [reply] [d/l] [select] |
|
Porting this new Organism.pm back to C++ (as grid.h) proved interesting.
I had to add a new third argument (niter) to the benchmark main get accurate timings! :)
Updated Benchmark Results
Update: After applying the two-at-a-time tick trick described here, the program was more than twice as fast, as shown below.
Version | 375K cells | 750K cells | 1.5 million cells | 3 million cells |
new C++ grid.h (64 x 64 tiles) - two at a time | | | | too small to measure |
new C++ grid.h (64 x 64 tiles) - one at a time | | | | 0.04 secs |
new Organism.pm (64 x 64 tiles) | 1 secs | 1 secs | 3 secs | 5 secs |
new Organism.pm (32 x 32 tiles) | 1 secs | 1 secs | 4 secs | 7 secs |
Organism.pm (Mario improvements) | 13 secs | 26 secs | 52 secs | 108 secs |
Organism.pm (Original) | 35 secs | 70 secs | 141 secs | 284 secs |
Game::Life::Infinite::Board | 37 secs | 96 secs | 273 secs | 905 secs |
As for memory use, the maximum Windows Private Bytes used for the three
million cell case by each process was:
- New C++ grid.h (64 x 64 tiles): 69,632K
- C++ Organism.h (Original): 517,340K
- New Organism.pm (64 x 64 tiles): 700,000K
- New Organism.pm (32 x 32 tiles): 1,400,000K
- Organism.pm (Original): 1,455,004K
- Organism.pm (Mario improvements): 1,596,368K
- Game::Life::Infinite::Board: 18,138,504K
Benchmark timings running AppleFritter's Lidka test for 30,000
ticks were:
Version | Lidka 30,000 ticks |
new C++ grid.h (64 x 64 tiles) - two at a time | 0.08 secs |
new C++ grid.h (64 x 64 tiles) - one at a time | 0.21 secs |
C++ Organism.h (Original) | 36 secs |
new Organism.pm (64 x 64 tiles) - two at a time | 19 secs |
new Organism.pm (32 x 32 tiles) - two at a time | 21 secs |
new Organism.pm (64 x 64 tiles) - one at a time | 55 secs |
new Organism.pm (32 x 32 tiles) - one at a time | 81 secs |
Organism.pm (Mario improvements) | 450 secs |
Organism.pm (Original) | 1635 secs |
Game::Life::Infinite::Board | 640 secs |
Update: The file lidka_106.lif:
#Life 1.06
-3 -7
-4 -6
-2 -6
-3 -5
4 3
2 4
4 4
1 5
2 5
4 5
0 7
1 7
2 7
Updated grid.h and tbench1.cpp follow. Note: This node contains the latest and best version of the C++ GOL code.
| [reply] [d/l] [select] |
|
| [reply] |
|
|
I've added a new twoticks method to advance the universe two ticks at a time.
Advancing the universe by two generations is easy to implement and
offers significant performance advantages (as pointed out by apg)
because 2-periodic "ash" (e.g. blinker) is common in game of life,
and stepping two ticks at a time automatically detects it.
Further improvements may be possible by adding more sophisticated history detection.
Updated Benchmark Results
Benchmark timings running the 3 million cell blinker for both two ticks and one hundred ticks:
Version | 3 million cell blinker, 2 ticks | 3 million cell blinker, 100 ticks |
new Organism.pm (64 x 64 tiles) - one at a time | 5 secs | 256 secs |
new Organism.pm (64 x 64 tiles) - two at a time | 3 secs | 3 secs!!! |
When running the admittedly artificial blinker test two at a time,
notice that the pattern does not change at all! ... so all tiles are
marked as unchanged and no further calculations are performed!
Benchmark timings running AppleFritter's Lidka test for 30,000
ticks:
Version | Lidka 30,000 ticks |
new Organism.pm (64 x 64 tiles) - two at a time | 17 secs |
new Organism.pm (32 x 32 tiles) - two at a time | 18 secs |
new Organism.pm (64 x 64 tiles) - one at a time | 49 secs |
new Organism.pm (32 x 32 tiles) - one at a time | 72 secs |
old Organism.pm (Mario improvements) | 450 secs |
old Organism.pm (Original) | 1635 secs |
Game::Life::Infinite::Board | 640 secs |
Note that these timings were improved by a second or two by some minor Organism.pm code tweaks (based on Devel::NYTProf profiling) - also added "use integer" and tested with ancient 32-bit Perl 5.8.4.
Instructively, tweaking the code, via a long series of micro-optimizations,
reduced the running time from 1635 secs to 450 secs (i.e. 3.6 times faster),
while finding a better algorithm reduced it from 450 secs to 17 secs (26.5 times faster).
Updated Organism.pm follows. Note: This node contains the latest and best version of the Perl GOL code.
| [reply] [d/l] [select] |
Re: More Betterer Game of Life
by AppleFritter (Vicar) on Jun 22, 2018 at 14:20 UTC
|
BTW, this is a rather late reply to an old(er) node, but here's some recent work by Tom Rokicki comparing different Life algorithms on a number of different (computationally intensive) patterns. Just something to peruse if you're interested in state-of-the-art Life algorithms!
| [reply] |
|
|