PerlMonks

### High Performance Game of Life

by eyepopslikeamosquito (Bishop)
on Aug 11, 2017 at 12:49 UTC

The universe of Conway's Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, alive or dead. Every cell interacts with its eight neighbours, which are the cells that are horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

1. Any live cell with fewer than two live neighbours dies, as if caused by under-population.
2. Any live cell with two or three live neighbours lives on to the next generation.
3. Any live cell with more than three live neighbours dies, as if by overcrowding.
4. Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

The initial pattern constitutes the seed of the system. The first generation is created by applying the above rules simultaneously to every cell in the seed - births and deaths occur simultaneously, and the discrete moment at which this happens is sometimes called a tick (in other words, each generation is a pure function of the preceding one). The rules continue to be applied repeatedly to create further generations.

-- Conway's Game of Life (wikipedia)
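
The four rules collapse into a single decision on a cell's current state and its live-neighbour count. A minimal C++ sketch of that decision follows; the name `next_state` is hypothetical and does not appear in the solutions later in this node.

```cpp
#include <cassert>

// Hypothetical helper: next state of one cell under the four rules.
//   alive - whether the cell is alive in the current generation
//   nlive - number of live neighbours (0..8)
inline bool next_state(bool alive, int nlive) {
    assert(nlive >= 0 && nlive <= 8);
    if (alive) return nlive == 2 || nlive == 3;  // rules 1 to 3
    return nlive == 3;                           // rule 4
}
```

Because every cell is evaluated against the previous generation only, a tick can be computed into a fresh container and swapped in at the end.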

I've recently been forced to become acquainted with Conway's Game of Life after being asked to assess various implementations of it submitted by job applicants. Rather than a chore, I actually found it very interesting. Not the GUI stuff, just the guts of the code. How to implement it cleanly, concisely, efficiently.

I wrote a couple of solutions myself, in both Perl and C++11, which I present below. Though still a game of life novice, I've spent quite a bit of time on these solutions already - including some profiling and performance optimization of my C++ solution. My Perl solution is essentially a translation of the one I wrote first in C++.

Having spent time analysing this problem, I'm interested to see how others go about it.

The Challenge

If you want to test yourself, and have a few hours to spare, you might like to have a go (without reading my solutions below) at implementing the above algorithm as a simple Organism class with methods:

• insert_cells() - insert a list of live cells (x and y coordinates) into the Organism. This is used to set the starting state of the Organism.
• count() - return the number of live cells in the Organism. This is used for verification by unit tests.
• get_live_cells() - return the list of all cells currently alive in the Organism. This is also used for verification.
• tick() - create the next generation of the Organism by applying the four rules above. This is where most of the hard work and performance challenges are.

For this challenge, you don't need to be truly infinite, but you must handle cells with x and y co-ordinates anywhere in the 32-bit signed integer range (-2,147,483,648 to 2,147,483,647). In the interests of keeping the challenge smallish, you don't need to consider graphics, customizations or variations; just implement the four basic rules above.

Performance

I also have a special interest in high performance computing, so hope to learn more about performance and scalability by the feedback I get from this node. I've already sought performance advice related to this challenge in Fastest way to lookup a point in a set.

Update: A much faster, and much more complex, implementation was done later in a follow up node: More Betterer Game of Life

C++ Solution

Let's start with the Organism class I wrote first, in portable C++11 (tested on both Linux and Windows) - along with a simple performance benchmark program. My Perl solution was derived from this one, so we later compare the performance of the similar C++ and Perl solutions.

```cpp
// Organism.h
// Simple implementation of Conway game of life in standard C++11.

#ifndef ORGANISM_H
#define ORGANISM_H

#include <cstddef>
#include <cstdint>
#include <vector>
#include <unordered_set>
#include <algorithm>
#include <utility>

static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, need a 64-bit compile");

// CELL
typedef int32_t  cell_coord_type;
typedef uint64_t cell_whole_type;
typedef int64_t  cell_signed_whole_type;

#define XX(w)    ((cell_coord_type)(w))
#define YY(w)    ((cell_coord_type)(((w) >> 32) & 0xFFFFFFFF))
#define WW(x, y) (((cell_signed_whole_type)(y) << 32) | ((cell_signed_whole_type)(x) & 0xFFFFFFFF))

struct Cell {
   cell_coord_type x;
   cell_coord_type y;
   Cell(cell_coord_type x_, cell_coord_type y_) : x(x_), y(y_) {}
   Cell(cell_whole_type w) : x(XX(w)), y(YY(w)) {}
};
inline bool operator<(const Cell& lhs, const Cell& rhs) {
   if (lhs.x < rhs.x) return true;
   if (lhs.x > rhs.x) return false;
   return lhs.y < rhs.y;
   // or one-liner: return lhs.x < rhs.x || (lhs.x == rhs.x && lhs.y < rhs.y);
}
inline bool operator>( const Cell& lhs, const Cell& rhs) { return rhs < lhs; }
inline bool operator<=(const Cell& lhs, const Cell& rhs) { return !(lhs > rhs); }
inline bool operator>=(const Cell& lhs, const Cell& rhs) { return !(lhs < rhs); }
inline bool operator==(const Cell& lhs, const Cell& rhs) { return lhs.x == rhs.x && lhs.y == rhs.y; }
inline bool operator!=(const Cell& lhs, const Cell& rhs) { return !(lhs == rhs); }

using cell_list_type = std::vector<Cell>;
struct CellHasher { size_t operator()(cell_whole_type w) const { return w; }  };
using cell_lookup_type = std::unordered_set<cell_whole_type, CellHasher>;

// ORGANISM
class Organism {
public:
   Organism(size_t ncell = 1000) { cells_m.reserve(ncell); }
   size_t count() const { return cells_m.size(); }
   size_t is_alive(cell_whole_type w) const { return cells_m.find(w) != cells_m.end(); }
   size_t is_dead(cell_whole_type w)  const { return cells_m.find(w) == cells_m.end(); }

   // Used to initialize the starting state of the organism
   size_t insert_cells(const cell_list_type& cells) {
      size_t n_inserted = 0;
      for (const auto& c : cells) {
         if (cells_m.insert(WW(c.x, c.y)).second) ++n_inserted;
      }
      return n_inserted;
   }

   // Used for verification and testing the state of the organism
   cell_list_type get_live_cells() const {
      cell_list_type vec_items(cells_m.cbegin(), cells_m.cend());
      std::sort(vec_items.begin(), vec_items.end());
      return vec_items;
   }

   // Get the number of live neighbours surrounding a cell
   size_t get_num_live(cell_coord_type x, cell_coord_type y) const {
      return is_alive( WW(x-1, y-1) )
           + is_alive( WW(x-1, y  ) )
           + is_alive( WW(x-1, y+1) )
           + is_alive( WW(x  , y-1) )
           + is_alive( WW(x  , y+1) )
           + is_alive( WW(x+1, y-1) )
           + is_alive( WW(x+1, y  ) )
           + is_alive( WW(x+1, y+1) );
   }

   // Return the number of dead cells surrounding a cell
   // The cells themselves are returned in z
   size_t get_dead_cells(cell_coord_type x, cell_coord_type y, cell_whole_type* z) const {
      size_t n = 0;
      if (is_dead(WW(x-1, y-1))) z[n++] = WW(x-1, y-1);
      if (is_dead(WW(x-1, y  ))) z[n++] = WW(x-1, y  );
      if (is_dead(WW(x-1, y+1))) z[n++] = WW(x-1, y+1);
      if (is_dead(WW(x  , y-1))) z[n++] = WW(  x, y-1);
      if (is_dead(WW(x  , y+1))) z[n++] = WW(  x, y+1);
      if (is_dead(WW(x+1, y-1))) z[n++] = WW(x+1, y-1);
      if (is_dead(WW(x+1, y  ))) z[n++] = WW(x+1, y  );
      if (is_dead(WW(x+1, y+1))) z[n++] = WW(x+1, y+1);
      return n;
   }

   void tick() {
      cell_lookup_type new_cells;
      size_t ncells = cells_m.size();
      new_cells.reserve(ncells + ncells/4);

      // Stores the (up to 8) dead neighbour cells surrounding a cell
      cell_whole_type zcells[8];

      for (const auto& c : cells_m) {
         // Get the (up to 8) dead cells surrounding the cell
         size_t ndead = get_dead_cells(XX(c), YY(c), zcells);

         // Check the live cell
         // Note: next line equivalent to nlive == 2 || nlive == 3
         if (ndead == 5 || ndead == 6) new_cells.insert(c);

         // Check the dead cells: exactly three live neighbours means a birth
         for (size_t i = 0; i < ndead; ++i) {
            if (get_num_live(XX(zcells[i]), YY(zcells[i])) == 3) new_cells.insert(zcells[i]);
         }
      }

      cells_m = std::move(new_cells);
   }

private:
   cell_lookup_type cells_m;
};

#endif /* ORGANISM_H */
```

In case you're interested, switching from std::set to std::unordered_set, along with adding the custom CellHasher above, gave a huge performance boost - after profiling indicated that cell lookup was a performance hot spot. While using a single set of live cells is pleasing for its simplicity and unboundedness, its drawback is that counting live neighbours becomes a hash lookup, which is a chronic performance bottleneck.
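
The cheap hash works because both 32-bit coordinates fit into one 64-bit integer, which can then serve as its own hash value. The sketch below restates that packing with functions instead of the XX/YY/WW macros, going through unsigned types so that no negative value is ever shifted. The names `pack_xy`, `unpack_x` and `unpack_y` are mine, for illustration only, and the conversion back to a signed coordinate assumes the usual two's complement behaviour.

```cpp
#include <cstdint>

// Pack two signed 32-bit coordinates into one unsigned 64-bit key:
// y goes in the high 32 bits, x in the low 32 bits.
inline std::uint64_t pack_xy(std::int32_t x, std::int32_t y) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(y)) << 32)
         |  static_cast<std::uint64_t>(static_cast<std::uint32_t>(x));
}

// Recover x from the low 32 bits (assumes two's complement conversion).
inline std::int32_t unpack_x(std::uint64_t w) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(w));
}

// Recover y from the high 32 bits (assumes two's complement conversion).
inline std::int32_t unpack_y(std::uint64_t w) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(w >> 32));
}
```

With an identity hash, neighbouring x values map to neighbouring hash values; whether that helps or hurts depends on the bucket strategy of the standard library in use, so it is worth profiling rather than assuming.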

I'm sure plenty of further performance improvement opportunities are waiting, for example:

• Write a custom allocator for std::unordered_set (allocate arena on stack if possible)
• Replace std::unordered_set with a custom sparse set of integers
• Sort/batch lookups
• Avoid checking the same dead cell more than once (e.g. create a set union of all dead cells)
• Find a better organism data structure: speed up checking cell state and counting neighbours (e.g. via a bitmap), keep track of which zones are active to save time by not updating inactive zones (a cell that did not change at the last time step, and none of whose neighbours changed, is guaranteed not to change at the current time step as well), detect repetitive patterns (e.g. blinker), ... Update: this has kinda been done now in More Betterer Game of Life.
• Memoization, cache optimization, compiler intrinsics and other tricks discussed in this node
• Intel TBB (Threading Building Blocks)
• OpenCL (GPGPU)

Further suggestions welcome.
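
One way to act on the "avoid checking the same dead cell more than once" suggestion is to accumulate neighbour counts in a hash map: every live cell adds one to each of its eight neighbours, so each candidate cell, live or dead, is examined exactly once afterwards. The sketch below is an assumption about how that could look, not a measured improvement. `tick_by_counts`, `key_of`, `Key` and `LiveSet` are hypothetical names, and coordinates are assumed to stay clear of the int32 limits so that x±1 and y±1 do not overflow.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

using Key     = std::uint64_t;            // y in the high 32 bits, x in the low 32 bits
using LiveSet = std::unordered_set<Key>;

inline Key key_of(std::int32_t x, std::int32_t y) {
    return (static_cast<Key>(static_cast<std::uint32_t>(y)) << 32)
         |  static_cast<Key>(static_cast<std::uint32_t>(x));
}

// Compute one generation by counting neighbours once per candidate cell.
inline LiveSet tick_by_counts(const LiveSet& live) {
    std::unordered_map<Key, std::uint8_t> counts;
    counts.reserve(live.size() * 4);

    // Each live cell contributes 1 to the count of its eight neighbours.
    for (Key w : live) {
        const std::int32_t x = static_cast<std::int32_t>(static_cast<std::uint32_t>(w));
        const std::int32_t y = static_cast<std::int32_t>(static_cast<std::uint32_t>(w >> 32));
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx)
                if (dx != 0 || dy != 0) ++counts[key_of(x + dx, y + dy)];
    }

    // A cell lives next tick if it has 3 neighbours, or 2 and is already alive.
    // Live cells with no neighbours never appear in counts, so they die.
    LiveSet next;
    next.reserve(live.size() + live.size() / 4);
    for (const auto& kv : counts) {
        const bool alive = live.count(kv.first) != 0;
        if (kv.second == 3 || (alive && kv.second == 2)) next.insert(kv.first);
    }
    return next;
}
```

The trade-off is an extra temporary map per tick in exchange for fewer repeated lookups, so it may or may not beat the version above on a given pattern.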

A simple benchmark program using the blinker oscillator:

```cpp
// tbench1.cpp. Simple benchmark of Organism class.
// g++ compile on Linux:
//   g++ -o tbench1 -std=c++11 -Wall -O3 tbench1.cpp
// Note: the g++ command above also works with mingw C++ compiler (https://sourceforge.net/projects/mingw-w64)
// that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.exe)
// MSVC 2017 compile on Windows, start a 64-bit compiler cmdline via:
//   Start > AllApps > Visual Studio 2017 > x64 Native Tools Command Prompt for VS2017
// then:
//   cl /W3 /MT /Ox /EHsc tbench1.cpp

#include <cstdlib>
#include <ctime>
#include <string>
#include <iostream>
#include <fstream>
#include <sstream>

#include "Organism.h"

static bool is_file_exist(const char *fname) {
   std::ifstream infile(fname);
   return infile.good();
}

static Cell read_cell(const std::string& str) {
   cell_coord_type x, y;
   std::istringstream iss(str);
   iss >> x; iss >> y;
   return Cell(x, y);
}

// Reads a Life 1.06 text file
static cell_list_type read_cells_106(const char* fname) {
   cell_list_type cells;
   std::ifstream cellfile(fname);
   if (!cellfile) {
      std::cerr << "Error opening '" << fname << "'\n";
      return cells;
   }
   for (std::string line; std::getline(cellfile, line); ) {
      if (line.empty())   continue;    // ignore blank lines
      if (line[0] == '#') continue;    // ignore comment lines
      cells.push_back(read_cell(line));
   }
   return cells;
}

static void RunTest(const char* fname, int nticks)
{
   Organism org;
   {
      cell_list_type cells = read_cells_106(fname);
      org.insert_cells(cells);
      size_t ncells = org.count();
      std::cout << "cell count at start = " << ncells << "\n";
      if (ncells != cells.size()) { std::cerr << "oops cell count mismatch" << "\n"; }
   }
   std::cerr << "run benchmark for " << nticks << " ticks\n";
   time_t tstart = ::time(NULL);
   for (int i = 1; i <= nticks; ++i ) {
      org.tick();
   }
   time_t tend = ::time(NULL);
   long taken  = static_cast<long>(::difftime(tend, tstart) + 0.5);
   size_t ncells = org.count();
   std::cout << "cell count at end = " << ncells << "\n";
   std::cerr << "time taken " << taken << " secs\n";
}

int main(int argc, char* argv[])
{
   if (argc != 3) {
      std::cerr << "usage: tbench1 file nticks\n";
      return 1;
   }
   const char* fname = argv[1];
   if (!is_file_exist(fname)) {
      std::cerr << "File '" << fname << "' does not exist\n";
      return 1;
   }
   int nticks = ::atoi(argv[2]);
   if (nticks <= 0)
   {
      std::cerr << "'" << argv[2] << "'" << " invalid nticks\n";
      return 1;
   }
   RunTest(fname, nticks);
   return 0;
}
```

Perl Solution

Update: The fastest version of the Perl Organism.pm code can now be found here: Mario improvements

Finally, here is my derived Perl version, Organism.pm:

```perl
package Organism;

use strict;
use warnings;

sub count {
   my $self = shift;
   return scalar keys %{ $self->{Cells} };
}

sub is_alive {
   my $self = shift;
   return 0 + exists $self->{Cells}->{ join ':', @_ };
}

# Input a list of [ x, y ] coords
sub insert_cells {
   my $self = shift;
   for my $r (@_) { $self->{Cells}->{ join ':', @{$r} } = undef }
}

# Return sorted list of cells in the Organism.
# Used for verification and testing the state of the organism.
sub get_live_cells {
   my $self = shift;
   sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
      map { [ split /:/, $_ ] }
      keys %{ $self->{Cells} };
}

# Return the list of dead cells surrounding a cell
sub get_dead_cells {
   my ( $self, $x, $y ) = @_;
   ( (join ':', $x - 1, $y - 1) x !$self->is_alive($x - 1, $y - 1),
     (join ':', $x - 1, $y    ) x !$self->is_alive($x - 1, $y    ),
     (join ':', $x - 1, $y + 1) x !$self->is_alive($x - 1, $y + 1),
     (join ':', $x    , $y - 1) x !$self->is_alive($x    , $y - 1),
     (join ':', $x    , $y + 1) x !$self->is_alive($x    , $y + 1),
     (join ':', $x + 1, $y - 1) x !$self->is_alive($x + 1, $y - 1),
     (join ':', $x + 1, $y    ) x !$self->is_alive($x + 1, $y    ),
     (join ':', $x + 1, $y + 1) x !$self->is_alive($x + 1, $y + 1) );
}

sub get_num_live {
   my ( $self, $x, $y ) = @_;
   $self->is_alive( $x - 1, $y - 1 )
      + $self->is_alive( $x - 1, $y     )
      + $self->is_alive( $x - 1, $y + 1 )
      + $self->is_alive( $x    , $y - 1 )
      + $self->is_alive( $x    , $y + 1 )
      + $self->is_alive( $x + 1, $y - 1 )
      + $self->is_alive( $x + 1, $y     )
      + $self->is_alive( $x + 1, $y + 1 );
}

sub tick {
   my $self = shift;
   my %new_cells;
   for my $c (keys %{ $self->{Cells} }) {
      # Get the (up to 8) dead cells surrounding the cell
      my @zcells = $self->get_dead_cells( split /:/, $c );

      # Check the live cell
      # Note: next line equivalent to nlive == 2 || nlive == 3
      @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef;

      # Check the dead cells
      for my $z (@zcells) {
         $self->get_num_live( split /:/, $z ) == 3 and $new_cells{$z} = undef;
      }
   }
   $self->{Cells} = \%new_cells;
}

sub new {
   my $class = shift;
   my %init_self = ( Cells => {} );
   bless \%init_self, $class;
}

1;
```

And here is the simple blinker benchmark program:

```perl
# tbench1.pl - Simple benchmark of Organism class.
# Generate blinker test file with, for example:
#   perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp
# then run this script for two ticks, for example:
#   perl tbench1.pl x.tmp 2
use strict;
use warnings;
use Organism;

# XXX: To read Life 1.06 file, ignore leading line containing "#Life 1.06"
sub read_cells {
   my $fname = shift;
   open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
   map { [ split ' ' ] } <$fh>;
}

@ARGV == 2 or die "usage: $0 file nticks\n";
my $file   = shift;
my $nticks = shift;
$nticks =~ /^\d+$/ or die "error: nticks ($nticks) not a number";
my $org = Organism->new();
{
   my @cells = read_cells($file);
   $org->insert_cells(@cells);
   my $ncells = $org->count();
   print "cell count at start = $ncells\n";
   $ncells == scalar(@cells) or die "oops";
}
warn "run benchmark for $nticks ticks\n";
my $tstart = time;
for my $i ( 1 .. $nticks ) {
   $org->tick();
}
my $tend = time;
my $taken = $tend - $tstart;
my $ncells = $org->count();
print "cell count at end = $ncells\n";
warn "time taken: $taken secs\n";
```

Along with createblinker.pl to generate the test data:

```perl
# Create blinker test data for load testing, e.g.:
#   perl createblinker.pl 500000 -100000 10 >x.tmp 2>y.tmp
# creates 500,000 * 3 blinker cells around y=10 starting at x=-100000
use strict;
use warnings;

# Write the vertical phase of one blinker to stderr.
# (The sub name is assumed; its header line is missing from the listing.)
sub one_blinker {
   my $x = shift;
   my $y1 = shift;
   my $y2 = $y1 + 1; my $y3 = $y1 + 2;
   warn "$x $y1\n$x $y2\n$x $y3\n";
}
@ARGV == 3 or die "usage: $0 n xstart ystart\n";
my $n      = shift;
my $xstart = shift;
my $ystart = shift;
$n =~ /^\d+$/ or die "error: n not numeric";
$xstart =~ /^[-]?\d+$/ or die "error: xstart not numeric";
$ystart =~ /^[-]?\d+$/ or die "error: ystart not numeric";
my $x = $xstart;
my $y = $ystart;
my $gap = 3;    # $gap must be >= 3
for my $i ( 0 .. $n-1 ) {
   my $x1 = $x + $i; my $x2 = $x1 + 1; my $x3 = $x1 + 2;
   print "$x1 $y\n$x2 $y\n$x3 $y\n";    # horizontal phase to stdout
   one_blinker( $x2, $y - 1 );          # vertical phase, centred on the same cell
   $x += $gap;
}
```

And a simple unit test:

```perl
# tgol2.t - Simple blinker test of Conway Game of Life Organism class
# Generate blinker test files with, for example:
#   perl createblinker.pl 3 100 10 >xtest.tmp 2>ytest.tmp
# then run this test with:
#   prove -v tgol2.t

use strict;
use warnings;
use Organism;
use Test::More;

my $nblinks = 5;    # number of ticks to test (value assumed; not shown in the listing)
my $ntests = ( $nblinks + 1 ) * 3;
plan tests => $ntests;

# XXX: To read Life 1.06 file, ignore leading line containing "#Life 1.06"
sub read_cells {
   my $fname = shift;
   open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
   map { [ split ' ' ] } <$fh>;
}

sub test_one {
   my $org = shift;
   my $desc = shift;
   my $expected = shift;
   my $nexpected = @{$expected};
   my $ncells = $org->count();
   my @cells = $org->get_live_cells();
   cmp_ok( $ncells, '==', $nexpected, "$desc cell count ($ncells)" );
   cmp_ok( scalar(@cells), '==', $nexpected, "$desc cell array count" );
   is_deeply( \@cells, $expected, "$desc cell array" );
}

# File names taken from the createblinker.pl command in the header comment
my @blinker1 = read_cells('xtest.tmp');
my @blinker2 = read_cells('ytest.tmp');
my @sblinker1 = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @blinker1;
my @sblinker2 = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @blinker2;

# Initialization
my $org = Organism->new();
$org->insert_cells(@blinker1);
test_one( $org, "initial", \@sblinker1 );

# Pattern should just blink back and forth
for my $i ( 1 .. $nblinks ) {
   $org->tick();
   test_one( $org, "blinker $i", $i % 2 ? \@sblinker2 : \@sblinker1 );
}
```

Update: Extra Test Program tgol.t

I originally forgot to mention that the following little test tgol.t should be run to catch any errors with negative x and y values before firing off any benchmarks:

```perl
# tgol.t - Simple blinker test of Conway Game of Life Organism class
use strict;
use warnings;
use Organism;
use Test::More;

my $nblinks = 5;    # number of ticks to test (value assumed; not shown in the listing)
my $ntests = ( $nblinks + 1 ) * 3;
plan tests => $ntests;

sub test_one {
   my $org = shift;         # Organism handle
   my $desc = shift;        # Test description
   my $expected = shift;    # Array ref of (sorted) expected cells
   my $nexpected = @{$expected};
   my $ncells = $org->count();
   my @cells = $org->get_live_cells();
   cmp_ok( $ncells, '==', $nexpected, "$desc cell count ($ncells)" );
   cmp_ok( scalar(@cells), '==', $nexpected, "$desc cell array count" );
   is_deeply( \@cells, $expected, "$desc cell array" );
}

# Blinker patterns: horizontal phase
my @blinker1 = (
   [ -101, -100 ], [ -100, -100 ], [  -99, -100 ],
   [ -101,  100 ], [ -100,  100 ], [  -99,  100 ],
   [   -1,    0 ], [    0,    0 ], [    1,    0 ],
   [   99, -100 ], [  100, -100 ], [  101, -100 ],
   [   99,  100 ], [  100,  100 ], [  101,  100 ],
);
# Blinker patterns: vertical phase
my @blinker2 = (
   [ -100,  -99 ], [ -100, -100 ], [ -100, -101 ],
   [ -100,   99 ], [ -100,  100 ], [ -100,  101 ],
   [    0,   -1 ], [    0,    0 ], [    0,    1 ],
   [  100,  -99 ], [  100, -100 ], [  100, -101 ],
   [  100,   99 ], [  100,  100 ], [  100,  101 ],
);
my @sblinker1 = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @blinker1;
my @sblinker2 = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @blinker2;

# Initialization
my $org = Organism->new();
$org->insert_cells(@blinker1);
test_one( $org, "initial", \@sblinker1 );

# Pattern should just blink back and forth
for my $i ( 1 .. $nblinks ) {
   $org->tick();
   test_one( $org, "blinker $i", $i % 2 ? \@sblinker2 : \@sblinker1 );
}
```

Further update: Another little lidka test program tgol3.t should be run to further test your solution.

CPAN Offerings

From a quick look at the CPAN namely:

Game::Life::Infinite::Board seemed to be the best, so, for cheap thrills, I wrote a benchmark of that module too (based on the earlier one for Organism.pm):

```perl
# tbench1-infinite.pl - Simple benchmark.
# Uses Game::Life::Infinite::Board
use strict;
use warnings;
use Game::Life::Infinite::Board;

# (The sub name is assumed; its header line is missing from the listing.)
sub read_cells {
   my $fname = shift;
   open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
   my @lines = <$fh>;
   return @lines;
}

@ARGV == 2 or die "usage: $0 file nticks\n";
my $file   = shift;
my $nticks = shift;
$nticks =~ /^\d+$/ or die "error: nticks ($nticks) not a number";
my $org = Game::Life::Infinite::Board->new();
{
   my @lines = read_cells($file);
   # ... the statements that load the cells in @lines into $org
   #     are missing from this listing ...
   my $ncells = $org->statistics->{'liveCells'};
   print "cell count at start = $ncells\n";
   $ncells == scalar(@lines) or die "oops";
}
warn "run benchmark for $nticks ticks\n";
my $tstart = time;
for my $i ( 1 .. $nticks ) {
   $org->tick();
}
my $tend = time;
my $taken = $tend - $tstart;
my $ncells = $org->statistics->{'liveCells'};
print "cell count at end = $ncells\n";
warn "time taken: $taken secs\n";
```

Benchmark Results

Benchmark timings for running two ticks on my machine were:

| Version | 375K cells | 750K cells | 1.5 million cells | 3 million cells |
|---|---|---|---|---|
| C++ | 0 secs | 0 secs | 1 secs | 2 secs |
| Organism.pm (Mario improvements) | 13 secs | 26 secs | 52 secs | 108 secs |
| Organism.pm (Original above) | 35 secs | 70 secs | 141 secs | 284 secs |
| Game::Life::Infinite::Board | 37 secs | 96 secs | 273 secs | 905 secs |

Update: the timings above were updated from perl v5.24.0 to perl v5.26.0 (which was very slightly faster).
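
The 0 secs entries in the C++ row come from tbench1.cpp timing with time(), which only has one-second resolution. A sketch of a finer-grained timer using std::chrono::steady_clock follows; `time_ms` is a hypothetical helper and was not used to produce any of the numbers in this node.

```cpp
#include <chrono>

// Run f() once and return the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

In RunTest it could wrap the tick loop, for example `time_ms([&] { for (int i = 1; i <= nticks; ++i) org.tick(); })`.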

As for memory use, the maximum Windows Private Bytes used for the three million cell case by each process was:

• C++ : 517,340K
• Organism.pm (Original above): 1,455,004K
• Organism.pm (Mario improvements): 1,596,368K
• Game::Life::Infinite::Board: 18,138,504K

Update: Benchmark timings running AppleFritter's Lidka test for 30,000 ticks were:

| Version | Lidka 30,000 ticks |
|---|---|
| C++ | 36 secs |
| Organism.pm (Mario improvements) | 450 secs |
| Organism.pm (Original above) | 1635 secs |
| Game::Life::Infinite::Board | 640 secs |

Note that Game::Life::Infinite::Board performed much better on this test, presumably due to the significantly lower memory requirements - the cell count in the Lidka benchmark remains under 2000 for all 30,000 ticks.

References

• John Horton Conway (wikipedia). Sadly, on 8 April 2020, Conway developed symptoms of COVID-19. On 11 April, he died in New Brunswick, New Jersey at the age of 82.

Updated: Added more references. Minor improvements to wording. Added Mario performance enhanced Organism.pm to the table of benchmark results. Added extra tgol.t test program in a new section. Another extra lidka test program tgol3.t. Updated tbench1.cpp to read a Life 1.06 text file. Added Lidka Performance 30,000 test to Benchmark Results section; also updated timings from perl v5.24.0 to perl v5.26.0.

Replies are listed 'Best First'.
Re: High Performance Game of Life
by tybalt89 (Prior) on Aug 11, 2017 at 22:36 UTC

And now for something completely different :)

This was written to see if I could use perl's logical string operations to do the calculations needed for Life.

Of course, what I really wanted was a string oriented byte-by-byte add ( a +. to go along with the brand new |. and &. ) but I couldn't find one (sigh).

However, this is a way to use three bits in the byte as a sum, provided only one bit is added at a time.
Consider the 8 bits of a byte to be 0011SSSN where the 'S' bits are the sum and a new bit 'N' is added (actually or'd) as the bottom bit. (Why the two '1' bits? Because the character '0' is 00110000 as binary and '1' is 00110001 as binary, and we can do calculations such that the result is immediately printable. We don't, but we could :)

Now if I 'or' in a neighbor character it will only (potentially) change the 'N' bit. For each neighbor, I only have to add that bit to the sum (in SSS) and then clear the bottom bit for the next neighbor.

This turns out to be a simple tr/1357/2468/ translation.

(But tybalt89, you say, you only have a three bit counter and there are eight neighbors, you can't hold a count that high. Answer: I don't have to, since any count four or greater will result in a dead cell, I only have to count to four.)

After processing all eight neighbors, the center cell is then or'd in to give the state of the neighbor count along with the state of the center cell, which can then be translated to "alive or dead" for the next generation.

In this scheme, you need to specify the size of the grid in advance, and any live cells at the grid edge get killed to avoid wraparound.
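
The per-byte arithmetic described above can be mimicked one cell at a time to check the reasoning. The sketch below is in C++ rather than Perl and handles a single byte instead of whole strings, so it illustrates the encoding only, not the speed. `add_neighbour` and `next_cell` are hypothetical names.

```cpp
// One byte per cell: ASCII '0' (dead) or '1' (alive).
// The running sum is held as an even digit '0','2','4','6','8' (count * 2),
// which leaves the bottom bit free to receive the next neighbour.

// OR in one neighbour, then fold the bottom bit into the sum.
// Equivalent of:  ( $sum |= $neighbour ) =~ tr/1357/2468/
inline char add_neighbour(char sum, char neighbour) {
    sum = static_cast<char>(sum | neighbour);
    switch (sum) {
        case '1': case '3': case '5': case '7':
            return static_cast<char>(sum + 1);
        default:
            return sum;  // even digits unchanged; '9' stays '9' (four or more)
    }
}

// OR in the centre cell and decide its fate.
// Equivalent of:  tr/1-9/000011100/
// '5' = 2 neighbours + alive, '6' = 3 neighbours + dead, '7' = 3 neighbours + alive.
inline char next_cell(char sum, char centre) {
    const char c = static_cast<char>(sum | centre);
    return (c == '5' || c == '6' || c == '7') ? '1' : '0';
}
```

Starting from '0' and calling `add_neighbour` for each of the eight neighbours gives the same digit the string version holds at that position before the final translation.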

So how does this do on the challenge - simple, it fails. (Oh good, I can avoid being hired :)

You did say you wanted "to see how others go about it".

Notes:

It's "live cell count" speed invariant, as opposed to techniques that keep only live cells.
I stopped performance testing because I ran into Heisenberg problems, i.e. output seriously affects speed.
It's not infinite size, or even semi-infinite, unless you've got a really, really, really big machine.
It should probably be moved to GPUs but a) I don't have a GPU to test on, and b) it probably already has been.

I usually run it without the override lines, with the print statements uncommented, and of course with a larger number of generations. It autosizes to the terminal it is running in (at least on ArchLinux and xterm).

OK, I have to go cook supper, and I'm really tired of typing, so I'll submit at this point, and look for comments later...

```perl
#!/usr/bin/perl

# http://perlmonks.org/?node_id=1197250
# Conway's life eyepopslikeamosquito

use strict;
use warnings;
use Time::HiRes 'time';

my $w = qx(tput cols) + 0; # probably linux only :)
my $h = qx(tput lines) - 7; # strange offset because of my REPL
# override for comparison with node results
$w = int sqrt 3e6;
$h = int 1 + 3e6 / $w;

my $board = '0' x $w x $h;
my $mask = '0' x $w . ('0' . "\xff" x ($w - 2) . '0') x ($h - 2) . '0' x $w;

# build starting state

sub place
  {
  my ($x, $y, $pattern) = @_;
  my $offset = $y * $w + $x;
  for ( $pattern =~ /.+/g )
    {
    substr $board, $offset, length $_, $_;
    $offset += $w;
    }
  }

place( 4, 4, <<END =~ tr/-O/01/r ); # glider gun
------------------------O-----------
----------------------O-O-----------
------------OO------OO------------OO
-----------O---O----OO------------OO
OO--------O-----O---OO--------------
OO--------O---O-OO----O-O-----------
----------O-----O-------O-----------
-----------O---O--------------------
------------OO----------------------
END

place( $w >> 1, $h >> 1, <<END =~ tr/-O/01/r ); # switch engine
----------
-------O--
-----O-OO-
-----O-O--
-----O----
---O------
-O-O------
----------
END

my $generations = 0;
my @offsets = ( 1, 2, $w, $w + 2, $w * 2, $w * 2 + 1, $w * 2 + 2 );
my $pad = '0' x ($w + 1);

#print "\e[H\e[J"; # clear screen
#print "\e[H", $board =~ tr/01/ O/r, "-" x $w, "$generations\n";

my $starttime = time;

for (1..2) # loop over generations
  {
  $generations++;

  my $all = $pad . $board . $pad;                            # pad so all 8 offsets stay in range
  my $sum = $all =~ tr/1/2/r;                                # first offset
  ( $sum |= substr $all, $_ ) =~ tr/1357/2468/ for @offsets; # other 7

  $board |= $mask & $sum;      # prevent overflow at edges
  $board =~ tr/1-9/000011100/; # determine alive or dead

  #print "\e[H", $board =~ tr/01/ O/r, "-" x $w, "$generations\n";
  }

my $time = time - $starttime;
printf "\nw %d h %d cells %d time %.2f generations %d gen/sec %d" .
  " nanoseconds/cell/gen %d\n",
  $w, $h, $w * $h, $time, $generations, $generations / $time,
  $time / ($w * $h) / $generations *1e9;
```

..oO I wonder if some engineering school would accept this project as a PhD thesis? Conway's Game of Life, with GUI visualization, optimized with assembly language routines. :-)

I'm not really a human, but I play one on earth. ..... an animated JAPH

Conway's Game of Life, with GUI visualization, optimized with assembly language routines. :-)

Been done, minus the assembly bits (but I'm sure patches would be welcome): golly.sourceforge.net

Re: High Performance Game of Life
by marioroy (Parson) on Aug 12, 2017 at 09:02 UTC

Unfortunately, method calling in Perl is expensive. The following change to Organism.pm makes it run about twice as fast, simply by inlining is_alive.

Before:

```perl
sub is_alive {
   my $self = shift;
   return 0 + exists $self->{Cells}->{ join ':', @_ };
}

# Return the list of dead cells surrounding a cell
sub get_dead_cells {
   my ( $self, $x, $y ) = @_;
   ( (join ':', $x - 1, $y - 1) x !$self->is_alive($x - 1, $y - 1),
     (join ':', $x - 1, $y    ) x !$self->is_alive($x - 1, $y    ),
     (join ':', $x - 1, $y + 1) x !$self->is_alive($x - 1, $y + 1),
     (join ':', $x    , $y - 1) x !$self->is_alive($x    , $y - 1),
     (join ':', $x    , $y + 1) x !$self->is_alive($x    , $y + 1),
     (join ':', $x + 1, $y - 1) x !$self->is_alive($x + 1, $y - 1),
     (join ':', $x + 1, $y    ) x !$self->is_alive($x + 1, $y    ),
     (join ':', $x + 1, $y + 1) x !$self->is_alive($x + 1, $y + 1) );
}

sub get_num_live {
   my ( $self, $x, $y ) = @_;
   $self->is_alive( $x - 1, $y - 1 )
      + $self->is_alive( $x - 1, $y     )
      + $self->is_alive( $x - 1, $y + 1 )
      + $self->is_alive( $x    , $y - 1 )
      + $self->is_alive( $x    , $y + 1 )
      + $self->is_alive( $x + 1, $y - 1 )
      + $self->is_alive( $x + 1, $y     )
      + $self->is_alive( $x + 1, $y + 1 );
}
```

After:

```perl
# Return the list of dead cells surrounding a cell
sub get_dead_cells {
   my ( $cells, $x0, $y0 ) = ( shift->{Cells}, @_ );
   my ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 );
   ( ( "$x1:$y1" ) x !( 0 + exists $cells->{ "$x1:$y1" } ),
     ( "$x1:$y0" ) x !( 0 + exists $cells->{ "$x1:$y0" } ),
     ( "$x1:$y2" ) x !( 0 + exists $cells->{ "$x1:$y2" } ),
     ( "$x0:$y1" ) x !( 0 + exists $cells->{ "$x0:$y1" } ),
     ( "$x0:$y2" ) x !( 0 + exists $cells->{ "$x0:$y2" } ),
     ( "$x2:$y1" ) x !( 0 + exists $cells->{ "$x2:$y1" } ),
     ( "$x2:$y0" ) x !( 0 + exists $cells->{ "$x2:$y0" } ),
     ( "$x2:$y2" ) x !( 0 + exists $cells->{ "$x2:$y2" } ) );
}

sub get_num_live {
   my ( $cells, $x0, $y0 ) = ( shift->{Cells}, @_ );
   my ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 );
   ( 0 + exists $cells->{ "$x1:$y1" } )
      + ( 0 + exists $cells->{ "$x1:$y0" } )
      + ( 0 + exists $cells->{ "$x1:$y2" } )
      + ( 0 + exists $cells->{ "$x0:$y1" } )
      + ( 0 + exists $cells->{ "$x0:$y2" } )
      + ( 0 + exists $cells->{ "$x2:$y1" } )
      + ( 0 + exists $cells->{ "$x2:$y0" } )
      + ( 0 + exists $cells->{ "$x2:$y2" } );
}
```

Regards, Mario

Very instructive.

In C++, it was faster not to attempt anything like that with temporary variables (I tried), but instead to just leave the duplicated values and let the optimizer do it for you. Plus, of course, function call overhead can be eliminated via inline functions and macros.

In Perl, on the other hand, the compiler must run very fast, and doesn't attempt many of the optimizations of a C++ compiler run at high optimization levels.

On my laptop, the following shaves 4 seconds by stringifying each key only once.

```perl
# Return the list of dead cells surrounding a cell
sub get_dead_cells {
   my ( $cells, $x0, $y0 ) = ( shift->{Cells}, @_ );
   my ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 );
   my ( $k1, $k2, $k3, $k4, $k5, $k6, $k7, $k8 );

   ( ( $k1 = "$x1:$y1" ) x !( 0 + exists $cells->{ $k1 } ),
     ( $k2 = "$x1:$y0" ) x !( 0 + exists $cells->{ $k2 } ),
     ( $k3 = "$x1:$y2" ) x !( 0 + exists $cells->{ $k3 } ),
     ( $k4 = "$x0:$y1" ) x !( 0 + exists $cells->{ $k4 } ),
     ( $k5 = "$x0:$y2" ) x !( 0 + exists $cells->{ $k5 } ),
     ( $k6 = "$x2:$y1" ) x !( 0 + exists $cells->{ $k6 } ),
     ( $k7 = "$x2:$y0" ) x !( 0 + exists $cells->{ $k7 } ),
     ( $k8 = "$x2:$y2" ) x !( 0 + exists $cells->{ $k8 } ) );
}
```

A further reduction of 2 seconds is possible with the state feature, which avoids allocating the key variables on every call.

```perl
use feature 'state';

# Return the list of dead cells surrounding a cell
sub get_dead_cells {
   my ( $cells, $x0, $y0 ) = ( shift->{Cells}, @_ );
   my ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 );

   state ( $k1, $k2, $k3, $k4, $k5, $k6, $k7, $k8 );

   ( ( $k1 = "$x1:$y1" ) x !( 0 + exists $cells->{ $k1 } ),
     ( $k2 = "$x1:$y0" ) x !( 0 + exists $cells->{ $k2 } ),
     ( $k3 = "$x1:$y2" ) x !( 0 + exists $cells->{ $k3 } ),
     ( $k4 = "$x0:$y1" ) x !( 0 + exists $cells->{ $k4 } ),
     ( $k5 = "$x0:$y2" ) x !( 0 + exists $cells->{ $k5 } ),
     ( $k6 = "$x2:$y1" ) x !( 0 + exists $cells->{ $k6 } ),
     ( $k7 = "$x2:$y0" ) x !( 0 + exists $cells->{ $k7 } ),
     ( $k8 = "$x2:$y2" ) x !( 0 + exists $cells->{ $k8 } ) );
}
```

Regards, Mario

Re: High Performance Game of Life
by zentara (Archbishop) on Aug 11, 2017 at 14:21 UTC
Hi, I think you need a few readmore tags. :-)

A while back, Discipulus asked me if I knew how to make a hexagon grid in Perl/Tk. It actually was pretty easy on a Tk::Canvas. The code below could be used for visualizing the game. As is, it just reports the cell position on mouse movement, but it could easily be set up to store each hexagon as its own hash element, and then use a timer to cycle through your Conway algorithms and change colors on the various grid hex elements. The code below doesn't really show it, but you would also want to store each cell's data in canvas tags. You could store JSON data in a tag. Anyways... for the visualization. :-) The Tk::Canvas is extremely versatile when the tags are juggled correctly.

```
#!/usr/bin/perl
use warnings;
use strict;
use Tk;

# simplest hex example where the cell height = sqrt(3) cell width
# and all cells vertically oriented like a honeycomb

# W = 2 * r
# s = 1.5 * r
# H = 2 * r * sin(60 degrees) = sqrt(3) * r

# therefore
# r = W/2 and we can compute our polygon's points
#

# set number of cells in x and y direction
my ( $num_cells_x, $num_cells_y ) = ( 50, 50 );

# set rectangular cell width and compute height
my $cwidth  = 40;
my $cheight = sqrt(3) * $cwidth;

# compute canvas size required
my ( $canvasWidth, $canvasHeight ) = ( $num_cells_x * $cwidth,
                                       $num_cells_y * $cheight );

my $mw = MainWindow->new();
$mw->geometry('500x500+300+300');

my $sc = $mw->Scrolled('Canvas', -bg => 'black',
    -width        => $canvasWidth,
    -height       => $canvasHeight,
    -scrollbars   => 'osoe',
    -scrollregion => [ 0, 0, $canvasWidth, $canvasHeight ],
)->pack;

my $canvas = $sc->Subwidget("canvas");

my ( $x, $y, $r, $s, $h, $w, $diff, $row, $col );
$r    = $cwidth / 2;
$w    = $cwidth;
$h    = sqrt(3) * $r;
$s    = 1.5 * $r;
$diff = $w - $s;
$row  = 0;
$col  = 0;

# $x and $y are center of mass locations
for ( $y = 0; $y < $canvasHeight; $y += $h / 2 ) {

    for ( $x = 0; $x < $canvasWidth; $x += ( 2 * $r + $s - $diff ) ) {

        my $shift = 0;
        my $color;    # toggles row colors and spacings
        if ( $row % 2 ) {
            $color = '#FFAAAA';
            $shift = $s;
        }
        else { $color = '#AAAAFF' }

        #print "$color\n";

        my $x0 = $x - $r - $shift;
        my $y0 = $y;
        my $x1 = $x0 + $diff;
        my $y1 = $y0 - $h / 2;
        my $x2 = $x0 + $s;
        my $y2 = $y0 - $h / 2;
        my $x3 = $x0 + 2 * $r;
        my $y3 = $y;
        my $x4 = $x0 + $s;
        my $y4 = $y0 + $h / 2;
        my $x5 = $x1;
        my $y5 = $y0 + $h / 2;
        my $x6 = $x0;    # close up to starting point
        my $y6 = $y0;

        # account for $shift affecting x position
        # xpos != x
        my $xpos = $x0 + $r;

        my $hexcell = $canvas->createPolygon(
            $x0, $y0, $x1, $y1, $x2, $y2, $x3, $y3,
            $x4, $y4, $x5, $y5, $x6, $y6,
            -fill       => $color,
            -activefill => '#CCFFCC',
            -tags       => [ 'hexcell', "row.$row", "col.$col",
                             "posx.$xpos", "posy.$y" ],
            -width      => 1,
            -outline    => 'black',
        );
        $col++;
    }
    $row++;
    $col = 0;    # reset column
}

$sc->bind( 'hexcell', '<Enter>', \&enter );

$sc->bind( "hexcell", "<Leave>", \&leave );

$sc->bind( "hexcell", "<1>", \&left_click );

MainLoop;

sub left_click {
    my ($canv) = @_;
    my $id   = $canv->find( 'withtag', 'current' );
    my @tags = $canv->gettags($id);
    print "@tags\n";
    $canv->itemconfigure( $id, -fill => '#44FF44' );
}

sub enter {
    my ($canv) = @_;
    my $id   = $canv->find( 'withtag', 'current' );
    my @tags = $canv->gettags($id);
    print "@tags\n";
}

sub leave {
    my ($canv) = @_;
    print "leave\n";
}
```

I'm not really a human, but I play one on earth. ..... an animated JAPH
Re: High Performance Game of Life
by eyepopslikeamosquito (Bishop) on Aug 12, 2017 at 07:15 UTC

In Fastest way to lookup a point in a set we concluded, in isolation, that hash lookups were faster using split/join rather than pack/unpack.

With nothing to lose though, I tried changing split /:/, $z to unpack 'ii', $z and:

```
join ':', $x - 1, $y - 1
join ':', @_
```
to:
```
pack 'ii', $x - 1, $y - 1
pack 'ii', @_
```
... and almost fell off my chair when the run time for three million cells dropped from 287 seconds way down to 204 seconds!! What gives?

As shown below, there is no difference in the number of op codes:

```
> perl -MO=Terse -e "split /:/, $z"
LISTOP (0x2bf88f8) leave
OP (0x25b59a0) enter
COP (0x2bf8940) nextstate
LISTOP (0x25b59d8) split
PMOP (0x25b5a60) pushre
UNOP (0x25b5a20) null
SVOP (0x25b5938) const  IV (0x25b0d80) 0

> perl -MO=Terse -e "unpack 'ii', $z"
LISTOP (0x2c387f8) leave
OP (0x645908) enter
COP (0x645940) nextstate
LISTOP (0x6459d8) unpack
OP (0x6459a0) null
SVOP (0x645aa0) const  PV (0x63fd40) "ii"
UNOP (0x645a20) null
PADOP (0x645a60) gvsv  GV (0x63fe00) *z

> perl -MO=Terse -e "join ':', @_"
LISTOP (0x2a9d798) leave
OP (0x24f5938) enter
COP (0x24f5970) nextstate
LISTOP (0x24f5a08) join
OP (0x24f59d0) pushmark
SVOP (0x24f5ad0) const  PV (0x24f0960) ":"
UNOP (0x24f5a50) rv2av
PADOP (0x24f5a90) gv  GV (0xe9b2c8) *_

> perl -MO=Terse -e "pack 'ii', @_"
LISTOP (0x2c4e318) leave
OP (0x645938) enter
COP (0x645970) nextstate
LISTOP (0x645a08) pack
OP (0x6459d0) pushmark
SVOP (0x645ad0) const  PV (0x640a80) "ii"
UNOP (0x645a50) rv2av
PADOP (0x645a90) gv  GV (0xffb2c8) *_
```

There didn't appear to be any significant difference in memory consumption either.

Anyone got any ideas? The slowdown may be caused by split using a /:/ regex - and regexes are slow. Note that in Fastest way to lookup a point in a set, we were measuring lookups in isolation and lookups use join, not split. How to investigate further? Devel::NYTProf?
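One cheap first step, before reaching for a profiler, would be to time decoding and encoding separately with the core Benchmark module. The following is only a sketch: the sample coordinates and the sub names are made up for illustration and are not taken from the three million cell benchmark, so the ratios it prints may not carry over to the real program.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 1000 made-up (x, y) pairs, encoded both ways.
my @points = map { [ $_ - 500, 2 * $_ ] } 1 .. 1000;
my @joined = map { join ':', @$_ } @points;
my @packed = map { pack 'ii', @$_ } @points;

# Decode every key and sum the coordinates so the result is actually used.
sub decode_split {
    my $sum = 0;
    for my $key (@joined) {
        my ( $x, $y ) = split /:/, $key;
        $sum += $x + $y;
    }
    return $sum;
}

sub decode_unpack {
    my $sum = 0;
    for my $key (@packed) {
        my ( $x, $y ) = unpack 'ii', $key;
        $sum += $x + $y;
    }
    return $sum;
}

# Encode every point; return the number of keys built.
sub encode_join { my @k = map { join ':', @$_ } @points; return scalar @k }
sub encode_pack { my @k = map { pack 'ii', @$_ } @points; return scalar @k }

# Run each candidate for at least one CPU second and print comparison tables.
cmpthese( -1, { 'split' => \&decode_split, 'unpack' => \&decode_unpack } );
cmpthese( -1, { 'join'  => \&encode_join,  'pack'   => \&encode_pack } );
```

If split turns out to dominate here, that would support the regex theory; if not, the difference is more likely to come from the hash keys themselves.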

Update: From running:

```
for my $r ( [-123456789, 987654321], [1,2] ) {
   my $pp = pack 'ii', @{$r};
   my $jj = join ':', @{$r};
   my $pplen = length $pp;
   my $jjlen = length $jj;
   print "$r->[0]:$r->[1] packlen=$pplen joinlen=$jjlen\n";
   my ($xpp, $ypp) = unpack 'ii', $pp;
   my ($xjj, $yjj) = split /:/, $jj;
   $xpp == $r->[0] or die;
   $ypp == $r->[1] or die;
   $xjj == $r->[0] or die;
   $yjj == $r->[1] or die;
}
```
we see:
```
-123456789:987654321 packlen=8 joinlen=20
1:2 packlen=8 joinlen=3
```
That is, pack/unpack always has a hash key length of 8 bytes, while with split/join the key length varies, depending on the size of the x and y coordinates.

Update: As discovered by marioroy, 'i2' is faster than 'ii' in pack and unpack.

Re: High Performance Game of Life
by AppleFritter (Vicar) on Aug 12, 2017 at 18:35 UTC

One of the fastest general Life algorithms I know of is vlife, used by Adam P. Goucher in apgmera. It supports arbitrary (outer-totalistic) rules; the generic bits are written in C++ (and live in includes/vlife.h), while the rule-specific parts are in assembly, using SSE2/AVX1/AVX2 (whatever is available), generated by a Python script. It's a very clever algorithm using very clever data structures that probably achieves close to the maximum of what you can eke out of a modern CPU, speed-wise, at least without resorting to even newer instruction sets.

It includes a benchmark in which a well-known methuselah is run for 30k generations (about as long as it takes to stabilize). make vlifetest and run the resulting vlifetest binary to run it -- it would be interesting to compare this to your C++ and Perl implementations, as well as CPAN's offerings. On my machine (which is a couple of years old and only supports AVX1), the average time across the benchmark's 50 iterations is 135.72 ms:

```
$ ./vlifetest
VersaTile size: 336
Instruction set AVX1 supported.
Population count: 1623
Tiles processed: 596194
[...]
Lidka + 30k in 135.72 ms.
$
```

EDIT: on the same machine, tbench1.cpp takes about a minute to run Lidka to completion:

```
$ ./tbench1 lidka_106.lif 30000
cell count at start = 13
run benchmark for 30000 ticks
cell count at end = 1623
time taken 60 secs
$
```

The benchmark of the Perl implementation is still running.

EDIT 2: Perl ended up being faster than I thought, but at more than 40 minutes, it's still pretty slow compared to the C++ version, much less vlife:

```
$ perl tbench1.pl lidka_106.lif 30000
cell count at start = 13
run benchmark for 30000 ticks
cell count at end = 1623
time taken: 2441 secs
$
```

EDIT 3: apgmera has since seen a new major release (4.x), and the algorithmic guts now live in a separate repo, lifelib. The author informs me that it's actually about 8% faster compared to the vlife code in 3.x, too.

Footnote:

1. HashLife is a bit of a special case, obviously.
Re: High Performance Game of Life
by tybalt89 (Prior) on Nov 14, 2019 at 19:06 UTC

... I'm interested to see how others go about it.
Here's one. It's not fast, but it makes up for it with cuteness :)

```
#!/usr/bin/perl

print $_ = <<'' =~ tr/./ /r; # test case, dots only for layout
...........
.......#...
........#..
.###..###..
...........
...........

# $-[0] is the offset of the first "..\n", i.e. the row width minus two
my $gap = /..\n/ ? qr/.{$-[0]}/s : die "too small";
my $neighborhood = qr/(?<=(...) $gap (.)) . (?=(.) $gap (...))/x;

for my $generation ( 2 .. 5 )
  {
  s/$neighborhood/ substr "  $&#     ", "$1$2$3$4" =~ tr|#||, 1 /ge;
  print;
  }
```
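The substitution above packs the whole rule set into one substr call. As a rough stand-alone sketch of that lookup-string trick (the next_state and count_live names are invented here for illustration and do not appear in the code above):

```perl
use strict;
use warnings;

# Count the live cells ('#') among the neighbour characters passed in.
sub count_live {
    my $neighbours = join '', @_;
    return $neighbours =~ tr/#//;
}

# Next state of one cell, read from a 9-character string indexed by the
# live-neighbour count (0..8): counts 0-1 and 4-8 give a dead cell,
# 2 keeps the current state, 3 gives a live cell.
sub next_state {
    my ( $current, $count ) = @_;    # $current is '#' or ' '
    return substr "  $current#     ", $count, 1;
}

# A live cell with neighbours "#  " above, "#" left, " " right, "   " below.
my $n = count_live( '#  ', '#', ' ', '   ' );
print "neighbours=$n next='", next_state( '#', $n ), "'\n";
```

In the regex version, $& plays the role of $current and the four captures supply the neighbour characters.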

Node Type: perlmeditation [id://1197250]
Front-paged by Athanasius