PerlMonks  

Re^15: High Performance Game of Life (updated)

by marioroy (Prior)
on Aug 15, 2017 at 02:12 UTC ( [id://1197393]=note )


in reply to Re^14: High Performance Game of Life (updated)
in thread High Performance Game of Life

Update: Found the reason why cperl is failing with 64-bit, described below.

Hi tybalt89,

Perl is passing with 32 and 64 bits. Regarding cperl, it is passing 32 bits ($half = 16), but not 64 bits ($half = 32).

$ perl createblinker.pl 5000 -9000 100 >x.tmp 2>y.tmp

bin/perl

$ /opt/perl-5.24.2/bin/perl -I. tbench1.pl x.tmp 2
cell count at start = 15000
run benchmark for 2 ticks
cell count at end = 15000
time taken: 0 secs

bin/cperl

$ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2
cell count at start = 15000
run benchmark for 2 ticks
cell count at end = 6753  <-- fails on 64-bit hw ($half = 32)
time taken: 0 secs

cperl fails on 64-bit hw ($half = 32) because the numbers are converted to exponential notation.

# perl createblinker.pl 5 -9 100 >x.tmp 2>y.tmp
# print "@zcells\n";
9223372039002259557 -9.22337203900226e+18 -9.22337203900226e+18, ...

Running $half = 30 resolves the issue. If you want, $half may be set programmatically.

# use 30-bits on 64-bit hw for cperl compatibility
my $half = ( ( log(~0 + 1) / log(2) ) >= 64 ) ? 30 : 16;
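To illustrate why the number of bits matters, here is a minimal sketch of mapping two signed coordinates into one integer key with $half bits each. The helper names to_key and from_key and the $bias scheme are my own assumptions for illustration; this is not the mapping code from the thread. Coordinates must fit in $half bits (signed), so the combined key stays well inside the integer range and is never converted to a float.

```perl
use strict;
use warnings;

# Same detection as above: 30 bits per coordinate on 64-bit hw, else 16.
my $half = ( ( log(~0 + 1) / log(2) ) >= 64 ) ? 30 : 16;

# Assumption for this sketch: shift signed coordinates into the
# non-negative range before combining them.
my $bias = 1 << ( $half - 1 );
my $mask = ( 1 << $half ) - 1;

# Hypothetical helper: combine (x, y) into one integer key.
sub to_key {
    my ( $x, $y ) = @_;
    return ( ( $x + $bias ) << $half ) | ( $y + $bias );
}

# Hypothetical helper: recover (x, y) from a key made by to_key.
sub from_key {
    my ($k) = @_;
    return ( ( $k >> $half ) - $bias, ( $k & $mask ) - $bias );
}

my $key = to_key( -9000, 5000 );
my ( $x, $y ) = from_key($key);
print "half = $half, key = $key, decoded = ($x, $y)\n";
```

With 30 bits per coordinate the largest key is below 2**60, which leaves headroom on a 64-bit build.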

Regards, Mario

Replies are listed 'Best First'.
Re^16: High Performance Game of Life (updated - results)
by marioroy (Prior) on Aug 15, 2017 at 05:30 UTC

    Update 1: Added results for the original version and initial improvements.
    Update 2: Added results from tbench1-infinite.pl using Game::Life::Infinite::Board.
    Update 3: Added results for C++, Organism.h and tbench1.cpp.

    In this thread, we've learned that pack i2 is faster than pack ii. Furthermore, functions with inner loops benefit greatly from inlining the critical paths. In essence, this is what we've done. The fully optimized pack i2 solution, by tybalt89, is found here. The mapping of two numbers into one can be found here. Similarly, a shorter implementation by tybalt89 is found here. The fastest time was noted for each run, with nothing else running in the background.

    The maximum key length for $c, obtained separately, is 7, 12, and 18 for pack i2, the mapping of two numbers, and the shorter solution by tybalt89, respectively. The fix for the latter solution, when running cperl on 64-bit hardware, is using 30 bits instead of 32, described here.

    The x and y tmp files were made using eyepopslikeamosquito's createblinker.pl script, found at the top of this thread.

    $ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp

    Benchmark results were captured on a 2.6 GHz laptop (i7 Haswell). I apologize for not having something older to benchmark on. If anything, the OP's machine is faster than mine. ;-)

    Results:

    c++              138 MB     4 secs

                   mem size    v5.26.0    v5.24.2    v5.22.4
    infinite       7,548 MB   122 secs   119 secs   133 secs
    original         704 MB   167 secs   168 secs   180 secs
    improvements     744 MB    57 secs    57 secs    63 secs
    optimized i2   1,543 MB    38 secs    39 secs    40 secs
    2 nums into 1  1,510 MB    39 secs    40 secs    42 secs
    shorter impl   1,661 MB    40 secs    42 secs    44 secs

    cperl:

    $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2
    cell count at start = 1500000
    run benchmark for 2 ticks
    cell count at end = 1500000
    time taken: 116 secs - infinite board
    time taken: 163 secs - original
    time taken:  54 secs - improvements
    time taken:  36 secs - optimized pack i2
    time taken:  37 secs - two numbers into one
    time taken:  37 secs - shorter implementation

    Pack returns unreadable data, but is fast. Readable keys may be preferred for storing into a DB. Stringification "$x:$y" is one way. Unfortunately, that requires split to extract the values and a text field versus numeric if storing into a DB. Bit manipulation is another way.
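A small sketch of the two key styles mentioned above, assuming nothing beyond core Perl. The coordinates are made up for illustration. The packed key is binary (its length depends on the platform's native int size), while the "$x:$y" key is readable but needs split to get the numbers back.

```perl
use strict;
use warnings;

my ( $x, $y ) = ( -9000, 5000 );    # example coordinates, chosen arbitrarily

# Binary key: two native signed ints, fast but not human-readable.
my $packed = pack 'i2', $x, $y;
my ( $px, $py ) = unpack 'i2', $packed;

# Text key: readable, but decoding requires split.
my $text = "$x:$y";
my ( $tx, $ty ) = split /:/, $text;

printf "packed key: %d bytes, decodes to (%d, %d)\n", length $packed, $px, $py;
printf "text key:   '%s', decodes to (%d, %d)\n", $text, $tx, $ty;
```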

    Running Game::Life::Infinite::Board consumes lots of memory. If possible, make sure 8 ~ 10 gigabytes of memory are available to keep the OS from swapping. It also takes ~ 10 seconds for global cleanup while exiting.

    eyepopslikeamosquito, infinite isn't running ~ 2x slower as reported here for 1.5 million cells. My laptop has 1600 MHz RAM, and I verified available memory before running.

    Regards, Mario

      New entry.

      Hey, it passes all the tests :)

      hehehe

      package Organism; # based on http://perlmonks.org/?node_id=1197284

      use strict;
      use warnings;

      sub count {
          return shift->{config}[0] =~ tr/1//;
      }

      # Input a list of [ x, y ] coords
      sub insert_cells {
          my $extra = 3;
          my $self = shift;
          my $xl = my $xh = $_[0][0];    # find cell limits
          my $yl = my $yh = $_[0][1];
          for (@_) {
              my ($x, $y) = @$_;
              $xl > $x and $xl = $x;
              $xh < $x and $xh = $x;
              $yl > $y and $yl = $y;
              $yh < $y and $yh = $y;
          }
          my $xoffset = $xl - $extra;    # get sizes and insert live cells
          my $w = $xh - $xl + 2 * $extra;
          my $yoffset = $yl - $extra;
          my $h = $yh - $yl + 2 * $extra;
          my $grid = '0' x $w x $h;
          for (@_) {
              my ($x, $y) = @$_;
              substr $grid, $x - $xoffset + ($y - $yoffset) * $w, 1, '1';
          }
          $self->{config} = [ $grid, $w, $h, $xoffset, $yoffset ];
      }

      # Return sorted list of cells in the Organism.
      # Used for verification and testing the state of the organism.
      sub get_live_cells {
          my $self = shift;
          my ( $grid, $w, $h, $xoffset, $yoffset ) = @{ $self->{config} };
          my @cells;
          push @cells, [ $-[0] % $w + $xoffset, int( $-[0] / $w ) + $yoffset ]
              while $grid =~ /1/g;
          sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells;
      }

      sub tick {
          my $self = shift;
          my ( $grid, $w, $h ) = @{ $self->{config} };
          my $all = '0' x ($w + 1) . $grid;
          my $sum = $all =~ tr/1/2/r;
          ( $sum |= substr $all, $_ ) =~ tr/1357/2468/ for
              1, 2, $w, $w + 2, $w * 2, $w * 2 + 1, $w * 2 + 2;  # other 7 neighbors
          $grid = substr $grid | $sum, 0, $w * $h;
          $self->{config}[0] = $grid =~ tr/1-9/000011100/r;       # dead or alive
      }

      sub new {
          my $class = shift;
          my %init_self = ( );
          bless \%init_self, $class;
      }

      1;
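For readers puzzling over the tick method above, here is a standalone sketch of the same string trick on a tiny 5 x 5 grid. The function name tick_grid and the test grid are my own additions; the body follows the posted tick as closely as I could manage. It assumes the outermost rows and columns are dead (the posted insert_cells adds a margin for the same reason), because rows wrap around in a flat string.

```perl
use strict;
use warnings;

# One generation on a flat '0'/'1' string of $w x $h cells (row-major order).
sub tick_grid {
    my ( $grid, $w, $h ) = @_;

    # Pad the front so that offset $w + 1 in $all lines up with the cell itself.
    my $all = '0' x ( $w + 1 ) . $grid;

    # Neighbor count is stored as twice its value: the first neighbor gives '2'.
    my $sum = $all =~ tr/1/2/r;

    # String OR with a live neighbor ('1') turns an even digit odd;
    # tr then bumps it to the next even digit. The count saturates at '9'.
    ( $sum |= substr $all, $_ ) =~ tr/1357/2468/
        for 1, 2, $w, $w + 2, $w * 2, $w * 2 + 1, $w * 2 + 2;

    # OR in the cell's own state as the low bit, then map digit to next state:
    # '5' and '7' are live cells with 2 or 3 neighbors, '6' is a dead cell with 3.
    my $new = substr $grid | $sum, 0, $w * $h;
    return $new =~ tr/1-9/000011100/r;
}

my ( $w, $h ) = ( 5, 5 );
my $blinker = '00000' . '00000' . '01110' . '00000' . '00000';
my $next    = tick_grid( $blinker, $w, $h );

# Print the grid one row per line.
print join( "\n", unpack "(A$w)*", $next ), "\n";
```

Run as is, this should print the blinker turned vertical (three rows of 00100 between two rows of 00000). A second call to tick_grid brings back the horizontal form.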

        Hi tybalt89,

        Wow! Your new entry runs faster than C++. Also, memory consumption is less than 500 MB ;-)

        $ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp
        $ g++ -o tbench1 -std=c++11 -Wall -O3 tbench1.cpp

        $ time ./tbench1 x.tmp 2
        cell count at start = 1500000
        run benchmark for 2 ticks
        cell count at end = 1500000
        time taken 4 secs

        real 0m5.240s   mem 139 MB
        user 0m5.149s
        sys  0m0.085s

        $ time /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2
        cell count at start = 1500000
        run benchmark for 2 ticks
        cell count at end = 1500000
        time taken: 1 secs

        real 0m3.482s   mem 492 MB
        user 0m3.242s
        sys  0m0.233s

        Micro-optimization may be a subjective matter. At this level, one may want to go for the extra 2%.

        I've replaced 3 multiplications ( $w * 2 ) with ( $w << 1 ).

        ( $sum |= substr $all, $_ ) =~ tr/1357/2468/ for
            1, 2, $w, $w + 2, ($w << 1), ($w << 1) + 1, ($w << 1) + 2;  # other 7 neighbors

        $ time /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2
        cell count at start = 1500000
        run benchmark for 2 ticks
        cell count at end = 1500000
        time taken: 1 secs

        real 0m3.420s   mem 492 MB
        user 0m3.203s
        sys  0m0.205s
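If you want to check the multiply versus shift difference in isolation, a rough sketch with the core Benchmark module might look like the following. The sub names and the width value are made up for illustration. Note that the offset list is built only once per tick, so a difference this small in the whole-program timings may well be within run-to-run noise.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $width = 1_800_006;    # hypothetical grid width, for illustration only

# Offsets built with multiplication, as in the posted tick.
sub offsets_mul {
    my $w = shift;
    return ( 1, 2, $w, $w + 2, $w * 2, $w * 2 + 1, $w * 2 + 2 );
}

# Same offsets built with a left shift.
sub offsets_shift {
    my $w = shift;
    return ( 1, 2, $w, $w + 2, ( $w << 1 ), ( $w << 1 ) + 1, ( $w << 1 ) + 2 );
}

# Compare the two variants; numbers will differ per machine and per run.
cmpthese( 500_000, {
    'multiply' => sub { my @o = offsets_mul($width) },
    'shift'    => sub { my @o = offsets_shift($width) },
} );
```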

        Regards, Mario

      Re having an old PC - and rubbing salt into tybalt89's wounds - please note that my (Haswell 4770K-powered) home PC is unchanged from when I mentioned it over three years ago in The 10**21 Problem (Part 2) and so is getting rather old and creaky now. Definitely due for an upgrade. :)

      Though my ancient PC does have 32 GB of memory (and so there is no swapping), given my very slow benchmark results with Game::Life::Infinite::Board, I'm suspicious that my doddering, ancient memory is starting to fail, due to being so old. This may be contributing to the significantly different results I am seeing.

      When running:

      perl tbench1.pl x.tmp 2
      with my originally shortened Organism.pm:
      package Organism;

      use strict;
      # use warnings;

      sub count {
          return scalar keys %{ shift->{Cells} };
      }

      # Input a list of [ x, y ] coords
      sub insert_cells {
          my $cells = shift->{Cells};
          for my $r (@_) { $cells->{ pack 'i2', @{$r} } = undef }
      }

      # Return sorted list of cells in the Organism.
      # Used for verification and testing the state of the organism.
      sub get_live_cells {
          sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
          map { [ unpack 'i2', $_ ] }
          keys %{ shift->{Cells} };
      }

      sub tick {
          my $self = shift;
          my $cells = $self->{Cells};
          my ( $k1, $k2, $k3, $k4, $k5, $k6, $k7, $k8,
               $x0, $x1, $x2, $y0, $y1, $y2, %new_cells );

          for my $c (keys %{ $cells }) {
              # Get the (up to 8) dead cells surrounding the cell
              ( $x0, $y0 ) = unpack 'i2', $c;
              ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 );

              my @zcells = (
                  ($k1 = pack 'i2', $x1, $y1) x !(exists $cells->{$k1}),
                  ($k2 = pack 'i2', $x1, $y0) x !(exists $cells->{$k2}),
                  ($k3 = pack 'i2', $x1, $y2) x !(exists $cells->{$k3}),
                  ($k4 = pack 'i2', $x0, $y1) x !(exists $cells->{$k4}),
                  ($k5 = pack 'i2', $x0, $y2) x !(exists $cells->{$k5}),
                  ($k6 = pack 'i2', $x2, $y1) x !(exists $cells->{$k6}),
                  ($k7 = pack 'i2', $x2, $y0) x !(exists $cells->{$k7}),
                  ($k8 = pack 'i2', $x2, $y2) x !(exists $cells->{$k8})
              );

              # Check the live cell
              # Note: next line equivalent to nlive == 2 || nlive == 3
              @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef;

              # Check the dead cells
              for my $z (@zcells) {
                  ( $x0, $y0 ) = unpack 'i2', $z;
                  ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 );

                  # Get num live
                  ( ( exists $cells->{ pack 'i2', $x1, $y1 } ) +
                    ( exists $cells->{ pack 'i2', $x1, $y0 } ) +
                    ( exists $cells->{ pack 'i2', $x1, $y2 } ) +
                    ( exists $cells->{ pack 'i2', $x0, $y1 } ) +
                    ( exists $cells->{ pack 'i2', $x0, $y2 } ) +
                    ( exists $cells->{ pack 'i2', $x2, $y1 } ) +
                    ( exists $cells->{ pack 'i2', $x2, $y0 } ) +
                    ( exists $cells->{ pack 'i2', $x2, $y2 } )
                  ) == 3 and $new_cells{$z} = undef;
              }
          }
          $self->{Cells} = \%new_cells;
      }

      sub new {
          my $class = shift;
          my %init_self = ( Cells => {} );
          bless \%init_self, $class;
      }

      1;
      I get 62 seconds, while after making the excellent improvements suggested by tybalt89 I get 80 seconds.
      package Organism;

      use strict;
      # use warnings;

      sub count {
          return scalar keys %{ shift->{Cells} };
      }

      # Input a list of [ x, y ] coords
      sub insert_cells {
          my $cells = shift->{Cells};
          for my $r (@_) { $cells->{ pack 'i2', @{$r} } = undef }
      }

      # Return sorted list of cells in the Organism.
      # Used for verification and testing the state of the organism.
      sub get_live_cells {
          sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
          map { [ unpack 'i2', $_ ] }
          keys %{ shift->{Cells} };
      }

      sub tick {
          my $self = shift;
          my $cells = $self->{Cells};
          my ( $k1, $k2, $k3, $k4, $k5, $k6, $k7, $k8,
               $x0, $x1, $x2, $y0, $y1, $y2, %new_cells, %dead_cells );

          for my $c (keys %{ $cells }) {
              # Get the (up to 8) dead cells surrounding the cell
              ( $x0, $y0 ) = unpack 'i2', $c;
              ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 );

              $dead_cells{$_}++ for my @zcells = (
                  ($k1 = pack 'i2', $x1, $y1) x !(exists $cells->{$k1}),
                  ($k2 = pack 'i2', $x1, $y0) x !(exists $cells->{$k2}),
                  ($k3 = pack 'i2', $x1, $y2) x !(exists $cells->{$k3}),
                  ($k4 = pack 'i2', $x0, $y1) x !(exists $cells->{$k4}),
                  ($k5 = pack 'i2', $x0, $y2) x !(exists $cells->{$k5}),
                  ($k6 = pack 'i2', $x2, $y1) x !(exists $cells->{$k6}),
                  ($k7 = pack 'i2', $x2, $y0) x !(exists $cells->{$k7}),
                  ($k8 = pack 'i2', $x2, $y2) x !(exists $cells->{$k8})
              );

              # Check the live cell
              # Note: next line equivalent to nlive == 2 || nlive == 3
              @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef;
          }
          $dead_cells{$_} == 3 and $new_cells{$_} = undef for keys %dead_cells;
          $self->{Cells} = \%new_cells;
      }

      sub new {
          my $class = shift;
          my %init_self = ( Cells => {} );
          bless \%init_self, $class;
      }

      1;

      Update: These tests were run with perl v5.24.0:

      This is perl 5, version 24, subversion 0 (v5.24.0) built for MSWin32-x64-multi-thread
      Need to upgrade that ancient Perl. :)

      Update: After upgrading to perl v5.26.0, both are faster: (62 secs, 80 secs) improved to (53 secs, 71 secs).

        Hi eyepopslikeamosquito,

        Oh, I wish I had something older to run on. Unfortunately, my laptop is the slowest machine I have. The Haswell/Crystalwell chip has 128 MB of eDRAM. Its purpose is to serve the GPU, but the 4 cores have access to it as well when the GPU is idle, which is the case while benchmarking Perl. The late 2013 i7-4960HQ CPU runs at 2.6 GHz with turbo boost up to 3.8 GHz. I had forgotten that it could run that high on one core with nothing else running.

        I have some news to share. It's possible for tybalt89's excellent optimization to run faster. The extra optimization shaves 6 ~ 7 seconds. Memory consumption is reduced as well. I applied the same optimization to the two bit implementations, mine and tybalt89's shorter implementation. For the latter, I also applied the change mentioned earlier so that it does not fail with cperl.

        # use 30-bits on 64-bit hw for cperl compatibility
        my $half = ( ( log(~0 + 1) / log(2) ) >= 64 ) ? 30 : 16;

        Before:

        $ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp
        $ perl tbench1.pl x.tmp 2

                       mem size    v5.26.0    v5.24.2    v5.22.4      cperl
        infinite       7,548 MB   122 secs   119 secs   133 secs   116 secs
        original         704 MB   167 secs   168 secs   180 secs   163 secs
        improvements     744 MB    57 secs    57 secs    63 secs    54 secs
        optimized i2   1,543 MB    38 secs    39 secs    40 secs    36 secs
        2 nums into 1  1,510 MB    39 secs    40 secs    42 secs    37 secs
        shorter impl   1,661 MB    40 secs    42 secs    44 secs    37 secs

        After:

                       mem size    v5.26.0    v5.24.2    v5.22.4      cperl
        optimized i2   1,326 MB    31 secs    32 secs    33 secs    30 secs
        2 nums into 1  1,292 MB    33 secs    34 secs    35 secs    31 secs
        shorter impl   1,443 MB    35 secs    36 secs    37 secs    31 secs

        The optimization was made to "Check dead cells".

        package Organism;

        use strict;
        # use warnings;

        sub count {
            return scalar keys %{ shift->{Cells} };
        }

        # Input a list of [ x, y ] coords
        sub insert_cells {
            my $cells = shift->{Cells};
            for my $r (@_) { $cells->{ pack 'i2', @{$r} } = undef }
        }

        # Return sorted list of cells in the Organism.
        # Used for verification and testing the state of the organism.
        sub get_live_cells {
            sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
            map { [ unpack 'i2', $_ ] }
            keys %{ shift->{Cells} };
        }

        sub tick {
            my $self = shift;
            my $cells = $self->{Cells};
            my ( $k1, $k2, $k3, $k4, $k5, $k6, $k7, $k8,
                 $x0, $x1, $x2, $y0, $y1, $y2, %new_cells, %dead_cells );

            for my $c (keys %{ $cells }) {
                # Get the (up to 8) dead cells surrounding the cell
                ( $x0, $y0 ) = unpack 'i2', $c;
                ( $x1, $x2, $y1, $y2 ) = ( $x0 - 1, $x0 + 1, $y0 - 1, $y0 + 1 );

                my @zcells = (
                    ($k1 = pack 'i2', $x1, $y1) x !(exists $cells->{$k1}),
                    ($k2 = pack 'i2', $x1, $y0) x !(exists $cells->{$k2}),
                    ($k3 = pack 'i2', $x1, $y2) x !(exists $cells->{$k3}),
                    ($k4 = pack 'i2', $x0, $y1) x !(exists $cells->{$k4}),
                    ($k5 = pack 'i2', $x0, $y2) x !(exists $cells->{$k5}),
                    ($k6 = pack 'i2', $x2, $y1) x !(exists $cells->{$k6}),
                    ($k7 = pack 'i2', $x2, $y0) x !(exists $cells->{$k7}),
                    ($k8 = pack 'i2', $x2, $y2) x !(exists $cells->{$k8})
                );

                # Check the live cell
                # Note: next line equivalent to nlive == 2 || nlive == 3
                @zcells == 5 || @zcells == 6 and $new_cells{$c} = undef;

                # Check the dead cells
                for my $z ( @zcells ) {
                    $new_cells{$z} = undef if ++$dead_cells{$z} == 3;
                }
            }
            $self->{Cells} = \%new_cells;
        }

        sub new {
            my $class = shift;
            my %init_self = ( Cells => {} );
            bless \%init_self, $class;
        }

        1;
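One caveat worth checking with the single-pass counter above: a dead cell is added to %new_cells when its count reaches 3, and nothing removes it if the count later reaches 4 or more. The blinker benchmark never produces such a cell, so the tests pass, but other patterns could. Below is a hedged sketch of one way to keep the single pass and still handle that case, by deleting the cell again on the fourth hit. The function name births and the readable "x:y" keys are my own choices for illustration; this is not code from the thread and it is not benchmarked.

```perl
use strict;
use warnings;

# Hypothetical helper: given live cells as "x:y" strings, return the
# sorted list of dead cells that have exactly 3 live neighbors.
sub births {
    my %live = map { $_ => undef } @_;
    my ( %count, %born );

    for my $c ( keys %live ) {
        my ( $x, $y ) = split /:/, $c;
        for my $dx ( -1, 0, 1 ) {
            for my $dy ( -1, 0, 1 ) {
                next unless $dx || $dy;            # skip the cell itself
                my $z = ( $x + $dx ) . ':' . ( $y + $dy );
                next if exists $live{$z};          # only dead neighbors
                my $n = ++$count{$z};
                if    ( $n == 3 ) { $born{$z} = undef }    # provisional birth
                elsif ( $n == 4 ) { delete $born{$z} }     # too crowded after all
            }
        }
    }
    return sort keys %born;
}

# A horizontal blinker gives birth to the cells above and below its center.
print join( ' ', births( '0:0', '1:0', '2:0' ) ), "\n";

# Four live corners around (1,1): the center has 4 neighbors, so no birth.
print scalar( () = births( '0:0', '2:0', '0:2', '2:2' ) ), " births\n";
```

The extra elsif branch costs one comparison per dead-neighbor visit, so it would need to be timed before claiming it keeps the 6 ~ 7 second gain.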

        Regards, Mario

Re^16: High Performance Game of Life (updated)
by tybalt89 (Monsignor) on Aug 15, 2017 at 02:23 UTC

    To get decent times for testing purposes with the smaller number of blinkers, increase the number of ticks.
