http://www.perlmonks.org?node_id=1197390


in reply to Re^12: High Performance Game of Life (updated)
in thread High Performance Game of Life

Hi tybalt89. The following was captured on a 64-bit laptop with $half = 32. Unfortunately, cperl fails for some reason: the cell count at the end is incorrect. I also tried $half = 16, with the same result. To be sure, I pulled down the latest maint release and tried again.

$ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp

bin/perl

$ /opt/perl-5.24.2/bin/perl -I. tbench1.pl x.tmp 2
cell count at start = 1500000
run benchmark for 2 ticks
cell count at end = 1500000
time taken: 42 secs

$ /opt/perl-5.26.0/bin/perl -I. tbench1.pl x.tmp 2
cell count at start = 1500000
run benchmark for 2 ticks
cell count at end = 1500000
time taken: 42 secs

bin/cperl

$ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2
cell count at start = 1500000
run benchmark for 2 ticks
cell count at end = 675003  <-- incorrect
time taken: 34 secs

$ /opt/cperl-5.26.1c/bin/cperl -I. tbench1.pl x.tmp 2
cell count at start = 1500000
run benchmark for 2 ticks
cell count at end = 675003  <-- incorrect
time taken: 34 secs

I built cperl with the following Configure options. The source for cperl-maint-5.24c and cperl-maint-5.26c can be found on GitHub.

./Configure -Dprefix=/opt/cperl-5.24.3c -sder -Dusethreads -Dusecperl -Accflags=-msse4.2
./Configure -Dprefix=/opt/cperl-5.26.1c -sder -Dusethreads -Dusecperl -Accflags=-msse4.2

Regards, Mario

Re^14: High Performance Game of Life (updated)
by tybalt89 (Monsignor) on Aug 15, 2017 at 01:59 UTC

    When testing with the 32 bit version, the range of coordinates should be restricted to what a 16 bit number can hold, from roughly -32000 to 32000 (to be safe :)

    The 32 bit version does pass the tests, and runs OK on my system with a much smaller createblinker.pl range.

    That -900000 is way out of range.

    As for problems with the 64 bit version, sorry I can't help :(

      Update: Found the reason why cperl is failing with 64-bit, described below.

      Hi tybalt89,

      Perl is passing with 32 and 64 bits. Regarding cperl, it is passing 32 bits ($half = 16), but not 64 bits ($half = 32).

      $ perl createblinker.pl 5000 -9000 100 >x.tmp 2>y.tmp

      bin/perl

      $ /opt/perl-5.24.2/bin/perl -I. tbench1.pl x.tmp 2
      cell count at start = 15000
      run benchmark for 2 ticks
      cell count at end = 15000
      time taken: 0 secs

      bin/cperl

      $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2
      cell count at start = 15000
      run benchmark for 2 ticks
      cell count at end = 6753  <-- fails on 64-bit hw ($half = 32)
      time taken: 0 secs

      cperl is failing on 64-bit hw ($half = 32) because the numbers are being converted to exponential notation.

      # perl createblinker.pl 5 -9 100 >x.tmp 2>y.tmp
      # print "@zcells\n";
      9223372039002259557 -9.22337203900226e+18 -9.22337203900226e+18, ...
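To illustrate why exponential notation is a problem for hash keys, here is a minimal sketch, assuming a perl built with 64-bit integers. It does not reproduce the cperl difference and is not taken from tbench1.pl; it only shows the general mechanism in stock perl: once an integer leaves the native range it is held as a floating-point value, and distinct values can then stringify to the same key. The variable names are made up for the example.

```perl
use strict;
use warnings;

# Assumes a perl built with 64-bit integers (ivsize == 8).
my $max  = ~0;          # largest native unsigned integer
my $over = $max + 1;    # no longer fits, so it is stored as a floating-point value
my $more = $max + 2;    # a different integer in theory...

print "max  = $max\n";
print "over = $over\n"; # printed in exponential notation
print "more = $more\n"; # ...but it stringifies identically to $over

# Used as hash keys, the two out-of-range values collapse into one entry.
my %cells = ( $over => 1, $more => 1 );
my $count = keys %cells;
print "distinct keys: $count\n";
```

Cells whose keys collapse like this overwrite one another, which is consistent with a cell count that comes out too low.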

      Running $half = 30 resolves the issue. If you want, $half may be set programmatically.

      # use 30-bits on 64-bit hw for cperl compatibility
      my $half = ( ( log(~0 + 1) / log(2) ) >= 64 ) ? 30 : 16;
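An alternative sketch, not from the thread: the integer width can also be read from the core Config module, which avoids depending on the exact result of a floating-point logarithm. The values 30 and 16 are the ones used above; treating ivsize >= 8 as "64-bit" is my assumption.

```perl
use strict;
use warnings;
use Config;

# $Config{ivsize} is the size in bytes of perl's native integer type:
# 8 on a build with 64-bit integers, 4 otherwise.
my $half = $Config{ivsize} >= 8 ? 30 : 16;

print "ivsize = $Config{ivsize} bytes, half = $half bits\n";
```

This also picks 30 on a 32-bit machine whose perl was built with 64-bit integer support, which may or may not be what you want.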

      Regards, Mario

        Update 1: Added results for the original version and initial improvements.
        Update 2: Added results from tbench1-infinite.pl using Game::Life::Infinite::Board.
        Update 3: Added results for C++, Organism.h and tbench1.cpp.

        In this thread, we've learned that pack i2 is faster than pack ii. Furthermore, functions with inner loops benefit greatly from inlining their critical paths. In essence, this is what we've done. The pack i2 solution fully optimized is found here, by tybalt89. The mapping of two numbers into one can be found here. Similarly, a shorter implementation by tybalt89 is found here. The fastest time was noted for each run with nothing else running in the background.

        The maximum key length for $c, obtained separately, is 7, 12, and 18 for pack i2, the mapping of two numbers, and the shorter solution by tybalt89, respectively. The fix for the last solution, when running cperl on 64-bit hardware, is to use 30 bits instead of 32, as described here.

        The x and y tmp files were made using eyepopslikeamosquito's createblinker.pl script, found at the top of this thread.

        $ perl createblinker.pl 500000 -900000 100 >x.tmp 2>y.tmp

        Benchmark results were captured on a 2.6 GHz laptop (i7 Haswell). I apologize for not having something older to benchmark on. If anything, the OP's machine is faster than mine. ;-)

        Results:

        c++                138 MB     4 secs

                         mem size    v5.26.0    v5.24.2    v5.22.4
        infinite         7,548 MB   122 secs   119 secs   133 secs
        original           704 MB   167 secs   168 secs   180 secs
        improvements       744 MB    57 secs    57 secs    63 secs
        optimized i2     1,543 MB    38 secs    39 secs    40 secs
        2 nums into 1    1,510 MB    39 secs    40 secs    42 secs
        shorter impl     1,661 MB    40 secs    42 secs    44 secs

        cperl:

        $ /opt/cperl-5.24.3c/bin/cperl -I. tbench1.pl x.tmp 2
        cell count at start = 1500000
        run benchmark for 2 ticks
        cell count at end = 1500000
        time taken: 116 secs - infinite board
        time taken: 163 secs - original
        time taken:  54 secs - improvements
        time taken:  36 secs - optimized pack i2
        time taken:  37 secs - two numbers into one
        time taken:  37 secs - shorter implementation

        Pack returns unreadable data, but is fast. Readable keys may be preferred for storing into a DB. Stringification "$x:$y" is one way. Unfortunately, that requires split to extract the values and a text field versus numeric if storing into a DB. Bit manipulation is another way.
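The three ways of building a key can be sketched side by side. This is only an illustration under stated assumptions, not the code benchmarked above: it assumes a perl with 64-bit integers, $half = 30, and coordinates between -2**29 and 2**29 - 1. The helper names and the bias-based bit layout are mine.

```perl
use strict;
use warnings;

# Assumptions: 64-bit perl, coordinates within -2**29 .. 2**29 - 1.
my $half = 30;
my $bias = 1 << ( $half - 1 );    # shifts a coordinate into 0 .. 2**30 - 1
my $mask = ( 1 << $half ) - 1;    # low $half bits

# 1. pack: compact and fast, but the key is binary data
sub key_pack { pack 'i2', @_ }
sub xy_pack  { unpack 'i2', $_[0] }

# 2. stringification: readable, but needs split and a text column in a DB
sub key_string { "$_[0]:$_[1]" }
sub xy_string  { split /:/, $_[0] }

# 3. bit manipulation: one non-negative integer, readable and numeric
sub key_bits { ( ( $_[0] + $bias ) << $half ) | ( $_[1] + $bias ) }
sub xy_bits  { ( ( $_[0] >> $half ) - $bias, ( $_[0] & $mask ) - $bias ) }

my ( $x, $y ) = ( 500000, -900000 );
my @codecs = (
    [ 'pack i2', \&key_pack,   \&xy_pack ],
    [ 'string',  \&key_string, \&xy_string ],
    [ 'bits',    \&key_bits,   \&xy_bits ],
);

my %roundtrip;
for my $codec (@codecs) {
    my ( $name, $encode, $decode ) = @$codec;
    my $key = $encode->( $x, $y );
    my ( $dx, $dy ) = $decode->($key);
    $roundtrip{$name} = ( $dx == $x && $dy == $y ) ? 1 : 0;
    printf "%-8s round trip %s\n", $name, $roundtrip{$name} ? 'ok' : 'FAILED';
}
```

With 30 bits per coordinate the combined key stays below 2**60, so it remains a native integer and is never converted to exponential notation.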

        Running Game::Life::Infinite::Board consumes lots of memory. If possible, make sure 8 ~ 10 gigabytes of memory are available to keep the OS from swapping. It also takes ~ 10 seconds for global cleanup while exiting.

        eyepopslikeamosquito, infinite isn't running ~ 2x slower as reported here for 1.5 million cells. My laptop has 1600 MHz RAM, and I verified available memory before running.

        Regards, Mario

        To get decent times for testing purposes with the smaller number of blinkers, increase the number of ticks.