Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

The 10**21 Problem (Part 4)

by eyepopslikeamosquito (Canon)
on May 19, 2014 at 11:48 UTC ( #1086650=perlmeditation: print w/ replies, xml ) Need Help??

In part three of this seemingly never-ending series on high performance computing we continued what may become the longest whittle in Perl Monks history by further reducing the running time of our magic formula search program, from 728 years down to just 286.

The latest round of improvements were achieved not by a major breakthrough, rather by applying a succession of micro-optimizations. To achieve the required massive speed boost, it was becoming increasingly clear that we needed to find a new breakthrough idea.

Constraint Satisfaction Problems

Constraint satisfaction problems (CSPs) are mathematical problems defined as a set of objects whose state must satisfy a number of constraints or limitations. CSPs are the subject of intense research in both artificial intelligence and operations research, since the regularity in their formulation provides a common basis to analyze and solve problems of many unrelated families. CSPs often exhibit high complexity, requiring a combination of heuristics and combinatorial search methods to be solved in a reasonable time.

-- Constraint satisfaction problem (wikipedia)

So, while the talk of this being a “10**21 problem” is an interesting one, and the latest installment (#3) about whacking the TLB of the processor is even more so, it seems like an incredible amount of work to unleash on a problem that has a trivial algorithmic solution ... This is on the one hand very interesting and computer-sciency, as I have already said, but the choice of example-problem puzzles me.

-- sundialsvc4

Roman numerals were the challenge at hand. The method of the example is common to constraint searches beyond the example. Constraint searches exist in drug research and many other fields ... Pick any field which has a large search space for just the right combination of properties in an as yet undiscovered item. Write a program which tries and makes a preliminary fitness determination for each possibility. Have that program spit out a short list of candidates for further investment of testing and development ... In this specific case, the fitness is a maximum length, a handful of inputs, and a handful of outputs that map correctly to those inputs ... The point of an example is that it is a concrete thing that is completed and shown rather than an abstract idea.

-- mr_mischief beautifully explains the point to sundialsvc4

mr_mischief was right on the money, sundialsvc4 on the wrong track.

This was indeed a classic constraint satisfaction problem ... which I felt woefully ill-equipped to solve. Lacking the mathematical ability of an ambrus, the only strategy I could think of was to desperately search for a hack, any hack, that would allow me to abandon a potential solution as early as possible, as soon I was certain it could not be successfully completed.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, or select a lower-level procedure or heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization problem, especially with incomplete or imperfect information or limited computation capacity.

-- Metaheuristic (wikipedia)

Ahem. Instead of "hack" above, please feel free to substitute "metaheuristic" -- a new word I learnt while researching this series.

An "early abandonment hack", erm metaheuristic, would clearly be much faster than a brute force complete enumeration of all candidate solutions. How to find such a hack?

Know Your Data

To get a feel for what the data in the 4GB lookup tables actually looked like, I wrote some small Perl programs similar to the one below, which displays the data in 128-byte blocks:

use strict; use warnings; use List::MoreUtils qw(natatime); # See [id://861938] sub chunk_array { my ( $n, @vals ) = @_; my $str; my $iter = natatime( $n, @vals ); while ( my @line = $iter->() ) { $str .= join( " ", @line ) . "\n"; } return $str; } sub dump_blocks { my $file = shift; # bytemap file to read my $nblock = shift; # number of blocks to dump local $/ = \128; # Make <> operator read 128 bytes open( my $fh, '<', $file ) or die "error: open '$file': $!"; binmode($fh); for my $n ( 1 .. $nblock ) { my $block = <$fh>; length($block) == 128 or die "oops"; print "block $n:\n"; print chunk_array( 16, map { sprintf "%3d", ord } split //, $b +lock ); } close($fh); } # A 4GB bytemap consists of four 1GB physical files (suffix: "-0" .. " +-3"). # (See [id://1086481] for a program that generates a bytemap). my $bytemap_root = "lookzpM.byte"; my $fname = $bytemap_root . "-0"; dump_blocks( $fname, 16 );
which produced:
block 1: 0 118 0 126 0 126 0 102 0 102 0 94 0 94 0 86 0 86 0 94 0 94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75 55 51 55 59 63 35 47 43 39 35 39 27 31 19 15 11 23 19 23 27 31 block 3: 0 111 0 103 0 103 0 127 0 79 0 119 0 119 0 127 0 111 0 103 0 103 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 114 0 122 0 114 0 106 0 98 0 90 0 66 0 74 0 82 0 90 0 82 44 42 40 34 36 58 56 2 12 0 8 50 52 58 56 50 44 42 40 34 36 block 5: 0 88 0 76 0 72 0 84 0 88 0 108 0 104 0 100 0 120 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 109 0 97 0 101 0 65 0 77 0 113 0 117 0 113 0 109 0 97 0 101 5 1 1 0 0 17 17 21 21 17 17 45 45 33 33 37 0 65 0 77 0 block 7: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 8: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 68 0 72 0 76 0 88 0 84 0 104 0 108 0 24 26 4 2 8 0 12 0 120 0 116 0 104 0 108 0 120 0 68 0 72 0 76 0 88 0 84 0 block 9: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 10: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 83 0 91 0 67 0 75 0 67 63 59 55 51 55 43 47 51 63 59 7 3 7 11 15 3 31 27 23 19 23 0 111 0 95 0 71 0 71 0 79 0 127 0 119 0 119 0 111 0 127 0 block 11: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 12: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 124 0 112 0 116 0 112 0 124 0 64 0 68 0 64 0 92 0 80 0 84 22 16 30 28 30 0 6 4 6 0 62 60 62 48 54 52 54 48 62 60 62 0 70 0 70 0 94 0 94 0 86 0 0 0 0 0 0 0 0 0 0 0 block 13: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 14: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 85 0 89 0 93 0 73 0 69 0 121 0 125 0 105 0 117 0 121 0 125 1 9 5 5 1 25 29 29 17 105 0 85 0 89 0 93 0 73 0 69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 15: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 block 16: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 78 0 70 0 70 0 94 44 46 40 22 20 22 24 30 12 14 8 6 4 6 0 126 0 110 0 118 0 118 0 126 0 78 0 70 0 70 0 94 0 110 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hmmm, every one of these blocks looks like it is one of just three distinct types:
  • All zeros
  • All non-zero numbers even
  • All non-zero numbers odd
Every one of those blocks is one of just three distinct types! Every one of those blocks is one of just three distinct types?!! Whoa. But does that hold true everywhere? To find out, I nervously ran a brute force search over all seven 4GB lookup tables and was elated to learn that this "theorem" does indeed hold true for every single block of every single lookup table!

Only odd modulos (i.e. 1001, 1003, 1005, ...) can produce a valid solution

-- ambrus' third theorem

For modulo 1001, each and every 128-byte block in all seven 4GB lookup tables contains either all zeros, all non-zero numbers even, or all non-zero numbers odd

-- ambrus' fourth theorem

Once again I could not restrain myself from adding to our list of the theorems of ambrus. :-) I expect ambrus' third theorem is easy to prove, his fourth much harder.

Curiously, out of all odd modulos in the range 1001..1221, the fourth theorem of ambrus applies to modulos 1001 and 1221 only. Why only those two? Who decides? Weird. Maybe ambrus can show mathematically why it must be so. I "proved" it only via brute force enumeration of hundreds of 4GB lookup tables.

For modulo 1003..1219 you would need to find a different heuristic to trim the search space (if one exists). That is the main reason why I used modulo 1001 only.

As you might have guessed, noticing this oddity in the lookup table data was a key breakthrough. Why?

  • All zero blocks can be skipped immediately. 125 searches eliminated.
  • To get a hit, all blocks must be even or all blocks must be odd (a value from an odd block can never match one from an even block).
  • We can encode each 128 byte block in just two bits: one to indicate zero or non-zero, the other to indicate odd or even. This reduces the lookup table size from 4GB down to just 8MB! This is crucial in reducing CPU cache misses.

Wait, there's more. Each of MDCLXVI can produce a valid solution only if all their lookup table blocks are of the same (non-zero) type. Assuming an equal number of blocks of each type, that occurs with a probability of (2/3) (first block non-zero) times (1/3)**6 (next six blocks match first non-zero block). After a preliminary check of all seven bitmaps therefore, only one in 1093.5 candidate solutions needs further (more expensive) checking for an exact match -- via a 4GB table lookup plus calculation, as detailed in earlier episodes of this series.

Bitmaps

Here is the code (64-bit compile, 32-bit int, 64-bit size_t) to create a 8MB bitmap from a 4GB lookup table:

#define LAST_CHUNK_IDX_4MB 33554431 #define LAST_CHUNK_IDX_8MB (33554432*2 - 1) #define BIT_MAP_SIZE_BYTES ((LAST_CHUNK_IDX_8MB+1) >> 3) typedef unsigned char bytev_t; void make_bitmapA( const char* bitfname, // in: bit map file name const bytev_t* bytevec, // in: byte vector unsigned char* bitvec // out: bitvec put here (assumed already z +eroed) ) { unsigned int uu; unsigned int uu2; unsigned int byte; unsigned int bit; const size_t* psz; const bytev_t* psy; int j; unsigned int neven; unsigned int nodd; unsigned int totzero = 0; unsigned int toteven = 0; unsigned int totodd = 0; for (uu = 0; uu <= LAST_CHUNK_IDX_4MB; ++uu) { psy = &bytevec[uu << 7]; psz = (size_t*)psy; // This is the zero case. if (psz[0] == 0 && psz[1] == 0 && psz[2] == 0 && psz[3] == 0 && psz[4] == 0 && psz[5] == 0 && psz[6] == 0 && psz[7] == 0 && psz[8] == 0 && psz[9] == 0 && psz[10] == 0 && psz[11] == 0 && psz[12] == 0 && psz[13] == 0 && psz[14] == 0 && psz[15] == 0) { // zero: both bits are zero, nothing to do ++totzero; continue; } // All non-zero elements in block will be all even or all odd! neven = 0; nodd = 0; for (j = 0; j < 128; ++j) { if (psy[j] != 0) { if (psy[j] % 2 == 0) { ++neven; } else { ++nodd; } } } if (neven > 0 && nodd > 0) { fprintf(stderr, "oops 1\n"); exit +(1); } if (neven == 0 && nodd == 0) { fprintf(stderr, "oops 2\n"); exit +(1); } uu2 = uu * 2; // nonzero: set first bit byte = uu2 >> 3; bit = uu2 & 7; bitvec[byte] |= (1 << (bit+1)); if (neven > 0) { // even: second bit is zero, nothing to do ++toteven; continue; } // odd: set second bit ++totodd; bitvec[byte] |= (1 << bit); } printf("zero=%u even=%u odd=%u from %u (%u)\n", totzero, toteven, t +otodd, LAST_CHUNK_IDX_4MB+1, uu); FILE* fh; printf("write %s\n", bitfname); fh = fopen(bitfname, "wb"); if (fh == NULL) { fprintf(stderr, "oops open '%s' failed: errno=%d\n", bitfname, e +rrno); exit(1); } if (fwrite(bitvec, BIT_MAP_SIZE_BYTES, 1, fh) != 1) { fprintf(stderr, "oops write '%s' failed: errno=%d\n", bitfname, +errno); exit(1); } fclose(fh); }

And here, finally, in all its glory, erm horror, is the inner loop code (64-bit compile, 32-bit int, 64-bit size_t) that actually found a solution:

#include <emmintrin.h> #include <smmintrin.h> #define H_PRIME 1000003 #define HASH_LEN 11 #define MOD_TARG 1001 #define M_TARG 1000 #define D_TARG 500 #define C_TARG 100 #define L_TARG 50 #define X_TARG 10 #define GET_TWO_BITS(x, p) (((x) >> (p)) & ((size_t)3)) #define UNROLL(qx) qqm = bytevecM[m7 ^ qx]; \ if (qqm != 0 \ && qqm == bytevecD[d7 ^ qx]) { \ qk[nqk++] = qqm; qk[nqk++] = qx; } #define UNROLL_M7A(qx) \ m7arr[qx] = (unsigned int)( (m6 ^ qx) * H_PRIME ); \ m7arr_r12[qx] = m7arr[qx] >> 12; \ _mm_prefetch((const char*)(&pbvM[m7arr_r12[qx]]), _MM_HINT_T0); #define UNROLL_M7B(qx) \ if ( (mceo = GET_TWO_BITS(pbvM[m7arr_r12[qx]], ((m7arr[qx] >> 6) & +62)) ) != 0) { \ d7 = (unsigned int)( (d6 ^ qx) * H_PRIME ); \ d7_r12 = d7 >> 12; \ _mm_prefetch((const char*)(&pbvD[d7_r12]), _MM_HINT_T0); \ qj[nqj++] = qx; \ qj[nqj++] = d7; \ qj[nqj++] = d7_r12; \ qj[nqj++] = mceo; } // ------------------------------------------------------------------- int inner( const unsigned char* bitvecM, const unsigned char* bitvecD, const unsigned char* bitvecC, const unsigned char* bitvecL, const unsigned char* bitvecX, const unsigned char* bitvecV, const unsigned char* bitvecI, const unsigned char* bytevecM, const unsigned char* bytevecD, int startval, int endval, int m2, int d2, int c2, int l2, int x2, int v2, int i2, __m128i* ps ) { __m128i s2 = _mm_set_epi32(i2, v2, x2, l2); __m128i hp = _mm_set1_epi32(H_PRIME); __m128i s3, s4, s5, s6; int m3, m4, m5, m6; int d3, d4, d5, d6; int c3, c4, c5, c6; unsigned int m7, d7, d7_r12, c7, c7_r12, l7, l7_r12, x7, x7_r12, v7 +, v7_r12, i7, i7_r12; int q3, q4, q5, q6, q7; int c8, l8, x8; int c9, l9, x9; unsigned int qqm; unsigned int q8; int jj; int iret = 0; int cnt; int qz; int nqj; int nqk; unsigned int qj[128*8]; unsigned int qk[128*8]; unsigned int m7arr[128]; unsigned int m7arr_r12[128]; unsigned int mceo, dceo, cceo, lceo, xceo, vceo, iceo; const size_t* pbvM = (size_t*)bitvecM; const size_t* pbvD = (size_t*)bitvecD; const size_t* pbvC = (size_t*)bitvecC; const size_t* pbvL = (size_t*)bitvecL; const size_t* pbvX = (size_t*)bitvecX; const size_t* pbvV = (size_t*)bitvecV; const size_t* pbvI = (size_t*)bitvecI; for (q3 = startval; q3 < endval; ++q3) { if (q3 == 10 || q3 == 13) continue; m3 = (m2 ^ q3) * H_PRIME; d3 = (d2 ^ q3) * H_PRIME; c3 = (c2 ^ q3) * H_PRIME; s3 = _mm_mullo_epi32(_mm_xor_si128(s2, _mm_set1_epi32(q3)), hp); for (q4 = 1; q4 < 128; ++q4) { if (q4 == 10 || q4 == 13) continue; m4 = (m3 ^ q4) * H_PRIME; d4 = (d3 ^ q4) * H_PRIME; c4 = (c3 ^ q4) * H_PRIME; s4 = _mm_mullo_epi32(_mm_xor_si128(s3, _mm_set1_epi32(q4)), h +p); for (q5 = 1; q5 < 128; ++q5) { if (q5 == 10 || q5 == 13) continue; m5 = (m4 ^ q5) * H_PRIME; d5 = (d4 ^ q5) * H_PRIME; c5 = (c4 ^ q5) * H_PRIME; s5 = _mm_mullo_epi32(_mm_xor_si128(s4, _mm_set1_epi32(q5)) +, hp); for (q6 = 1; q6 < 128; ++q6) { if (q6 == 10 || q6 == 13) continue; m6 = (m5 ^ q6) * H_PRIME; d6 = (d5 ^ q6) * H_PRIME; c6 = (c5 ^ q6) * H_PRIME; s6 = _mm_mullo_epi32(_mm_xor_si128(s5, _mm_set1_epi32(q +6)), hp); UNROLL_M7A(1) UNROLL_M7A(2) UNROLL_M7A(3) UNROLL_M7A(4) UNROLL_M7A(5) UNROLL_M7A(6) UNROLL_M7A(7) UNROLL_M7A(8) UNROLL_M7A(9) UNROLL_M7A(11) UNROLL_M7A(12) for (q7 = 14; q7 < 128; ++q7) { UNROLL_M7A(q7) } nqj = 0; UNROLL_M7B(1) UNROLL_M7B(2) UNROLL_M7B(3) UNROLL_M7B(4) UNROLL_M7B(5) UNROLL_M7B(6) UNROLL_M7B(7) UNROLL_M7B(8) UNROLL_M7B(9) UNROLL_M7B(11) UNROLL_M7B(12) for (q7 = 14; q7 < 128; ++q7) { // UNROLL_M7B(q7) mceo = GET_TWO_BITS(pbvM[m7arr_r12[q7]], ((m7arr[q7] + >> 6) & 62)); if (mceo == 0) continue; d7 = (unsigned int)( (d6 ^ q7) * H_PRIME ); d7_r12 = d7 >> 12; _mm_prefetch((const char*)(&pbvD[d7_r12]), _MM_HINT_ +T0); qj[nqj++] = q7; qj[nqj++] = d7; qj[nqj++] = d7_r12; qj[nqj++] = mceo; } nqk = 0; cnt = 0; while (cnt < nqj) { q7 = qj[cnt++]; d7 = qj[cnt++]; d7_r12 = qj[cnt++]; mceo = qj[cnt++]; dceo = GET_TWO_BITS(pbvD[d7_r12], ((d7 >> 6) & 62)); if (dceo != mceo) continue; c7 = (unsigned int)( (c6 ^ q7) * H_PRIME ); c7_r12 = c7 >> 12; _mm_prefetch((const char*)(&pbvC[c7_r12]), _MM_HINT_ +T0); qk[nqk++] = q7; qk[nqk++] = d7; qk[nqk++] = c7; qk[nqk++] = c7_r12; qk[nqk++] = mceo; } nqj = 0; cnt = 0; while (cnt < nqk) { q7 = qk[cnt++]; d7 = qk[cnt++]; c7 = qk[cnt++]; c7_r12 = qk[cnt++]; mceo = qk[cnt++]; cceo = GET_TWO_BITS(pbvC[c7_r12], ((c7 >> 6) & 62)); if (cceo != mceo) continue; l7 = (unsigned int)( (s6.m128i_i32[0] ^ q7) * H_PRIM +E ); l7_r12 = l7 >> 12; _mm_prefetch((const char*)(&pbvL[l7_r12]), _MM_HINT_ +T0); qj[nqj++] = q7; qj[nqj++] = d7; qj[nqj++] = c7; qj[nqj++] = l7; qj[nqj++] = l7_r12; qj[nqj++] = mceo; } nqk = 0; cnt = 0; while (cnt < nqj) { q7 = qj[cnt++]; d7 = qj[cnt++]; c7 = qj[cnt++]; l7 = qj[cnt++]; l7_r12 = qj[cnt++]; mceo = qj[cnt++]; lceo = GET_TWO_BITS(pbvL[l7_r12], ((l7 >> 6) & 62)); if (lceo != mceo) continue; x7 = (unsigned int)( (s6.m128i_i32[1] ^ q7) * H_PRIM +E ); x7_r12 = x7 >> 12; _mm_prefetch((const char*)(&pbvX[x7_r12]), _MM_HINT_ +T0); qk[nqk++] = q7; qk[nqk++] = d7; qk[nqk++] = c7; qk[nqk++] = l7; qk[nqk++] = x7; qk[nqk++] = x7_r12; qk[nqk++] = mceo; } nqj = 0; cnt = 0; while (cnt < nqk) { q7 = qk[cnt++]; d7 = qk[cnt++]; c7 = qk[cnt++]; l7 = qk[cnt++]; x7 = qk[cnt++]; x7_r12 = qk[cnt++]; mceo = qk[cnt++]; xceo = GET_TWO_BITS(pbvX[x7_r12], ((x7 >> 6) & 62)); if (xceo != mceo) continue; v7 = (unsigned int)( (s6.m128i_i32[2] ^ q7) * H_PRIM +E ); v7_r12 = v7 >> 12; _mm_prefetch((const char*)(&pbvV[v7_r12]), _MM_HINT_ +T0); qj[nqj++] = q7; qj[nqj++] = d7; qj[nqj++] = c7; qj[nqj++] = l7; qj[nqj++] = x7; qj[nqj++] = v7; qj[nqj++] = v7_r12; qj[nqj++] = mceo; } nqk = 0; cnt = 0; while (cnt < nqj) { q7 = qj[cnt++]; d7 = qj[cnt++]; c7 = qj[cnt++]; l7 = qj[cnt++]; x7 = qj[cnt++]; v7 = qj[cnt++]; v7_r12 = qj[cnt++]; mceo = qj[cnt++]; vceo = GET_TWO_BITS(pbvV[v7_r12], ((v7 >> 6) & 62)); if (vceo != mceo) continue; i7 = (unsigned int)( (s6.m128i_i32[3] ^ q7) * H_PRIM +E ); i7_r12 = i7 >> 12; _mm_prefetch((const char*)(&pbvI[i7_r12]), _MM_HINT_ +T0); qk[nqk++] = q7; qk[nqk++] = d7; qk[nqk++] = c7; qk[nqk++] = l7; qk[nqk++] = x7; qk[nqk++] = i7; qk[nqk++] = i7_r12; qk[nqk++] = mceo; } nqj = 0; cnt = 0; while (cnt < nqk) { q7 = qk[cnt++]; d7 = qk[cnt++]; c7 = qk[cnt++]; l7 = qk[cnt++]; x7 = qk[cnt++]; i7 = qk[cnt++]; i7_r12 = qk[cnt++]; mceo = qk[cnt++]; iceo = GET_TWO_BITS(pbvI[i7_r12], ((i7 >> 6) & 62)); if (iceo != mceo) continue; _mm_prefetch(&bytevecM[m7arr[q7] & 0xffffff80], _MM_ +HINT_T0); _mm_prefetch(&bytevecM[64+(m7arr[q7] & 0xffffff80)], + _MM_HINT_T0); _mm_prefetch(&bytevecD[d7 & 0xffffff80], _MM_HINT_T0 +); _mm_prefetch(&bytevecD[64+(d7 & 0xffffff80)], _MM_HI +NT_T0); qj[nqj++] = q7; qj[nqj++] = d7; qj[nqj++] = c7; qj[nqj++] = l7; qj[nqj++] = x7; } if (nqj == 0) continue; qz = 0; while (qz < nqj) { q7 = qj[qz++]; d7 = qj[qz++]; c7 = qj[qz++]; l7 = qj[qz++]; x7 = qj[qz++]; m7 = m7arr[q7]; nqk = 0; UNROLL(1) UNROLL(2) UNROLL(3) UNROLL(4) UNROLL(5) UNROLL(6) UNROLL(7) UNROLL(8) UNROLL(9) UNROLL(11) UNROLL(12) UNROLL(14) UNROLL(15) UNROLL(16) UNROLL(17) UNROLL(18) UNROLL(19) UNROLL(20) UNROLL(21) UNROLL(22) UNROLL(23) UNROLL(24) UNROLL(25) UNROLL(26) UNROLL(27) UNROLL(28) UNROLL(29) UNROLL(30) UNROLL(31) UNROLL(32) UNROLL(33) UNROLL(34) UNROLL(35) UNROLL(36) UNROLL(37) UNROLL(38) UNROLL(39) UNROLL(40) UNROLL(41) UNROLL(42) UNROLL(43) UNROLL(44) UNROLL(45) UNROLL(46) UNROLL(47) UNROLL(48) UNROLL(49) UNROLL(50) UNROLL(51) UNROLL(52) UNROLL(53) UNROLL(54) UNROLL(55) UNROLL(56) UNROLL(57) UNROLL(58) UNROLL(59) UNROLL(60) UNROLL(61) UNROLL(62) UNROLL(63) UNROLL(64) UNROLL(65) UNROLL(66) UNROLL(67) UNROLL(68) UNROLL(69) UNROLL(70) UNROLL(71) UNROLL(72) UNROLL(73) UNROLL(74) UNROLL(75) UNROLL(76) UNROLL(77) UNROLL(78) UNROLL(79) UNROLL(80) UNROLL(81) UNROLL(82) UNROLL(83) UNROLL(84) UNROLL(85) UNROLL(86) UNROLL(87) UNROLL(88) UNROLL(89) UNROLL(90) UNROLL(91) UNROLL(92) UNROLL(93) UNROLL(94) UNROLL(95) UNROLL(96) UNROLL(97) UNROLL(98) UNROLL(99) UNROLL(100) UNROLL(101) UNROLL(102) UNROLL(103) UNROLL(104) UNROLL(105) UNROLL(106) UNROLL(107) UNROLL(108) UNROLL(109) UNROLL(110) UNROLL(111) UNROLL(112) UNROLL(113) UNROLL(114) UNROLL(115) UNROLL(116) UNROLL(117) UNROLL(118) UNROLL(119) UNROLL(120) UNROLL(121) UNROLL(122) UNROLL(123) UNROLL(124) UNROLL(125) UNROLL(126) UNROLL(127) if (nqk == 0) continue; cnt = 0; while (cnt < nqk) { qqm = qk[cnt++]; q8 = qk[cnt++]; // Calculate instead of lookup to reduce memory f +ootprint. c8 = ((int)c7 ^ q8) * H_PRIME; jj = (c8 ^ qqm) ^ HASH_LEN; c9 = jj % MOD_TARG; if (c9 < 0) c9 += MOD_TARG; if (c9 == C_TARG) { l8 = ((int)l7 ^ q8) * H_PRIME; jj = (l8 ^ qqm) ^ HASH_LEN; l9 = jj % MOD_TARG; if (l9 < 0) l9 += MOD_TARG +; if (l9 == L_TARG) { x8 = ((int)x7 ^ q8) * H_PRIME; jj = (x8 ^ qqm) ^ HASH_LEN; x9 = jj % MOD_TARG; if (x9 < 0) x9 += MOD_T +ARG; if (x9 == X_TARG) { ps[iret++] = _mm_set_epi16(0, qqm, q8, q +7, q6, q5, q4, q3); } } } } } } } } } return iret; }

Notice that the above code uses the TLB, prefetch, vectorization and loop peeling techniques discussed in earlier episodes but this time with seven 8MB bitmaps.

This one runs in about 2.9 seconds = 2.9 * 125**4 / 60 / 60 / 24 / 365 = 22 years. Given multithreading and enough cores that should be fast enough to find a solution in a year or two.

As you might expect, this inner loop code was trivial to multithread. When I was suffering from high memory latency, Intel's hyperthreading gave me quite a lot. As I reduced this latency, hyper-threading gave me less and less until I didn't bother running more threads than there were physical cores on the machine.

In practice, this multithreaded code, running on two four-core physical machines, found a solution in about a year.

I expect its performance could be considerably further improved using the many excellent performance tips provided by oiskuu in replies to earlier episodes in this series.

The full list of six of seven solutions found by this program is:

1 23 105 120 47 93 48 104 16 39: 1000 500 100 50 10 5 734 1 61 61 73 46 107 4 103 114 54: 1000 500 100 50 10 916 1 1 68 125 58 6 81 39 76 57 34: 1000 500 100 50 10 956 1 1 77 91 35 121 50 78 4 40 117: 1000 500 100 50 10 5 95 2 24 70 80 56 33 49 14 84 38: 1000 500 100 50 10 5 720 2 55 7 58 19 95 66 106 123 76: 1000 500 100 50 10 5 223 2 78 45 126 107 78 4 125 50 39: 1000 500 100 50 10 996 1 2 119 112 12 43 73 103 42 63 82: 1000 500 100 50 10 5 197 3 3 110 96 21 116 76 79 62 126: 1000 500 100 50 10 5 886 3 48 91 63 25 24 4 1 60 21: 1000 500 100 50 10 37 1 3 48 91 63 25 24 4 1 61 21: 1000 500 100 50 10 37 1 3 82 4 111 53 42 62 67 50 32: 1000 500 100 50 10 910 1 3 87 44 79 85 80 45 6 107 93: 1000 500 100 50 10 5 448 3 120 99 112 40 69 102 8 114 14: 1000 500 100 50 10 530 1 3 124 119 88 127 67 22 113 67 123: 1000 500 100 50 10 954 1 4 20 27 28 11 64 88 81 120 126: 1000 500 100 50 10 5 191 4 43 14 62 25 58 27 36 68 79: 1000 500 100 50 10 5 29 4 89 72 27 70 115 122 7 50 29: 1000 500 100 50 10 5 289 4 89 72 27 70 115 122 7 51 29: 1000 500 100 50 10 5 273 5 8 124 117 9 72 5 111 96 110: 1000 500 100 50 10 5 878 5 8 124 117 9 72 5 111 112 110: 1000 500 100 50 10 5 622 5 42 77 27 93 51 126 85 59 97: 1000 500 100 50 10 5 936 5 52 94 25 97 87 9 76 114 20: 1000 500 100 50 10 147 1 5 61 120 123 86 20 62 35 42 119: 1000 500 100 50 10 5 315 6 14 32 112 84 121 98 26 116 19: 1000 500 100 50 10 277 1 6 25 53 101 4 114 61 55 56 59: 1000 500 100 50 10 5 247 6 40 29 66 1 11 41 110 84 15: 1000 500 100 50 10 5 183 6 59 34 115 24 123 71 86 85 80: 1000 500 100 50 10 5 922 6 90 30 49 53 77 9 43 49 109: 1000 500 100 50 10 864 1 6 90 40 43 27 11 2 111 83 31: 1000 500 100 50 10 5 335 6 127 122 67 20 12 106 17 47 90: 1000 500 100 50 10 5 89 7 82 45 71 6 60 20 45 122 42: 1000 500 100 50 10 147 1 7 91 114 61 33 121 14 33 127 8: 1000 500 100 50 10 5 960 8 24 74 68 29 46 54 28 45 122: 1000 500 100 50 10 157 1 8 81 99 126 73 33 84 39 23 58: 1000 500 100 50 10 3 1 8 109 37 7 102 98 19 25 27 60: 1000 500 100 50 10 5 143 9 8 60 1 94 6 51 38 78 100: 1000 500 100 50 10 810 1 9 21 3 59 26 7 72 92 114 1: 1000 500 100 50 10 5 107 9 97 39 6 74 69 102 33 28 20: 1000 500 100 50 10 5 307 9 99 60 53 116 57 54 109 36 63: 1000 500 100 50 10 177 1 9 100 85 27 24 79 114 88 2 83: 1000 500 100 50 10 722 1 11 31 6 74 111 80 49 21 5 106: 1000 500 100 50 10 750 1 11 112 65 42 28 21 91 105 107 97: 1000 500 100 50 10 235 1 11 119 73 69 89 121 102 103 82 5: 1000 500 100 50 10 908 1 12 19 63 11 15 120 67 58 102 65: 1000 500 100 50 10 5 167 12 30 27 7 52 91 33 77 80 59: 1000 500 100 50 10 5 574 12 34 117 60 98 3 116 40 62 59: 1000 500 100 50 10 476 1 12 54 14 99 51 123 48 42 84 1: 1000 500 100 50 10 313 1 12 93 75 47 78 84 12 60 83 32: 1000 500 100 50 10 954 1 14 36 71 80 101 39 123 49 33 85: 1000 500 100 50 10 57 1 14 37 20 37 26 91 96 8 51 14: 1000 500 100 50 10 5 448 14 76 18 127 16 48 79 1 27 24: 1000 500 100 50 10 185 1 14 79 111 28 29 81 7 68 121 49: 1000 500 100 50 10 5 301 14 87 50 108 7 117 30 72 61 1: 1000 500 100 50 10 5 942 14 97 109 101 47 78 124 106 98 61: 1000 500 100 50 10 519 1 14 104 76 107 79 22 79 75 112 56: 1000 500 100 50 10 750 1 14 108 43 91 86 71 39 63 93 56: 1000 500 100 50 10 900 1 14 113 51 97 31 112 94 80 67 123: 1000 500 100 50 10 926 1 15 19 103 89 2 124 124 48 50 89: 1000 500 100 50 10 483 1 16 17 30 115 63 98 20 51 55 14: 1000 500 100 50 10 5 13 16 122 8 28 24 106 121 34 69 33: 1000 500 100 50 10 5 608 17 11 119 60 47 44 78 103 125 48: 1000 500 100 50 10 5 1 17 32 37 28 15 44 79 63 34 102: 1000 500 100 50 10 652 1 17 38 34 46 117 66 91 43 93 76: 1000 500 100 50 10 5 115 17 39 32 53 7 127 114 8 77 22: 1000 500 100 50 10 115 1 17 57 37 112 36 73 78 53 39 119: 1000 500 100 50 10 5 63 17 57 107 60 113 117 72 64 56 17: 1000 500 100 50 10 988 1 17 107 81 48 51 8 106 7 62 69: 1000 500 100 50 10 105 1 17 123 66 76 11 24 57 58 7 65: 1000 500 100 50 10 5 974

In war, luck is half in everything

-- Napoleon

As in golf. Finding the solution as soon as I did, after searching only 17 of 125 numbers, was about a one in nine shot. Of course, as computer hardware improved over the next few years, the task would have got easier.

Some other statistics:

1: 4 2: 4 3: 7 4: 4 5: 5 6: 7 7: 2 8: 3 9: 5 11: 3 12: 5 14: 9 15: 1 16: 2 17: 8 Min: 1 Max: 9 Total: 69 number of 5 (V) hits found: 35 number of 1 (I) hits found: 35 Ave: 4.60

I also kept a histogram of how often each number (0..1001) appeared in the (V, I) positions of all "5 of 7 hits" solutions found (i.e. hitting M, D, C, L, X):

1: 64 3: 58 5: 73 7: 63 9: 46 11: 72 13: 58 15: 72 17: 61 19: 70 21: 63 23: 64 25: 62 27: 60 29: 66 31: 49 33: 64 35: 67 37: 42 39: 74 41: 86 43: 60 45: 52 47: 64 49: 49 51: 67 53: 68 55: 59 57: 60 59: 56 61: 72 63: 68 65: 56 67: 66 69: 59 71: 73 73: 59 75: 61 77: 64 79: 64 81: 52 83: 72 85: 45 87: 52 89: 68 91: 65 93: 55 95: 51 97: 72 99: 66 nhit = 50, nodd=50, neven=0, sum = 3109, min = 42, max = 86, ave=62.18 50 numbers hit 101: 50 103: 70 105: 62 107: 51 109: 49 111: 54 113: 55 115: 59 117: 72 119: 67 121: 58 123: 61 125: 64 127: 55 129: 42 131: 65 133: 59 135: 64 137: 43 139: 54 141: 46 143: 42 145: 62 147: 49 149: 42 151: 53 153: 49 155: 48 157: 47 159: 34 161: 45 163: 45 165: 44 167: 54 169: 51 171: 53 173: 60 175: 51 177: 45 179: 50 181: 40 183: 51 185: 51 187: 45 189: 48 191: 65 193: 51 195: 36 197: 51 199: 43 nhit = 50, nodd=50, neven=0, sum = 2605, min = 34, max = 72, ave=52.1 <snip> 800: 61 *EVEN* 802: 48 *EVEN* 804: 48 *EVEN* 806: 41 *EVEN* 808: 45 *EVEN* 810: 55 *EVEN* 812: 37 *EVEN* 814: 51 *EVEN* 816: 57 *EVEN* 818: 52 *EVEN* 820: 48 *EVEN* 822: 40 *EVEN* 824: 56 *EVEN* 826: 57 *EVEN* 828: 46 *EVEN* 830: 48 *EVEN* 832: 38 *EVEN* 834: 59 *EVEN* 836: 43 *EVEN* 838: 61 *EVEN* 840: 61 *EVEN* 842: 59 *EVEN* 844: 55 *EVEN* 846: 58 *EVEN* 848: 58 *EVEN* 850: 60 *EVEN* 852: 50 *EVEN* 854: 51 *EVEN* 856: 54 *EVEN* 858: 60 *EVEN* 860: 54 *EVEN* 862: 53 *EVEN* 864: 60 *EVEN* 866: 51 *EVEN* 868: 58 *EVEN* 870: 56 *EVEN* 872: 58 *EVEN* 874: 49 *EVEN* 876: 64 *EVEN* 878: 64 *EVEN* 880: 61 *EVEN* 882: 62 *EVEN* 884: 63 *EVEN* 886: 52 *EVEN* 888: 67 *EVEN* 890: 61 *EVEN* 892: 55 *EVEN* 894: 56 *EVEN* 896: 70 *EVEN* 898: 57 *EVEN* nhit = 50, nodd=0, neven=50, sum = 2728, min = 37, max = 70, ave=54.56 50 numbers hit 900: 55 *EVEN* 902: 49 *EVEN* 904: 47 *EVEN* 906: 76 *EVEN* 908: 43 *EVEN* 910: 65 *EVEN* 912: 57 *EVEN* 914: 52 *EVEN* 916: 74 *EVEN* 918: 79 *EVEN* 920: 60 *EVEN* 922: 69 *EVEN* 924: 57 *EVEN* 926: 68 *EVEN* 928: 57 *EVEN* 930: 80 *EVEN* 932: 54 *EVEN* 934: 84 *EVEN* 936: 67 *EVEN* 938: 64 *EVEN* 940: 59 *EVEN* 942: 73 *EVEN* 944: 64 *EVEN* 946: 72 *EVEN* 948: 68 *EVEN* 950: 64 *EVEN* 952: 58 *EVEN* 954: 65 *EVEN* 956: 62 *EVEN* 958: 62 *EVEN* 960: 66 *EVEN* 962: 59 *EVEN* 964: 60 *EVEN* 966: 64 *EVEN* 968: 71 *EVEN* 970: 59 *EVEN* 972: 68 *EVEN* 974: 73 *EVEN* 976: 56 *EVEN* 978: 64 *EVEN* 980: 56 *EVEN* 982: 57 *EVEN* 984: 62 *EVEN* 986: 61 *EVEN* 988: 80 *EVEN* 990: 62 *EVEN* 992: 51 *EVEN* 994: 63 *EVEN* 996: 74 *EVEN* 998: 64 *EVEN* 1000: 57 *EVEN* nhit = 51, nodd=0, neven=51, sum = 3231, min = 43, max = 84, ave=63.35 +29411764706
As you can see above, for some strange reason, only odd numbers were hit out of the first two hundred (of 0..1001) while only even numbers were hit in the last two hundred; the middle ranges featured a mixture of odd and even. I have no explanation for that, maybe it warrants yet another theorem of ambrus. :-)

Apart from keeping you interested, keeping statistics as you run a very long-running program is a good way to protect against bugs, to verify that the rate of finding "almost" solutions matches theoretical expectations. You certainly don't want a search program, like Deep Thought, that thinks silently for millions of years before telling you The Answer. Imagine the pain of leaving a program running for years, only to find it contained a silly bug which prevented a solution being found.

What was the point?

We all await The Final Chapter, when, we trust, All Mysteries Will Be Revealed.

-- sundialsvc4 in Re: At the risk of saying something stupid-but-obvious about Roman Numerals

Hopefully, it is clear by now that solving this problem did indeed require searching a vast space for a single elusive solution.

To clarify exactly what was achieved, run this Python 2.x program (update: with a 32-bit hash function, see responses below):

magic = chr(17)+chr(11)+chr(119)+chr(60)+chr(47)+chr(44)+chr(78)+chr(1 +03)+chr(125)+chr(48) for r in ["M", "D", "C", "L", "X", "V", "I"]: n = hash(r + magic) % 1001 print r, n
and you should see:
M 1000 D 500 C 100 L 50 X 10 V 5 I 1
To me, that's a rare and beautiful thing. A one in 10**21 event.

Apart from that, for me personally, solving this problem was a journey, forcing me out of my comfort zone, learning many interesting and new things. I also enjoy writing, telling a story. I hope you enjoyed reading this story as much as I enjoyed writing it.

References

Updated: Minor improvements to wording.

Comment on The 10**21 Problem (Part 4)
Select or Download Code
Re: The 10**21 Problem (Part 4)
by sundialsvc4 (Abbot) on May 19, 2014 at 13:05 UTC

    In my own defense ... :-) ... I guessed soon enough, as did we all, that this was really a discussion about a problem that did require nothing less than a massive-search to solve it.   And, that the only possible way to foreshorten such a search must be to find an early-abandonment strategy or strategies, and to prove that they work without taking 22 years to do so.   I did not mean to “diss” your work at any point, and I think that this was clearly understood by most Monks and by you.

    The point of my side-thread was, and is, that the search for algorithms to reduce a problem effectively can be tricky unto itself.   I have seen many “Roman Numerals decoder rings” that were very-complicated examples of recursion, simply because the designer in question didn’t hit-upon “read it backwards.”   (Not one of my teachers ever presented it to me that way.)   Exhaustive-searches have been done to hunt down problems ... or, the searches were much larger than they need be ... because (generally in the days before Google It™) the designer missed a single key stroke of insight.   Meant to be a parallel observation, not a rebuttal of any sort.   And, I think, mostly understood to be so.

    Although the search-space in this problem is extremely large such that bitmaps coupled with a vast amount of memory are required to solve it, we know that Moore’s Law will continue to hold true ... and this is a very nice demonstration of just how powerful the Perl language really is.

    Early-abandonment strategies aren’t new:   the “minimax” optimization used in game-playing programs is the simplest case.   (Once you know that a solution is sufficiently bad that it will cause any of the preceding nested look-aheads to be rejected, you can clip-away the entire subtree at the base because you know it can only get worse and do not care exactly how worse it gets.)   The extraordinary challenge presented by this problem is its size, and the difficulty of proving that a particular abandonment strategy holds.

    Very entertaining reading, well-worth all of its parts.   Well done, and thanks very much for sharing.

Re: The 10**21 Problem (Part 4)
by BrowserUk (Pope) on May 19, 2014 at 13:18 UTC
    Apart from that, for me personally, solving this problem was a journey, forcing me out of my comfort zone, learning many interesting and new things. I also enjoy writing, telling a story. I hope you enjoyed reading this story as much as I enjoyed writing it.

    I'll never be really good at golf!

    I have no problem with attacking a problem from all angles; running programs for days or even weeks to find solutions; and optimisation is one my most enjoyed pastimes and something of a passion. But there has to be -- at least notionally -- a practical use for the code or the results it produces.

    But that's my hang-up, and is no bad reflection upon your pastime. Millions of people spend their time knocking little balls into a field and then looking for them; and often as not have to pay exorbitant annual and per-game fees for the privileged. Others write long lists numbers in books. Yet more sit around all night in the freezing cold and/or rain, on the off chance that the clouds will clear long enough to peer at fuzzy blobs of light in the night sky. Each to their own "waste of time" :)

    For me, two things come out of this latest of your series:

    • All the optimisation techniques, especially those from oiskuu, are extremely informative and will be useful to me in the future.
    • Your statement "know your data" is one of the most oft overlooked -- and outright ignored -- missives in our industry.

      All too often people tackle tasks involving large datasets with the mindset that the must cater for the full generic range of possibilities for that data; when often large subsets of that range either cannot, or just usually do not occur.

      And in the latter case; in the rare event that the uncatered for data does occur, it can be shown that the results would be anomalous anyway.

    I enjoyed following along; albeit that I came late to the party. Thank you.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: The 10**21 Problem (Part 4)
by RMGir (Prior) on May 19, 2014 at 14:01 UTC
    Fascinating series, thanks for sharing.

    It'll be fun to see if these posts trigger some "competitive speed golfing" among the monks trying to find faster approaches :)


    Mike
Re: The 10**21 Problem (Part 4)
by wjw (Deacon) on May 19, 2014 at 15:48 UTC

    As I mentioned earlier: I am barely aware of the meaning of this. But you can darn well bet I will be re-reading this to better understand it. Thanks much for the excellent reading!

    ...the majority is always wrong, and always the last to know about it...
    Insanity: Doing the same thing over and over again and expecting different results...
Re: The 10**21 Problem (Part 4)
by RMGir (Prior) on May 19, 2014 at 16:33 UTC
    I'm afraid you didn't pick a very stable "magic function" to optimize for.

    It doesn't work with the first Python 2.x I tested it on (which turns out to be 2.7.2):

    $ python Python 2.7.2 (default, Oct 27 2011, 17:53:49) [GCC 4.3.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> /bb/build/Linux-x86_64-32/release/robolibs/big2014.21-611907-20140 +519105223/lib/dpkgroot KeyboardInterrupt >>> magic = chr(17)+chr(11)+chr(119)+chr(60)+chr(47)+chr(44)+chr(78)+c +hr(103)+chr(125)+chr(48) >>> for r in ["M", "D", "C", "L", "X", "V", "I"]: ... n = hash(r + magic) % 1001 ... print r, n ... M 338 D 602 C 93 L 733 X 555 V 933 I 402
    It fails the same way with Python 2.6.7 on my Mac, but succeeds with Python 2.5.6:
    $ python2.5 Python 2.5.6 (r256:88840, Oct 11 2012, 20:14:10) [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on +darwin Type "help", "copyright", "credits" or "license" for more information. >>> magic = chr(17)+chr(11)+chr(119)+chr(60)+chr(47)+chr(44)+chr(78)+c +hr(103)+chr(125)+chr(48) >>> for r in ["M", "D", "C", "L", "X", "V", "I"]: ... n = hash(r + magic) % 1001 ... print r, n ... M 1000 D 500 C 100 L 50 X 10 V 5 I 1

    Mike

      I get the same failure under 2.7.5 on Mac and 2.6.6 on CentOS as well.

      Luckily, it was accepted by both shinh's golf server, when I submitted it about a month ago, and the code golf server also (currently down). Shinh's server runs Python 2.7.2 on Linux, the code golf server Python 2.5 on Linux.

      It would be interesting to see the actual hash value and Python version by running this version:

      import sys print sys.version_info magic = chr(17)+chr(11)+chr(119)+chr(60)+chr(47)+chr(44)+chr(78)+chr(1 +03)+chr(125)+chr(48) for r in ["M", "D", "C", "L", "X", "V", "I"]: h = hash(r + magic) n = h % 1001 print r, n, "(" + str(h) + ")"
      which produced for me on Windows just now:
      sys.version_info(major=2, minor=7, micro=5, releaselevel='final', seri +al=0) M 1000 (2094745652) D 500 (1641493353) C 100 (277891714) L 50 (1566900385) X 10 (1666291637) V 5 (-1570643069) I 1 (1060402344)
      Strange, that this is the same Python version 2.7.5 as reported by mr_mischief but on a different platform (Windows vs MacOS). I tested both 32-bit and 64-bit Python on Windows (multiple versions) and didn't see a failure.

      On shinh's golf server it produced:

      sys.version_info(major=2, minor=7, micro=2, releaselevel='final', seri +al=0)\n M 1000 (2094745652) D 500 (1641493353) C 100 (277891714) L 50 (1566900385) X 10 (1666291637) V 5 (-1570643069) I 1 (1060402344)
      This is doubly strange because that is the same Python version 2.7.2 running on the same platform, Linux, which failed for RMGir!

      I will check the Python C sources version history later when I get more time.

        Very strange.

        I even checked to make sure bitness had no impact by running both the 32 and 64-bit versions of the python on my mac, but both give the same result.

        I was doing that wrong. Running a 32-bit version of python (by following these directions) on my mac, I get the answer you expect.

        So I suspect it's just that the python hash function works for your approach in 32-bit builds, but not in 64.


        Mike
Re: The 10**21 Problem (Part 4)
by spx2 (Chaplain) on May 21, 2014 at 18:59 UTC
    I admire your persistence a lot. I wish I had a lot more of that in everything I'm doing. Definitely an upvote. I am going to go through the other parts as well.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://1086650]
Approved by marto
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (5)
As of 2014-09-21 02:38 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (166 votes), past polls