### The 10**21 Problem (Part 4)

by eyepopslikeamosquito (Chancellor)
 on May 19, 2014 at 11:48 UTC Need Help??

In part three of this seemingly never-ending series on high performance computing we continued what may become the longest whittle in Perl Monks history by further reducing the running time of our magic formula search program, from 728 years down to just 286.

The latest round of improvements were achieved not by a major breakthrough, rather by applying a succession of micro-optimizations. To achieve the required massive speed boost, it was becoming increasingly clear that we needed to find a new breakthrough idea.

Constraint Satisfaction Problems

Constraint satisfaction problems (CSPs) are mathematical problems defined as a set of objects whose state must satisfy a number of constraints or limitations. CSPs are the subject of intense research in both artificial intelligence and operations research, since the regularity in their formulation provides a common basis to analyze and solve problems of many unrelated families. CSPs often exhibit high complexity, requiring a combination of heuristics and combinatorial search methods to be solved in a reasonable time.

-- Constraint satisfaction problem (wikipedia)

So, while the talk of this being a “10**21 problem” is an interesting one, and the latest installment (#3) about whacking the TLB of the processor is even more so, it seems like an incredible amount of work to unleash on a problem that has a trivial algorithmic solution ... This is on the one hand very interesting and computer-sciency, as I have already said, but the choice of example-problem puzzles me.

-- sundialsvc4

Roman numerals were the challenge at hand. The method of the example is common to constraint searches beyond the example. Constraint searches exist in drug research and many other fields ... Pick any field which has a large search space for just the right combination of properties in an as yet undiscovered item. Write a program which tries and makes a preliminary fitness determination for each possibility. Have that program spit out a short list of candidates for further investment of testing and development ... In this specific case, the fitness is a maximum length, a handful of inputs, and a handful of outputs that map correctly to those inputs ... The point of an example is that it is a concrete thing that is completed and shown rather than an abstract idea.

-- mr_mischief beautifully explains the point to sundialsvc4

mr_mischief was right on the money, sundialsvc4 on the wrong track.

This was indeed a classic constraint satisfaction problem ... which I felt woefully ill-equipped to solve. Lacking the mathematical ability of an ambrus, the only strategy I could think of was to desperately search for a hack, any hack, that would allow me to abandon a potential solution as early as possible, as soon I was certain it could not be successfully completed.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, or select a lower-level procedure or heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization problem, especially with incomplete or imperfect information or limited computation capacity.

-- Metaheuristic (wikipedia)

Ahem. Instead of "hack" above, please feel free to substitute "metaheuristic" -- a new word I learnt while researching this series.

An "early abandonment hack", erm metaheuristic, would clearly be much faster than a brute force complete enumeration of all candidate solutions. How to find such a hack?

To get a feel for what the data in the 4GB lookup tables actually looked like, I wrote some small Perl programs similar to the one below, which displays the data in 128-byte blocks:

```use strict;
use warnings;
use List::MoreUtils qw(natatime);

# See [id://861938]
sub chunk_array
{
my ( \$n, @vals ) = @_;
my \$str;
my \$iter = natatime( \$n, @vals );
while ( my @line = \$iter->() ) {
\$str .= join( " ", @line ) . "\n";
}
return \$str;
}

sub dump_blocks
{
my \$file   = shift;    # bytemap file to read
my \$nblock = shift;    # number of blocks to dump
local \$/ = \128;       # Make <> operator read 128 bytes
open( my \$fh, '<', \$file ) or die "error: open '\$file': \$!";
binmode(\$fh);
for my \$n ( 1 .. \$nblock ) {
my \$block = <\$fh>;
length(\$block) == 128 or die "oops";
print "block \$n:\n";
print chunk_array( 16, map { sprintf "%3d", ord } split //, \$b
+lock );
}
close(\$fh);
}

# A 4GB bytemap consists of four 1GB physical files (suffix: "-0" .. "
+-3").
# (See [id://1086481] for a program that generates a bytemap).

my \$bytemap_root = "lookzpM.byte";
my \$fname        = \$bytemap_root . "-0";
dump_blocks( \$fname, 16 );
which produced:
```block 1:
0 118   0 126   0 126   0 102   0 102   0  94   0  94   0  86
0  86   0  94   0  94   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 2:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0  75  55  51  55  59  63
35  47  43  39  35  39  27  31  19  15  11  23  19  23  27  31
block 3:
0 111   0 103   0 103   0 127   0  79   0 119   0 119   0 127
0 111   0 103   0 103   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 4:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0 114   0 122   0 114   0 106   0  98   0
90   0  66   0  74   0  82   0  90   0  82  44  42  40  34  36
58  56   2  12   0   8  50  52  58  56  50  44  42  40  34  36
block 5:
0  88   0  76   0  72   0  84   0  88   0 108   0 104   0 100
0 120   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 6:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0 109   0  97   0 101   0  65   0  77   0
113   0 117   0 113   0 109   0  97   0 101   5   1   1   0   0
17  17  21  21  17  17  45  45  33  33  37   0  65   0  77   0
block 7:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 8:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0  68   0  72   0  76   0  88   0  84   0
104   0 108   0  24  26   4   2   8   0  12   0 120   0 116   0
104   0 108   0 120   0  68   0  72   0  76   0  88   0  84   0
block 9:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 10:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0  83   0  91
0  67   0  75   0  67  63  59  55  51  55  43  47  51  63  59
7   3   7  11  15   3  31  27  23  19  23   0 111   0  95   0
71   0  71   0  79   0 127   0 119   0 119   0 111   0 127   0
block 11:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 12:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0 124   0 112   0 116   0 112   0 124   0  64   0  68   0  64
0  92   0  80   0  84  22  16  30  28  30   0   6   4   6   0
62  60  62  48  54  52  54  48  62  60  62   0  70   0  70   0
94   0  94   0  86   0   0   0   0   0   0   0   0   0   0   0
block 13:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 14:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0  85   0  89   0  93   0  73   0  69   0 121   0 125   0 105
0 117   0 121   0 125   1   9   5   5   1  25  29  29  17 105
0  85   0  89   0  93   0  73   0  69   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 15:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
block 16:
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0  78   0  70   0  70   0  94  44  46  40  22  20  22  24  30
12  14   8   6   4   6   0 126   0 110   0 118   0 118   0 126
0  78   0  70   0  70   0  94   0 110   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Hmmm, every one of these blocks looks like it is one of just three distinct types:
• All zeros
• All non-zero numbers even
• All non-zero numbers odd
Every one of those blocks is one of just three distinct types! Every one of those blocks is one of just three distinct types?!! Whoa. But does that hold true everywhere? To find out, I nervously ran a brute force search over all seven 4GB lookup tables and was elated to learn that this "theorem" does indeed hold true for every single block of every single lookup table!

Only odd modulos (i.e. 1001, 1003, 1005, ...) can produce a valid solution

-- ambrus' third theorem

For modulo 1001, each and every 128-byte block in all seven 4GB lookup tables contains either all zeros, all non-zero numbers even, or all non-zero numbers odd

-- ambrus' fourth theorem

Once again I could not restrain myself from adding to our list of the theorems of ambrus. :-) I expect ambrus' third theorem is easy to prove, his fourth much harder.

Curiously, out of all odd modulos in the range 1001..1221, the fourth theorem of ambrus applies to modulos 1001 and 1221 only. Why only those two? Who decides? Weird. Maybe ambrus can show mathematically why it must be so. I "proved" it only via brute force enumeration of hundreds of 4GB lookup tables.

For modulo 1003..1219 you would need to find a different heuristic to trim the search space (if one exists). That is the main reason why I used modulo 1001 only.

As you might have guessed, noticing this oddity in the lookup table data was a key breakthrough. Why?

• All zero blocks can be skipped immediately. 125 searches eliminated.
• To get a hit, all blocks must be even or all blocks must be odd because a value from an odd block can never match one from an even block.
• We can encode each 128 byte block in just two bits: one to indicate zero or non-zero, the other to indicate odd or even. This reduces the lookup table size from 4GB down to just 8MB! This is crucial in reducing CPU cache misses.

Wait, there's more. Each of MDCLXVI can produce a valid solution only if all their lookup table blocks are of the same (non-zero) type. Assuming an equal number of blocks of each type, that occurs with a probability of (2/3) (first block non-zero) times (1/3)**6 (next six blocks match first non-zero block). After a preliminary check of all seven bitmaps therefore, only one in 1093.5 candidate solutions needs further (more expensive) checking for an exact match -- via a 4GB table lookup plus calculation, as detailed in earlier episodes of this series.

Bitmaps

Here is the code (64-bit compile, 32-bit int, 64-bit size_t) to create a 8MB bitmap from a 4GB lookup table:

```#define LAST_CHUNK_IDX_4MB      33554431
#define LAST_CHUNK_IDX_8MB      (33554432*2 - 1)
#define BIT_MAP_SIZE_BYTES      ((LAST_CHUNK_IDX_8MB+1) >> 3)
typedef unsigned char bytev_t;

void make_bitmapA(
const char* bitfname,    // in: bit map file name
const bytev_t* bytevec,  // in: byte vector
unsigned char* bitvec    // out: bitvec put here (assumed already z
+eroed)
)
{
unsigned int uu;
unsigned int uu2;
unsigned int byte;
unsigned int bit;
const size_t* psz;
const bytev_t* psy;
int j;
unsigned int neven;
unsigned int nodd;
unsigned int totzero = 0;
unsigned int toteven = 0;
unsigned int totodd = 0;
for (uu = 0; uu <= LAST_CHUNK_IDX_4MB; ++uu) {
psy = &bytevec[uu << 7];
psz = (size_t*)psy;
// This is the zero case.
if (psz[0]  == 0 && psz[1]  == 0
&&  psz[2]  == 0 && psz[3]  == 0
&&  psz[4]  == 0 && psz[5]  == 0
&&  psz[6]  == 0 && psz[7]  == 0
&&  psz[8]  == 0 && psz[9]  == 0
&&  psz[10] == 0 && psz[11] == 0
&&  psz[12] == 0 && psz[13] == 0
&&  psz[14] == 0 && psz[15] == 0) {
// zero: both bits are zero, nothing to do
++totzero;
continue;
}
// All non-zero elements in block will be all even or all odd!
neven = 0;
nodd = 0;
for (j = 0; j < 128; ++j) {
if (psy[j] != 0) {
if (psy[j] % 2 == 0) {
++neven;
}
else {
++nodd;
}
}
}
if (neven > 0 && nodd > 0)   { fprintf(stderr, "oops 1\n"); exit
+(1); }
if (neven == 0 && nodd == 0) { fprintf(stderr, "oops 2\n"); exit
+(1); }
uu2 = uu * 2;
// nonzero: set first bit
byte = uu2 >> 3;
bit  = uu2 & 7;
bitvec[byte] |= (1 << (bit+1));
if (neven > 0) {
// even: second bit is zero, nothing to do
++toteven;
continue;
}
// odd: set second bit
++totodd;
bitvec[byte] |= (1 << bit);
}
printf("zero=%u even=%u odd=%u from %u (%u)\n", totzero, toteven, t
+otodd, LAST_CHUNK_IDX_4MB+1, uu);
FILE* fh;
printf("write %s\n", bitfname);
fh = fopen(bitfname, "wb");
if (fh == NULL) {
fprintf(stderr, "oops open '%s' failed: errno=%d\n", bitfname, e
+rrno);
exit(1);
}
if (fwrite(bitvec, BIT_MAP_SIZE_BYTES, 1, fh) != 1) {
fprintf(stderr, "oops write '%s' failed: errno=%d\n", bitfname,
+errno);
exit(1);
}
fclose(fh);
}

And here, finally, in all its glory, erm horror, is the inner loop code (64-bit compile, 32-bit int, 64-bit size_t) that actually found a solution:

```#include <emmintrin.h>
#include <smmintrin.h>

#define H_PRIME 1000003

#define HASH_LEN  11

#define MOD_TARG  1001

#define M_TARG   1000
#define D_TARG    500
#define C_TARG    100
#define L_TARG     50
#define X_TARG     10

#define GET_TWO_BITS(x, p)  (((x) >> (p)) & ((size_t)3))

#define UNROLL(qx) qqm = bytevecM[m7 ^ qx];         \
if (qqm != 0                                      \
&&  qqm == bytevecD[d7 ^ qx]) {                   \
qk[nqk++] = qqm; qk[nqk++] = qx; }

#define UNROLL_M7A(qx) \
m7arr[qx] = (unsigned int)( (m6 ^ qx) * H_PRIME ); \
m7arr_r12[qx] = m7arr[qx] >> 12; \
_mm_prefetch((const char*)(&pbvM[m7arr_r12[qx]]), _MM_HINT_T0);

#define UNROLL_M7B(qx) \
if ( (mceo = GET_TWO_BITS(pbvM[m7arr_r12[qx]], ((m7arr[qx] >> 6) &
+62)) ) != 0) { \
d7 = (unsigned int)( (d6 ^ qx) * H_PRIME ); \
d7_r12 = d7 >> 12; \
_mm_prefetch((const char*)(&pbvD[d7_r12]), _MM_HINT_T0); \
qj[nqj++] = qx; \
qj[nqj++] = d7; \
qj[nqj++] = d7_r12; \
qj[nqj++] = mceo; }

// -------------------------------------------------------------------

int inner(
const unsigned char* bitvecM,
const unsigned char* bitvecD,
const unsigned char* bitvecC,
const unsigned char* bitvecL,
const unsigned char* bitvecX,
const unsigned char* bitvecV,
const unsigned char* bitvecI,
const unsigned char* bytevecM,
const unsigned char* bytevecD,
int startval, int endval,
int m2, int d2, int c2, int l2, int x2, int v2, int i2,
__m128i* ps
)
{
__m128i s2 = _mm_set_epi32(i2, v2, x2, l2);
__m128i hp = _mm_set1_epi32(H_PRIME);
__m128i s3, s4, s5, s6;
int m3, m4, m5, m6;
int d3, d4, d5, d6;
int c3, c4, c5, c6;
unsigned int m7, d7, d7_r12, c7, c7_r12, l7, l7_r12, x7, x7_r12, v7
+, v7_r12, i7, i7_r12;
int q3, q4, q5, q6, q7;
int c8, l8, x8;
int c9, l9, x9;
unsigned int qqm;
unsigned int q8;
int jj;
int iret = 0;
int cnt;
int qz;
int nqj;
int nqk;
unsigned int qj[128*8];
unsigned int qk[128*8];
unsigned int m7arr[128];
unsigned int m7arr_r12[128];
unsigned int mceo, dceo, cceo, lceo, xceo, vceo, iceo;
const size_t* pbvM = (size_t*)bitvecM;
const size_t* pbvD = (size_t*)bitvecD;
const size_t* pbvC = (size_t*)bitvecC;
const size_t* pbvL = (size_t*)bitvecL;
const size_t* pbvX = (size_t*)bitvecX;
const size_t* pbvV = (size_t*)bitvecV;
const size_t* pbvI = (size_t*)bitvecI;
for (q3 = startval; q3 < endval; ++q3) {
if (q3 == 10 || q3 == 13) continue;
m3 = (m2 ^ q3) * H_PRIME;
d3 = (d2 ^ q3) * H_PRIME;
c3 = (c2 ^ q3) * H_PRIME;
s3 = _mm_mullo_epi32(_mm_xor_si128(s2, _mm_set1_epi32(q3)), hp);
for (q4 = 1; q4 < 128; ++q4) {
if (q4 == 10 || q4 == 13) continue;
m4 = (m3 ^ q4) * H_PRIME;
d4 = (d3 ^ q4) * H_PRIME;
c4 = (c3 ^ q4) * H_PRIME;
s4 = _mm_mullo_epi32(_mm_xor_si128(s3, _mm_set1_epi32(q4)), h
+p);
for (q5 = 1; q5 < 128; ++q5) {
if (q5 == 10 || q5 == 13) continue;
m5 = (m4 ^ q5) * H_PRIME;
d5 = (d4 ^ q5) * H_PRIME;
c5 = (c4 ^ q5) * H_PRIME;
s5 = _mm_mullo_epi32(_mm_xor_si128(s4, _mm_set1_epi32(q5))
+, hp);
for (q6 = 1; q6 < 128; ++q6) {
if (q6 == 10 || q6 == 13) continue;
m6 = (m5 ^ q6) * H_PRIME;
d6 = (d5 ^ q6) * H_PRIME;
c6 = (c5 ^ q6) * H_PRIME;
s6 = _mm_mullo_epi32(_mm_xor_si128(s5, _mm_set1_epi32(q
+6)), hp);
UNROLL_M7A(1)
UNROLL_M7A(2)
UNROLL_M7A(3)
UNROLL_M7A(4)
UNROLL_M7A(5)
UNROLL_M7A(6)
UNROLL_M7A(7)
UNROLL_M7A(8)
UNROLL_M7A(9)
UNROLL_M7A(11)
UNROLL_M7A(12)
for (q7 = 14; q7 < 128; ++q7) {
UNROLL_M7A(q7)
}
nqj = 0;
UNROLL_M7B(1)
UNROLL_M7B(2)
UNROLL_M7B(3)
UNROLL_M7B(4)
UNROLL_M7B(5)
UNROLL_M7B(6)
UNROLL_M7B(7)
UNROLL_M7B(8)
UNROLL_M7B(9)
UNROLL_M7B(11)
UNROLL_M7B(12)
for (q7 = 14; q7 < 128; ++q7) {
// UNROLL_M7B(q7)
mceo = GET_TWO_BITS(pbvM[m7arr_r12[q7]], ((m7arr[q7]
+ >> 6) & 62));
if (mceo == 0) continue;
d7 = (unsigned int)( (d6 ^ q7) * H_PRIME );
d7_r12 = d7 >> 12;
_mm_prefetch((const char*)(&pbvD[d7_r12]), _MM_HINT_
+T0);
qj[nqj++] = q7;
qj[nqj++] = d7;
qj[nqj++] = d7_r12;
qj[nqj++] = mceo;
}
nqk = 0;
cnt = 0;
while (cnt < nqj) {
q7 = qj[cnt++];
d7 = qj[cnt++];
d7_r12 = qj[cnt++];
mceo = qj[cnt++];
dceo = GET_TWO_BITS(pbvD[d7_r12], ((d7 >> 6) & 62));
if (dceo != mceo) continue;
c7 = (unsigned int)( (c6 ^ q7) * H_PRIME );
c7_r12 = c7 >> 12;
_mm_prefetch((const char*)(&pbvC[c7_r12]), _MM_HINT_
+T0);
qk[nqk++] = q7;
qk[nqk++] = d7;
qk[nqk++] = c7;
qk[nqk++] = c7_r12;
qk[nqk++] = mceo;
}
nqj = 0;
cnt = 0;
while (cnt < nqk) {
q7 = qk[cnt++];
d7 = qk[cnt++];
c7 = qk[cnt++];
c7_r12 = qk[cnt++];
mceo = qk[cnt++];
cceo = GET_TWO_BITS(pbvC[c7_r12], ((c7 >> 6) & 62));
if (cceo != mceo) continue;
l7 = (unsigned int)( (s6.m128i_i32[0] ^ q7) * H_PRIM
+E );
l7_r12 = l7 >> 12;
_mm_prefetch((const char*)(&pbvL[l7_r12]), _MM_HINT_
+T0);
qj[nqj++] = q7;
qj[nqj++] = d7;
qj[nqj++] = c7;
qj[nqj++] = l7;
qj[nqj++] = l7_r12;
qj[nqj++] = mceo;
}
nqk = 0;
cnt = 0;
while (cnt < nqj) {
q7 = qj[cnt++];
d7 = qj[cnt++];
c7 = qj[cnt++];
l7 = qj[cnt++];
l7_r12 = qj[cnt++];
mceo = qj[cnt++];
lceo = GET_TWO_BITS(pbvL[l7_r12], ((l7 >> 6) & 62));
if (lceo != mceo) continue;
x7 = (unsigned int)( (s6.m128i_i32[1] ^ q7) * H_PRIM
+E );
x7_r12 = x7 >> 12;
_mm_prefetch((const char*)(&pbvX[x7_r12]), _MM_HINT_
+T0);
qk[nqk++] = q7;
qk[nqk++] = d7;
qk[nqk++] = c7;
qk[nqk++] = l7;
qk[nqk++] = x7;
qk[nqk++] = x7_r12;
qk[nqk++] = mceo;
}
nqj = 0;
cnt = 0;
while (cnt < nqk) {
q7 = qk[cnt++];
d7 = qk[cnt++];
c7 = qk[cnt++];
l7 = qk[cnt++];
x7 = qk[cnt++];
x7_r12 = qk[cnt++];
mceo = qk[cnt++];
xceo = GET_TWO_BITS(pbvX[x7_r12], ((x7 >> 6) & 62));
if (xceo != mceo) continue;
v7 = (unsigned int)( (s6.m128i_i32[2] ^ q7) * H_PRIM
+E );
v7_r12 = v7 >> 12;
_mm_prefetch((const char*)(&pbvV[v7_r12]), _MM_HINT_
+T0);
qj[nqj++] = q7;
qj[nqj++] = d7;
qj[nqj++] = c7;
qj[nqj++] = l7;
qj[nqj++] = x7;
qj[nqj++] = v7;
qj[nqj++] = v7_r12;
qj[nqj++] = mceo;
}
nqk = 0;
cnt = 0;
while (cnt < nqj) {
q7 = qj[cnt++];
d7 = qj[cnt++];
c7 = qj[cnt++];
l7 = qj[cnt++];
x7 = qj[cnt++];
v7 = qj[cnt++];
v7_r12 = qj[cnt++];
mceo = qj[cnt++];
vceo = GET_TWO_BITS(pbvV[v7_r12], ((v7 >> 6) & 62));
if (vceo != mceo) continue;
i7 = (unsigned int)( (s6.m128i_i32[3] ^ q7) * H_PRIM
+E );
i7_r12 = i7 >> 12;
_mm_prefetch((const char*)(&pbvI[i7_r12]), _MM_HINT_
+T0);
qk[nqk++] = q7;
qk[nqk++] = d7;
qk[nqk++] = c7;
qk[nqk++] = l7;
qk[nqk++] = x7;
qk[nqk++] = i7;
qk[nqk++] = i7_r12;
qk[nqk++] = mceo;
}
nqj = 0;
cnt = 0;
while (cnt < nqk) {
q7 = qk[cnt++];
d7 = qk[cnt++];
c7 = qk[cnt++];
l7 = qk[cnt++];
x7 = qk[cnt++];
i7 = qk[cnt++];
i7_r12 = qk[cnt++];
mceo = qk[cnt++];
iceo = GET_TWO_BITS(pbvI[i7_r12], ((i7 >> 6) & 62));
if (iceo != mceo) continue;
_mm_prefetch(&bytevecM[m7arr[q7] & 0xffffff80], _MM_
+HINT_T0);
_mm_prefetch(&bytevecM[64+(m7arr[q7] & 0xffffff80)],
+ _MM_HINT_T0);
_mm_prefetch(&bytevecD[d7 & 0xffffff80], _MM_HINT_T0
+);
_mm_prefetch(&bytevecD[64+(d7 & 0xffffff80)], _MM_HI
+NT_T0);
qj[nqj++] = q7;
qj[nqj++] = d7;
qj[nqj++] = c7;
qj[nqj++] = l7;
qj[nqj++] = x7;
}
if (nqj == 0) continue;
qz = 0;
while (qz < nqj) {
q7 = qj[qz++];
d7 = qj[qz++];
c7 = qj[qz++];
l7 = qj[qz++];
x7 = qj[qz++];
m7 = m7arr[q7];
nqk = 0;
UNROLL(1)
UNROLL(2)
UNROLL(3)
UNROLL(4)
UNROLL(5)
UNROLL(6)
UNROLL(7)
UNROLL(8)
UNROLL(9)
UNROLL(11)
UNROLL(12)
UNROLL(14)
UNROLL(15)
UNROLL(16)
UNROLL(17)
UNROLL(18)
UNROLL(19)
UNROLL(20)
UNROLL(21)
UNROLL(22)
UNROLL(23)
UNROLL(24)
UNROLL(25)
UNROLL(26)
UNROLL(27)
UNROLL(28)
UNROLL(29)
UNROLL(30)
UNROLL(31)
UNROLL(32)
UNROLL(33)
UNROLL(34)
UNROLL(35)
UNROLL(36)
UNROLL(37)
UNROLL(38)
UNROLL(39)
UNROLL(40)
UNROLL(41)
UNROLL(42)
UNROLL(43)
UNROLL(44)
UNROLL(45)
UNROLL(46)
UNROLL(47)
UNROLL(48)
UNROLL(49)
UNROLL(50)
UNROLL(51)
UNROLL(52)
UNROLL(53)
UNROLL(54)
UNROLL(55)
UNROLL(56)
UNROLL(57)
UNROLL(58)
UNROLL(59)
UNROLL(60)
UNROLL(61)
UNROLL(62)
UNROLL(63)
UNROLL(64)
UNROLL(65)
UNROLL(66)
UNROLL(67)
UNROLL(68)
UNROLL(69)
UNROLL(70)
UNROLL(71)
UNROLL(72)
UNROLL(73)
UNROLL(74)
UNROLL(75)
UNROLL(76)
UNROLL(77)
UNROLL(78)
UNROLL(79)
UNROLL(80)
UNROLL(81)
UNROLL(82)
UNROLL(83)
UNROLL(84)
UNROLL(85)
UNROLL(86)
UNROLL(87)
UNROLL(88)
UNROLL(89)
UNROLL(90)
UNROLL(91)
UNROLL(92)
UNROLL(93)
UNROLL(94)
UNROLL(95)
UNROLL(96)
UNROLL(97)
UNROLL(98)
UNROLL(99)
UNROLL(100)
UNROLL(101)
UNROLL(102)
UNROLL(103)
UNROLL(104)
UNROLL(105)
UNROLL(106)
UNROLL(107)
UNROLL(108)
UNROLL(109)
UNROLL(110)
UNROLL(111)
UNROLL(112)
UNROLL(113)
UNROLL(114)
UNROLL(115)
UNROLL(116)
UNROLL(117)
UNROLL(118)
UNROLL(119)
UNROLL(120)
UNROLL(121)
UNROLL(122)
UNROLL(123)
UNROLL(124)
UNROLL(125)
UNROLL(126)
UNROLL(127)
if (nqk == 0) continue;
cnt = 0;
while (cnt < nqk) {
qqm = qk[cnt++];
q8  = qk[cnt++];
// Calculate instead of lookup to reduce memory f
+ootprint.
c8 = ((int)c7 ^ q8) * H_PRIME;
jj = (c8 ^ qqm) ^ HASH_LEN;
c9 = jj % MOD_TARG; if (c9 < 0) c9 += MOD_TARG;
if (c9 == C_TARG) {
l8 = ((int)l7 ^ q8) * H_PRIME;
jj = (l8 ^ qqm) ^ HASH_LEN;
l9 = jj % MOD_TARG; if (l9 < 0) l9 += MOD_TARG
+;
if (l9 == L_TARG) {
x8 = ((int)x7 ^ q8) * H_PRIME;
jj = (x8 ^ qqm) ^ HASH_LEN;
x9 = jj % MOD_TARG; if (x9 < 0) x9 += MOD_T
+ARG;
if (x9 == X_TARG) {
ps[iret++] = _mm_set_epi16(0, qqm, q8, q
+7, q6, q5, q4, q3);
}
}
}
}
}
}
}
}
}
return iret;
}

Notice that the above code uses the TLB, prefetch, vectorization and loop peeling techniques discussed in earlier episodes but this time with seven 8MB bitmaps.

This one runs in about 2.9 seconds = 2.9 * 125**4 / 60 / 60 / 24 / 365 = 22 years. Given multithreading and enough cores that should be fast enough to find a solution in a year or two.

As you might expect, this inner loop code was trivial to multithread. When I was suffering from high memory latency, Intel's hyperthreading gave me quite a lot. As I reduced this latency, hyper-threading gave me less and less until I didn't bother running more threads than there were physical cores on the machine.

In practice, this multithreaded code, running on two four-core physical machines, found a solution in about a year.

I expect its performance could be considerably further improved using the many excellent performance tips provided by oiskuu in replies to earlier episodes in this series.

The full list of six of seven solutions found by this program is:

``` 1  23 105 120  47  93  48 104  16  39: 1000 500 100 50 10   5  734
1  61  61  73  46 107   4 103 114  54: 1000 500 100 50 10 916    1
1  68 125  58   6  81  39  76  57  34: 1000 500 100 50 10 956    1
1  77  91  35 121  50  78   4  40 117: 1000 500 100 50 10   5   95
2  24  70  80  56  33  49  14  84  38: 1000 500 100 50 10   5  720
2  55   7  58  19  95  66 106 123  76: 1000 500 100 50 10   5  223
2  78  45 126 107  78   4 125  50  39: 1000 500 100 50 10 996    1
2 119 112  12  43  73 103  42  63  82: 1000 500 100 50 10   5  197
3   3 110  96  21 116  76  79  62 126: 1000 500 100 50 10   5  886
3  48  91  63  25  24   4   1  60  21: 1000 500 100 50 10  37    1
3  48  91  63  25  24   4   1  61  21: 1000 500 100 50 10  37    1
3  82   4 111  53  42  62  67  50  32: 1000 500 100 50 10 910    1
3  87  44  79  85  80  45   6 107  93: 1000 500 100 50 10   5  448
3 120  99 112  40  69 102   8 114  14: 1000 500 100 50 10 530    1
3 124 119  88 127  67  22 113  67 123: 1000 500 100 50 10 954    1
4  20  27  28  11  64  88  81 120 126: 1000 500 100 50 10   5  191
4  43  14  62  25  58  27  36  68  79: 1000 500 100 50 10   5   29
4  89  72  27  70 115 122   7  50  29: 1000 500 100 50 10   5  289
4  89  72  27  70 115 122   7  51  29: 1000 500 100 50 10   5  273
5   8 124 117   9  72   5 111  96 110: 1000 500 100 50 10   5  878
5   8 124 117   9  72   5 111 112 110: 1000 500 100 50 10   5  622
5  42  77  27  93  51 126  85  59  97: 1000 500 100 50 10   5  936
5  52  94  25  97  87   9  76 114  20: 1000 500 100 50 10 147    1
5  61 120 123  86  20  62  35  42 119: 1000 500 100 50 10   5  315
6  14  32 112  84 121  98  26 116  19: 1000 500 100 50 10 277    1
6  25  53 101   4 114  61  55  56  59: 1000 500 100 50 10   5  247
6  40  29  66   1  11  41 110  84  15: 1000 500 100 50 10   5  183
6  59  34 115  24 123  71  86  85  80: 1000 500 100 50 10   5  922
6  90  30  49  53  77   9  43  49 109: 1000 500 100 50 10 864    1
6  90  40  43  27  11   2 111  83  31: 1000 500 100 50 10   5  335
6 127 122  67  20  12 106  17  47  90: 1000 500 100 50 10   5   89
7  82  45  71   6  60  20  45 122  42: 1000 500 100 50 10 147    1
7  91 114  61  33 121  14  33 127   8: 1000 500 100 50 10   5  960
8  24  74  68  29  46  54  28  45 122: 1000 500 100 50 10 157    1
8  81  99 126  73  33  84  39  23  58: 1000 500 100 50 10   3    1
8 109  37   7 102  98  19  25  27  60: 1000 500 100 50 10   5  143
9   8  60   1  94   6  51  38  78 100: 1000 500 100 50 10 810    1
9  21   3  59  26   7  72  92 114   1: 1000 500 100 50 10   5  107
9  97  39   6  74  69 102  33  28  20: 1000 500 100 50 10   5  307
9  99  60  53 116  57  54 109  36  63: 1000 500 100 50 10 177    1
9 100  85  27  24  79 114  88   2  83: 1000 500 100 50 10 722    1
11  31   6  74 111  80  49  21   5 106: 1000 500 100 50 10 750    1
11 112  65  42  28  21  91 105 107  97: 1000 500 100 50 10 235    1
11 119  73  69  89 121 102 103  82   5: 1000 500 100 50 10 908    1
12  19  63  11  15 120  67  58 102  65: 1000 500 100 50 10   5  167
12  30  27   7  52  91  33  77  80  59: 1000 500 100 50 10   5  574
12  34 117  60  98   3 116  40  62  59: 1000 500 100 50 10 476    1
12  54  14  99  51 123  48  42  84   1: 1000 500 100 50 10 313    1
12  93  75  47  78  84  12  60  83  32: 1000 500 100 50 10 954    1
14  36  71  80 101  39 123  49  33  85: 1000 500 100 50 10  57    1
14  37  20  37  26  91  96   8  51  14: 1000 500 100 50 10   5  448
14  76  18 127  16  48  79   1  27  24: 1000 500 100 50 10 185    1
14  79 111  28  29  81   7  68 121  49: 1000 500 100 50 10   5  301
14  87  50 108   7 117  30  72  61   1: 1000 500 100 50 10   5  942
14  97 109 101  47  78 124 106  98  61: 1000 500 100 50 10 519    1
14 104  76 107  79  22  79  75 112  56: 1000 500 100 50 10 750    1
14 108  43  91  86  71  39  63  93  56: 1000 500 100 50 10 900    1
14 113  51  97  31 112  94  80  67 123: 1000 500 100 50 10 926    1
15  19 103  89   2 124 124  48  50  89: 1000 500 100 50 10 483    1
16  17  30 115  63  98  20  51  55  14: 1000 500 100 50 10   5   13
16 122   8  28  24 106 121  34  69  33: 1000 500 100 50 10   5  608
17  11 119  60  47  44  78 103 125  48: 1000 500 100 50 10   5    1
17  32  37  28  15  44  79  63  34 102: 1000 500 100 50 10 652    1
17  38  34  46 117  66  91  43  93  76: 1000 500 100 50 10   5  115
17  39  32  53   7 127 114   8  77  22: 1000 500 100 50 10 115    1
17  57  37 112  36  73  78  53  39 119: 1000 500 100 50 10   5   63
17  57 107  60 113 117  72  64  56  17: 1000 500 100 50 10 988    1
17 107  81  48  51   8 106   7  62  69: 1000 500 100 50 10 105    1
17 123  66  76  11  24  57  58   7  65: 1000 500 100 50 10   5  974

In war, luck is half in everything

-- Napoleon

As in golf. Finding the solution as soon as I did, after searching only 17 of 125 numbers, was about a one in nine shot. Of course, as computer hardware improved over the next few years, the task would have got easier.

Some other statistics:

```1: 4
2: 4
3: 7
4: 4
5: 5
6: 7
7: 2
8: 3
9: 5
11: 3
12: 5
14: 9
15: 1
16: 2
17: 8

Min: 1
Max: 9
Total: 69
number of 5 (V) hits found: 35
number of 1 (I) hits found: 35
Ave: 4.60

I also kept a histogram of how often each number (0..1001) appeared in the (V, I) positions of all "5 of 7 hits" solutions found (i.e. hitting M, D, C, L, X):

``` 1: 64
3: 58
5: 73
7: 63
9: 46
11: 72
13: 58
15: 72
17: 61
19: 70
21: 63
23: 64
25: 62
27: 60
29: 66
31: 49
33: 64
35: 67
37: 42
39: 74
41: 86
43: 60
45: 52
47: 64
49: 49
51: 67
53: 68
55: 59
57: 60
59: 56
61: 72
63: 68
65: 56
67: 66
69: 59
71: 73
73: 59
75: 61
77: 64
79: 64
81: 52
83: 72
85: 45
87: 52
89: 68
91: 65
93: 55
95: 51
97: 72
99: 66
nhit = 50, nodd=50, neven=0, sum = 3109, min = 42, max = 86, ave=62.18
50 numbers hit
101: 50
103: 70
105: 62
107: 51
109: 49
111: 54
113: 55
115: 59
117: 72
119: 67
121: 58
123: 61
125: 64
127: 55
129: 42
131: 65
133: 59
135: 64
137: 43
139: 54
141: 46
143: 42
145: 62
147: 49
149: 42
151: 53
153: 49
155: 48
157: 47
159: 34
161: 45
163: 45
165: 44
167: 54
169: 51
171: 53
173: 60
175: 51
177: 45
179: 50
181: 40
183: 51
185: 51
187: 45
189: 48
191: 65
193: 51
195: 36
197: 51
199: 43
nhit = 50, nodd=50, neven=0, sum = 2605, min = 34, max = 72, ave=52.1
<snip>
800: 61 *EVEN*
802: 48 *EVEN*
804: 48 *EVEN*
806: 41 *EVEN*
808: 45 *EVEN*
810: 55 *EVEN*
812: 37 *EVEN*
814: 51 *EVEN*
816: 57 *EVEN*
818: 52 *EVEN*
820: 48 *EVEN*
822: 40 *EVEN*
824: 56 *EVEN*
826: 57 *EVEN*
828: 46 *EVEN*
830: 48 *EVEN*
832: 38 *EVEN*
834: 59 *EVEN*
836: 43 *EVEN*
838: 61 *EVEN*
840: 61 *EVEN*
842: 59 *EVEN*
844: 55 *EVEN*
846: 58 *EVEN*
848: 58 *EVEN*
850: 60 *EVEN*
852: 50 *EVEN*
854: 51 *EVEN*
856: 54 *EVEN*
858: 60 *EVEN*
860: 54 *EVEN*
862: 53 *EVEN*
864: 60 *EVEN*
866: 51 *EVEN*
868: 58 *EVEN*
870: 56 *EVEN*
872: 58 *EVEN*
874: 49 *EVEN*
876: 64 *EVEN*
878: 64 *EVEN*
880: 61 *EVEN*
882: 62 *EVEN*
884: 63 *EVEN*
886: 52 *EVEN*
888: 67 *EVEN*
890: 61 *EVEN*
892: 55 *EVEN*
894: 56 *EVEN*
896: 70 *EVEN*
898: 57 *EVEN*
nhit = 50, nodd=0, neven=50, sum = 2728, min = 37, max = 70, ave=54.56
50 numbers hit
900: 55 *EVEN*
902: 49 *EVEN*
904: 47 *EVEN*
906: 76 *EVEN*
908: 43 *EVEN*
910: 65 *EVEN*
912: 57 *EVEN*
914: 52 *EVEN*
916: 74 *EVEN*
918: 79 *EVEN*
920: 60 *EVEN*
922: 69 *EVEN*
924: 57 *EVEN*
926: 68 *EVEN*
928: 57 *EVEN*
930: 80 *EVEN*
932: 54 *EVEN*
934: 84 *EVEN*
936: 67 *EVEN*
938: 64 *EVEN*
940: 59 *EVEN*
942: 73 *EVEN*
944: 64 *EVEN*
946: 72 *EVEN*
948: 68 *EVEN*
950: 64 *EVEN*
952: 58 *EVEN*
954: 65 *EVEN*
956: 62 *EVEN*
958: 62 *EVEN*
960: 66 *EVEN*
962: 59 *EVEN*
964: 60 *EVEN*
966: 64 *EVEN*
968: 71 *EVEN*
970: 59 *EVEN*
972: 68 *EVEN*
974: 73 *EVEN*
976: 56 *EVEN*
978: 64 *EVEN*
980: 56 *EVEN*
982: 57 *EVEN*
984: 62 *EVEN*
986: 61 *EVEN*
988: 80 *EVEN*
990: 62 *EVEN*
992: 51 *EVEN*
994: 63 *EVEN*
996: 74 *EVEN*
998: 64 *EVEN*
1000: 57 *EVEN*
nhit = 51, nodd=0, neven=51, sum = 3231, min = 43, max = 84, ave=63.35
+29411764706
As you can see above, for some strange reason, only odd numbers were hit out of the first two hundred (of 0..1001) while only even numbers were hit in the last two hundred; the middle ranges featured a mixture of odd and even. I have no explanation for that, maybe it warrants yet another theorem of ambrus. :-)

Apart from keeping you interested, keeping statistics as you run a very long-running program is a good way to protect against bugs, to verify that the rate of finding "almost" solutions matches theoretical expectations. You certainly don't want a search program, like Deep Thought, that thinks silently for millions of years before telling you The Answer. Imagine the pain of leaving a program running for years, only to find it contained a silly bug which prevented a solution being found.

What was the point?

We all await The Final Chapter, when, we trust, All Mysteries Will Be Revealed.

Hopefully, it is clear by now that solving this problem did indeed require searching a vast space for a single elusive solution.

To clarify exactly what was achieved, run this Python 2.x program (update: with a 32-bit hash function, see responses below):

```magic = chr(17)+chr(11)+chr(119)+chr(60)+chr(47)+chr(44)+chr(78)+chr(1
+03)+chr(125)+chr(48)
for r in ["M", "D", "C", "L", "X", "V", "I"]:
n = hash(r + magic) % 1001
print r, n
and you should see:
```M 1000
D 500
C 100
L 50
X 10
V 5
I 1
To me, that's a rare and beautiful thing. A one in 10**21 event.

Apart from that, for me personally, solving this problem was a journey, forcing me out of my comfort zone, learning many interesting and new things. I also enjoy writing, telling a story. I hope you enjoyed reading this story as much as I enjoyed writing it.

References

Updated: Minor improvements to wording.

Replies are listed 'Best First'.
Re: The 10**21 Problem (Part 4)
by BrowserUk (Pope) on May 19, 2014 at 13:18 UTC
Apart from that, for me personally, solving this problem was a journey, forcing me out of my comfort zone, learning many interesting and new things. I also enjoy writing, telling a story. I hope you enjoyed reading this story as much as I enjoyed writing it.

I'll never be really good at golf!

I have no problem with attacking a problem from all angles; running programs for days or even weeks to find solutions; and optimisation is one my most enjoyed pastimes and something of a passion. But there has to be -- at least notionally -- a practical use for the code or the results it produces.

But that's my hang-up, and is no bad reflection upon your pastime. Millions of people spend their time knocking little balls into a field and then looking for them; and often as not have to pay exorbitant annual and per-game fees for the privileged. Others write long lists numbers in books. Yet more sit around all night in the freezing cold and/or rain, on the off chance that the clouds will clear long enough to peer at fuzzy blobs of light in the night sky. Each to their own "waste of time" :)

For me, two things come out of this latest of your series:

• All the optimisation techniques, especially those from oiskuu, are extremely informative and will be useful to me in the future.
• Your statement "know your data" is one of the most oft overlooked -- and outright ignored -- missives in our industry.

All too often people tackle tasks involving large datasets with the mindset that the must cater for the full generic range of possibilities for that data; when often large subsets of that range either cannot, or just usually do not occur.

And in the latter case; in the rare event that the uncatered for data does occur, it can be shown that the results would be anomalous anyway.

I enjoyed following along; albeit that I came late to the party. Thank you.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: The 10**21 Problem (Part 4)
by RMGir (Prior) on May 19, 2014 at 16:33 UTC
I'm afraid you didn't pick a very stable "magic function" to optimize for.

It doesn't work with the first Python 2.x I tested it on (which turns out to be 2.7.2):

```\$ python
Python 2.7.2 (default, Oct 27 2011, 17:53:49)
[GCC 4.3.2] on linux2
>>> /bb/build/Linux-x86_64-32/release/robolibs/big2014.21-611907-20140
+519105223/lib/dpkgroot
KeyboardInterrupt
>>> magic = chr(17)+chr(11)+chr(119)+chr(60)+chr(47)+chr(44)+chr(78)+c
+hr(103)+chr(125)+chr(48)
>>> for r in ["M", "D", "C", "L", "X", "V", "I"]:
...   n = hash(r + magic) % 1001
...   print r, n
...
M 338
D 602
C 93
L 733
X 555
V 933
I 402
It fails the same way with Python 2.6.7 on my Mac, but succeeds with Python 2.5.6:
```\$ python2.5
Python 2.5.6 (r256:88840, Oct 11 2012, 20:14:10)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on
+darwin
>>> magic = chr(17)+chr(11)+chr(119)+chr(60)+chr(47)+chr(44)+chr(78)+c
+hr(103)+chr(125)+chr(48)
>>> for r in ["M", "D", "C", "L", "X", "V", "I"]:
...   n = hash(r + magic) % 1001
...   print r, n
...
M 1000
D 500
C 100
L 50
X 10
V 5
I 1

Mike

Luckily, it was accepted by both shinh's golf server, when I submitted it about a month ago, and the code golf server also (currently down). Shinh's server runs Python 2.7.2 on Linux, the code golf server Python 2.5 on Linux.

It would be interesting to see the actual hash value and Python version by running this version:

```import sys
print sys.version_info
magic = chr(17)+chr(11)+chr(119)+chr(60)+chr(47)+chr(44)+chr(78)+chr(1
+03)+chr(125)+chr(48)
for r in ["M", "D", "C", "L", "X", "V", "I"]:
h = hash(r + magic)
n = h % 1001
print r, n, "(" + str(h) + ")"
which produced for me on Windows just now:
```sys.version_info(major=2, minor=7, micro=5, releaselevel='final', seri
+al=0)
M 1000 (2094745652)
D 500 (1641493353)
C 100 (277891714)
L 50 (1566900385)
X 10 (1666291637)
V 5 (-1570643069)
I 1 (1060402344)
Strange, that this is the same Python version 2.7.5 as reported by mr_mischief but on a different platform (Windows vs MacOS). I tested both 32-bit and 64-bit Python on Windows (multiple versions) and didn't see a failure.

On shinh's golf server it produced:

```sys.version_info(major=2, minor=7, micro=2, releaselevel='final', seri
+al=0)\n
M 1000 (2094745652)
D 500 (1641493353)
C 100 (277891714)
L 50 (1566900385)
X 10 (1666291637)
V 5 (-1570643069)
I 1 (1060402344)
This is doubly strange because that is the same Python version 2.7.2 running on the same platform, Linux, which failed for RMGir!

I will check the Python C sources version history later when I get more time.

Very strange.

I even checked to make sure bitness had no impact by running both the 32 and 64-bit versions of the python on my mac, but both give the same result.

I was doing that wrong. Running a 32-bit version of python (by following these directions) on my mac, I get the answer you expect.

So I suspect it's just that the python hash function works for your approach in 32-bit builds, but not in 64.

Mike

I get the same failure under 2.7.5 on Mac and 2.6.6 on CentOS as well.

Re: The 10**21 Problem (Part 4)
by RMGir (Prior) on May 19, 2014 at 14:01 UTC
Fascinating series, thanks for sharing.

It'll be fun to see if these posts trigger some "competitive speed golfing" among the monks trying to find faster approaches :)

Mike
Re: The 10**21 Problem (Part 4)
by sundialsvc4 (Abbot) on May 19, 2014 at 13:05 UTC

In my own defense ... :-) ... I guessed soon enough, as did we all, that this was really a discussion about a problem that did require nothing less than a massive-search to solve it.   And, that the only possible way to foreshorten such a search must be to find an early-abandonment strategy or strategies, and to prove that they work without taking 22 years to do so.   I did not mean to “diss” your work at any point, and I think that this was clearly understood by most Monks and by you.

The point of my side-thread was, and is, that the search for algorithms to reduce a problem effectively can be tricky unto itself.   I have seen many “Roman Numerals decoder rings” that were very-complicated examples of recursion, simply because the designer in question didn’t hit-upon “read it backwards.”   (Not one of my teachers ever presented it to me that way.)   Exhaustive-searches have been done to hunt down problems ... or, the searches were much larger than they need be ... because (generally in the days before Google It™) the designer missed a single key stroke of insight.   Meant to be a parallel observation, not a rebuttal of any sort.   And, I think, mostly understood to be so.

Although the search-space in this problem is extremely large such that bitmaps coupled with a vast amount of memory are required to solve it, we know that Moore’s Law will continue to hold true ... and this is a very nice demonstration of just how powerful the Perl language really is.

Early-abandonment strategies aren’t new:   the “minimax” optimization used in game-playing programs is the simplest case.   (Once you know that a solution is sufficiently bad that it will cause any of the preceding nested look-aheads to be rejected, you can clip-away the entire subtree at the base because you know it can only get worse and do not care exactly how worse it gets.)   The extraordinary challenge presented by this problem is its size, and the difficulty of proving that a particular abandonment strategy holds.

Very entertaining reading, well-worth all of its parts.   Well done, and thanks very much for sharing.

Re: The 10**21 Problem (Part 4)
by wjw (Curate) on May 19, 2014 at 15:48 UTC

As I mentioned earlier: I am barely aware of the meaning of this. But you can darn well bet I will be re-reading this to better understand it. Thanks much for the excellent reading!

...the majority is always wrong, and always the last to know about it...
Insanity: Doing the same thing over and over again and expecting different results...
Re: The 10**21 Problem (Part 4)
by spx2 (Deacon) on May 21, 2014 at 18:59 UTC
I admire your persistence a lot. I wish I had a lot more of that in everything I'm doing. Definitely an upvote. I am going to go through the other parts as well.

Create A New User
Node Status?
node history
Node Type: perlmeditation [id://1086650]
Approved by marto
Front-paged by Corion
help
Chatterbox?
NodeReaper buffs his metatarsals

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2017-08-18 08:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (297 votes). Check out past polls.

Notices?