When I looked for magic formulae, etc., I would give up after 20 minutes!
 Jasper
Unlike Jasper, I have spent more than 20 minutes searching for magic formulae. Especially ancient Roman ones. I first became aware of them back in 2006 during the Fonality Christmas Golf Challenge where I was astonished by Ton's ingenuity and deviousness in constructing his original HART (Hospelian Arabic to Roman Transform) magic formula.
Shortly after the Fonality game, codegolf.com hosted an endless competition where you must convert the other way, from Roman to Decimal. By 2009, I had managed to take the lead in all four languages (Perl, Python, Ruby, PHP), as detailed in this series of nodes. And the winning solutions  in all four languages  all used (quite different) magic formulae!
To refresh your memory, the codegolf game rules were essentially:
Convert a Roman numeral to its integer value. The input numeral, provided on stdin followed by a single newline, is guaranteed to be in the range I to MMMCMXCIX (1 to 3999). For example, given IV on stdin, your program should print 4 to stdout.
As you might expect, a key component of solutions to this game is mapping individual Roman letters to their decimal equivalents. To help us focus on that, let's define a simpler spec:
Write a function to convert a single Roman Numeral letter to its decimal equivalent. The function assumes the Roman letter is in $_ and returns its decimal equivalent.
To clarify, here's a sample (nongolfed) Perl solution:
Running this program produces:sub r { my %h = ( I=>1, V=>5, X=>10, L=>50, C=>100, D=>500, M=>1000 ); return $h{$_}; } print "$_: ", 0+r(), "\n" for (qw(I V X L C D M));
I: 1 V: 5 X: 10 L: 50 C: 100 D: 500 M: 1000
Lookup Table vs Magic Formula
In code golf, there is often a battle between a lookup table and a magic formula. So it proved here. When converting from Roman to Decimal, magic formula trumps lookup table.
Here are three attempts to solve this problem using a lookup table:
sub r {%h=(I,1,V,5,X,10,L,50,C,100,D,500,M,1000);$h{$_}} sub r {M1000D500C100L50X10V5I1=~$_;$'} sub r {M999D499C99L49X9V4I=~$_+$'}
And here are some of my earliest magic formulae from 2007:
Back then, my magic formula searching skills were so poor that I erroneously concluded that the lookup table approach was better. :)sub r {5**y/VLD//.E.(3*/M/+2*/CD/+/XL/)} sub r {1 .E.~ord()*41%52%5>>y/VLD//} sub r {5**y/VLD//.E.ord()x3%75%50%4} sub r {1 .E.(72^ord)*5/7%5>>y/VLD//} sub r {5**y/VLD//.E.(42^88*ord)%5} sub r {1 .E.(3^77%ord)%7>>y/VLD//}
Later, when I tackled this problem in Python, I really needed to use each Roman letter once only in the formula, which forced me to explore alternative approaches ... which, in turn, led to still shorter Perl magic formulae, such as:
Back in the good old days, the little search programs that uncovered these solutions took hours to run, not years. :)sub r {10**(7&69303333/ord)%9995} sub r {10**(7&5045e8/ord)%2857} # needs 64bit Perl sub r {IXCMVLD=~$_;"1E@"%9995} sub r {XCMVLD=~$_;"1E@+"%9995} # Grimy improvement
The long numbers (such as 69303333 above) that began popping up in these formulae were an indication that the ord function didn't scale very well as the required solutions became less probable. Can we find a builtin function better suited to Roman magic formulae? In PHP and Python, yes. In Perl and Ruby, probably not. At least, I don't know of one.
The PHP builtin md5 function is perfect for Roman magic formulae. Not only does it generate all required digits (09), it generates little else (just af). Moreover, you can use all 256 characters in a magic formula, compared to just ten (09) for ord. To illustrate, in an eight character magic string (for example 69303333), there are just 10**8 combinations available for ord, while there are a mindboggling 256**8=1.8*10**19 combinations available for md5! This is a huge advantage when searching for highly improbable solutions, as we shall see later.
The Python hash function is also superior to ord, though not as good as PHP's md5. This is because Python's Unicode and character escaping claptrap limits you to 125 characters (namely ord 1..9,11,12,14..127) that can be used as input to the hash function without special treatment. Still, 125 is a huge improvement over 10! One drawback of hash compared to md5 is that it generates huge numbers, forcing you to waste five strokes with %1001 to trim the generated numbers into the required Roman Numeral range (11000).
In some cases, the Perl crypt builtin could be employed, though it is not wellsuited to this specific problem because (unlike md5) it generates many other characters in addition to the desired 09.
To recap, the shortest magic formulae found so far in all four languages (as detailed in this series of nodes) are:
10**(205558%ord(r)%7)%9995 # Python hash(r+"magicstrng")%1001 # Python (finding magicstrng is the subjec +t of this node!) VLD=~$_*5+IXCM=~$_."E@" # Perl 10**(7&5045e8/ord)%2857 # Perl (64bit) IXCMVLD=~$_;"1E@"%9995 # Perl XCMVLD=~$_;"1E@+"%9995 # Perl (update: Grimy improvement) uppp&md5_hex$_.PQcUv # Perl (needs Digest::MD5 module) 10**(494254%r/9)%4999 # Ruby (no need for explicit ord in this g +ame) md5($r.magicstrng) # PHP (finding magicstrng is an unsolved p +roblem) md5($r.PQcUv)&uppp # PHP wins due to superb mf properties of +md5
The 10**21 Problem
There's just no sane regularity in this thing. But in "random" mappings with a very small result set like this, the shortest solution is often to make up some magic formula that has no particular meaning, but just happens to give the wanted result.
 Ton Hospel explaining his original decimaltoroman magic formula
When Ton Hospel invented magic formulae in golf way back in 2004, he correctly noted that they work best "with a very small result set". Indeed, if there were only five Roman Numeral letters, rather than seven, we would have a straightforward 10**15 problem, rather than a (borderline intractable) 10**21 problem. Why 10**21?
Suppose you have a magic formula that produces a random number between 1 and 1000. The probability of scoring a (lucky) hit for one numeral is therefore 1 in 1000. Since probabilities multiply, the probability of hitting five out of five numerals is 1 in 10**15, while the chance of hitting all seven Roman Numerals is a daunting 1 in 10**21. Though 1 in 10**21 sounds improbable in the extreme, if you could generate 10**21 combinations, you would not need luck, indeed you would expect to find such an improbable solution. Yet how long would it take to search 10**21 combinations for the elusive (lucky) one?
Well, if you could perform 10,000 search operations per second, the time required to search 10**21 combinations is a mere 10**21/10000 seconds = 10**17 seconds = 3,170,979,198 years! By the way, this "brute force search infeasibility problem" is why you are asked to create longer and longer passwords, and with a wider range of characters in them, as computer speeds improve. Indeed, Password cracking and magic formula searching are closely related disciplines, the lack of a "codegolf magic formula" wikipedia page notwithstanding. :)
To make this theoretical argument more concrete, consider searching for a Roman to Decimal magic formula using the Python builtin hash function. Since this hash function produces very large values, we need to apply %1001 to it so as to produce an essentially random value in the desired 1..1000 range. We might try searching for such a magic formula using a simple Python bruteforce search program, such as:
On my machine, this naive brute force search for a tencharacter magic string is expected to find a solution in about 52,887,477 years! Note that, with 127 different values for each character in the string, we need a 10character magic string to give us the required 127**10 = 1.1e21 = 10**21 combinations.import time print time.time(), time.clock() for q0 in range(1, 128): for q1 in range(1, 128): for q2 in range(1, 128): for q3 in range(1, 128): for q4 in range(1, 128): for q5 in range(1, 128): for q6 in range(1, 128): for q7 in range(1, 128): for q8 in range(1, 128): for q9 in range(1, 128): magic = chr(q0)+chr(q1)+chr(q2)+chr(q3)+chr(q4)+chr(q5)+chr( +q6)+chr(q7)+chr(q8)+chr(q9) m = hash("M" + magic) % 1001 if m != 1000: continue d = hash("D" + magic) % 1001 if d != 500: continue c = hash("C" + magic) % 1001 if c != 100: continue l = hash("L" + magic) % 1001 if l != 50: continue x = hash("X" + magic) % 1001 if x != 10: continue v = hash("V" + magic) % 1001 if v != 5: continue i = hash("I" + magic) % 1001 if i != 1: continue print "bingo!", q0, q1, q2, q3, q4, q5, q6, q7, q8, q9 print time.time(), time.clock()
Rather than give up at this point, I found this "impossible" 50 million year challenge strangely irresistible ... and was eager to learn about any high performance computing techniques required to solve it.
I get it  optimization is a fun game ... one can play all day with unrolling loops, peeling away layers of indirection, and so forth to gain cycles, while piddling away time and energy
 davido
Or, as davido puts it, "optimization is a fun game". Well, I enjoy it. So I started by rewriting the above search program in C++, then kept on refining it until I was able to reduce the running time from fifty million years down to only several years and eventually find a solution. This new series of articles describes that endeavour.
Please note that this series focuses more on High Performance Computing/Supercomputing/Parallel computing in C/C++/Intel assembler than code golf.
First Attempt
Rewriting our crude Python searcher in C/C++ we get:
which ran about 190 times faster than the Python version. Down to a running time of 275,530 years now. Woohoo!#include <stdio.h> #include <time.h> // Python hash() function. Assumes 32bit int. // Derived from string_hash() in stringobject.c in Python 2.5.1 source +s. static int py_hash(const char* a, int size) { int len = size; const unsigned char* p = (unsigned char*)a; int x = *p << 7; while (len >= 0) x = (1000003 * x) ^ *p++; x ^= size; if (x == 1) x = 2; return x; } // Simulate python modulo operator. Assumes mod is positive. static int py_mod(int j, int mod) { int ret; if (j >= 0) { ret = j % mod; } else { ret = j  (j / mod) * mod; if (ret) ret += mod; } return ret; } int main() { int q0, q1, q2, q3, q4, q5, q6, q7, q8, q9; int m, d, c, l, x, v, i; char magic[16]; time_t tstart = time(NULL); clock_t cstart = clock(); time_t tend; clock_t cend; if (sizeof(int) != 4) { fprintf(stderr, "oops sizeof int != 4\n"); +return 1; } for (q0 = 1; q0 < 128; ++q0) { magic[1] = q0; for (q1 = 1; q1 < 128; ++q1) { magic[2] = q1; for (q2 = 1; q2 < 128; ++q2) { magic[3] = q2; for (q3 = 1; q3 < 128; ++q3) { magic[4] = q3; for (q4 = 1; q4 < 128; ++q4) { magic[5] = q4; for (q5 = 1; q5 < 128; ++q5) { magic[6] = q5; for (q6 = 1; q6 < 128; ++q6) { magic[7] = q6; for (q7 = 1; q7 < 128; ++q7) { magic[8] = q7; for (q8 = 1; q8 < 128; ++q8) { magic[9] = q8; for (q9 = 1; q9 < 128; ++q9) { magic[10] = q9; magic[0] = 'M'; m = py_mod(py_hash(magic, 11), 1001); if (m != 1000) continue; magic[0] = 'D'; d = py_mod(py_hash(magic, 11), 1001); if (d != 500) continue; magic[0] = 'C'; c = py_mod(py_hash(magic, 11), 1001); if (c != 100) continue; magic[0] = 'L'; l = py_mod(py_hash(magic, 11), 1001); if (l != 50) continue; magic[0] = 'X'; x = py_mod(py_hash(magic, 11), 1001); if (x != 10) continue; magic[0] = 'V'; v = py_mod(py_hash(magic, 11), 1001); if (v != 5) continue; magic[0] = 'I'; i = py_mod(py_hash(magic, 11), 1001); if (i != 1) continue; printf("bingo! %d %d %d %d %d %d %d %d %d %d\n", q0, q1, q2, q3, q4, q5, q6, q7, q8, q9); } } } } } } } } } } tend = time(NULL); cend = clock(); printf("(wall clock time:%ld secs, cpu time:%.2f units)\n", (long) (difftime(tend, tstart)+0.5), (double) (cendcstart) / (double)CLOCKS_PER_SEC); return 0; }
Unroll the hash and reorg the code
An obvious optimization is to "unroll" the py_hash() function within the loop; that is, incrementally build the hash value within the loop and remove the py_hash() function.
Note, by the way, that this algorithm is very wellsuited to multithreading; each thread can be given a different starting and ending range to search and just go at it; there is no need for any thread synchronization at all. So I took the opportunity at this stage to reorganize the code, to factor out the "inner loop" into its own file with a single function (with a C call interface) that just contains raw calculation, no global or static variables, no library calls, no I/O. Just code. This reorganization has a number of benefits:
 Easy to make the inner loop code threadsafe.
 Easy to call from multiple threads (having the "inner loop" function do no I/O simplifies this).
 Gives you the option to rewrite the "inner loop" file (in assembler, say) so long as you honor its interface.
 Easier to port. The code that launches threads will likely be different on each platform (e.g. POSIX vs Win32 threads), the I/O code may be different on each platform, while the inner loop code can be the same on all platforms (or tailored to suit).
 From now on, I can focus on the crucial inner loop code, and so have less boilerplate C++ code cluttering this meditation. :)
After the code reorganization and hash function unrolling, here is the new main program, find1.cpp:
#include <stdio.h> #include <string.h> #include <stdlib.h> #include <time.h> #define H_PRIME 1000003 // Most unlikely to get this many hits in a single call of the inner l +oop function. #define MAX_HITS 64 // my_m128_t mimics Intel __m128i type (which will be used later) typedef struct my_m128_t my_m128_t; struct my_m128_t { short m128i_i16[8]; }; // the inner loop function (lives in a separate file) extern "C" int inner( int startval, int endval, int m2, int d2, int c2, int l2, int x2, int v2, int i2, my_m128_t* ps ); void do_one(int q0, int sv1, int ev1, int sv2, int ev2, int sv3, int e +v3) { time_t tstart = time(NULL); clock_t cstart = clock(); time_t tend; clock_t cend; my_m128_t soln[MAX_HITS]; int m1, m2; int m_char = 'M'; int m0 = ( ( H_PRIME * (m_char << 7) ) ^ m_char ) * H_PRIME; int d1, d2; int d_char = 'D'; int d0 = ( ( H_PRIME * (d_char << 7) ) ^ d_char ) * H_PRIME; int c1, c2; int c_char = 'C'; int c0 = ( ( H_PRIME * (c_char << 7) ) ^ c_char ) * H_PRIME; int l1, l2; int l_char = 'L'; int l0 = ( ( H_PRIME * (l_char << 7) ) ^ l_char ) * H_PRIME; int x1, x2; int x_char = 'X'; int x0 = ( ( H_PRIME * (x_char << 7) ) ^ x_char ) * H_PRIME; int v1, v2; int v_char = 'V'; int v0 = ( ( H_PRIME * (v_char << 7) ) ^ v_char ) * H_PRIME; int i1, i2; int i_char = 'I'; int i0 = ( ( H_PRIME * (i_char << 7) ) ^ i_char ) * H_PRIME; int q1, q2; int mm0 = (m0 ^ q0) * H_PRIME; int dd0 = (d0 ^ q0) * H_PRIME; int cc0 = (c0 ^ q0) * H_PRIME; int ll0 = (l0 ^ q0) * H_PRIME; int xx0 = (x0 ^ q0) * H_PRIME; int vv0 = (v0 ^ q0) * H_PRIME; int ii0 = (i0 ^ q0) * H_PRIME; fprintf(stderr, "%d: sv1=%d ev1=%d sv2=%d ev2=%d sv3=%d ev3=%d:\n", q0, sv1, ev1, sv2, ev2, sv3, ev3); // We ignore 0, 10, 13 because these three require escaping in Pytho +n. for (q1 = sv1; q1 < ev1; ++q1) { if (q1 == 10  q1 == 13) continue; m1 = (mm0 ^ q1) * H_PRIME; d1 = (dd0 ^ q1) * H_PRIME; c1 = (cc0 ^ q1) * H_PRIME; l1 = (ll0 ^ q1) * H_PRIME; x1 = (xx0 ^ q1) * H_PRIME; v1 = (vv0 ^ q1) * H_PRIME; i1 = (ii0 ^ q1) * H_PRIME; for (q2 = sv2; q2 < ev2; ++q2) { if (q2 == 10  q2 == 13) continue; m2 = (m1 ^ q2) * H_PRIME; d2 = (d1 ^ q2) * H_PRIME; c2 = (c1 ^ q2) * H_PRIME; l2 = (l1 ^ q2) * H_PRIME; x2 = (x1 ^ q2) * H_PRIME; v2 = (v1 ^ q2) * H_PRIME; i2 = (i1 ^ q2) * H_PRIME; int isoln = inner(sv3, ev3, m2, d2, c2, l2, x2, v2, i2, soln); if (isoln > 0) { int k; for (k = 0; k < isoln; ++k) { fprintf(stderr, "%d %d %d %hd %hd %hd %hd %hd %hd %hd\n", q0, q1, q2, soln[k].m128i_i16[0], soln[k].m128i_i16[1], soln[k].m128i_i16[2], soln[k].m128i_i1 +6[3], soln[k].m128i_i16[4], soln[k].m128i_i16[5], soln[k].m128i_i1 +6[6]); } } } } tend = time(NULL); cend = clock(); fprintf(stderr, "(wall clock time:%ld secs, cpu time:%.2f units)\n" +, (long) (difftime(tend, tstart)+0.5), (double) (cendcstart) / (double)CLOCKS_PER_SEC); } int main(int argc, char* argv[]) { int sv0, sv1, sv2, sv3, ev1, ev2, ev3; if (argc != 8) { fprintf(stderr, "usage: prog sv0 sv1 ev1 sv2 ev2 s +v3 ev3\n"); return 1; } sv0 = atoi(argv[1]); sv1 = atoi(argv[2]); ev1 = atoi(argv[3]); sv2 = atoi(argv[4]); ev2 = atoi(argv[5]); sv3 = atoi(argv[6]); ev3 = atoi(argv[7]); do_one(sv0, sv1, ev1, sv2, ev2, sv3, ev3); return 0; }
And here is the "inner loop" function, in its own file, inner1.c:
#define H_PRIME 1000003 #define HASH_LEN 11 // my_m128_t mimics Intel __m128i type typedef struct my_m128_t my_m128_t; struct my_m128_t { short m128i_i16[8]; }; // Simulate python modulo operator. Assumes mod is positive. static int py_mod(int j, int mod) { int ret; if (j >= 0) { ret = j % mod; } else { ret = j  (j / mod) * mod; if (ret) ret += mod; } return ret; } // Apart from: // 1 % 1001 == 1000 // 2 % 1002 == 1000 // both 1 and 2 can never match any of 1000, 500, ..., 1 // Thus don't worry about the Python hash conversion from 1 to 2, ex +cept for M int inner( int startval, int endval, int m2, int d2, int c2, int l2, int x2, int v2, int i2, my_m128_t* ps) { int m, d, c, l, x, v, i; int m3, m4, m5, m6, m7, m8, m9; int d3, d4, d5, d6, d7, d8, d9; int c3, c4, c5, c6, c7, c8, c9; int l3, l4, l5, l6, l7, l8, l9; int x3, x4, x5, x6, x7, x8, x9; int v3, v4, v5, v6, v7, v8, v9; int i3, i4, i5, i6, i7, i8, i9; int q3, q4, q5, q6, q7, q8, q9; int iret = 0; for (q3 = startval; q3 < endval; ++q3) { if (q3 == 10  q3 == 13) continue; m3 = (m2 ^ q3) * H_PRIME; d3 = (d2 ^ q3) * H_PRIME; c3 = (c2 ^ q3) * H_PRIME; l3 = (l2 ^ q3) * H_PRIME; x3 = (x2 ^ q3) * H_PRIME; v3 = (v2 ^ q3) * H_PRIME; i3 = (i2 ^ q3) * H_PRIME; for (q4 = 1; q4 < 128; ++q4) { if (q4 == 10  q4 == 13) continue; m4 = (m3 ^ q4) * H_PRIME; d4 = (d3 ^ q4) * H_PRIME; c4 = (c3 ^ q4) * H_PRIME; l4 = (l3 ^ q4) * H_PRIME; x4 = (x3 ^ q4) * H_PRIME; v4 = (v3 ^ q4) * H_PRIME; i4 = (i3 ^ q4) * H_PRIME; for (q5 = 1; q5 < 128; ++q5) { if (q5 == 10  q5 == 13) continue; m5 = (m4 ^ q5) * H_PRIME; d5 = (d4 ^ q5) * H_PRIME; c5 = (c4 ^ q5) * H_PRIME; l5 = (l4 ^ q5) * H_PRIME; x5 = (x4 ^ q5) * H_PRIME; v5 = (v4 ^ q5) * H_PRIME; i5 = (i4 ^ q5) * H_PRIME; for (q6 = 1; q6 < 128; ++q6) { if (q6 == 10  q6 == 13) continue; m6 = (m5 ^ q6) * H_PRIME; d6 = (d5 ^ q6) * H_PRIME; c6 = (c5 ^ q6) * H_PRIME; l6 = (l5 ^ q6) * H_PRIME; x6 = (x5 ^ q6) * H_PRIME; v6 = (v5 ^ q6) * H_PRIME; i6 = (i5 ^ q6) * H_PRIME; for (q7 = 1; q7 < 128; ++q7) { if (q7 == 10  q7 == 13) continue; m7 = (m6 ^ q7) * H_PRIME; d7 = (d6 ^ q7) * H_PRIME; c7 = (c6 ^ q7) * H_PRIME; l7 = (l6 ^ q7) * H_PRIME; x7 = (x6 ^ q7) * H_PRIME; v7 = (v6 ^ q7) * H_PRIME; i7 = (i6 ^ q7) * H_PRIME; for (q8 = 1; q8 < 128; ++q8) { if (q8 == 10  q8 == 13) continue; m8 = (m7 ^ q8) * H_PRIME; d8 = (d7 ^ q8) * H_PRIME; c8 = (c7 ^ q8) * H_PRIME; l8 = (l7 ^ q8) * H_PRIME; x8 = (x7 ^ q8) * H_PRIME; v8 = (v7 ^ q8) * H_PRIME; i8 = (i7 ^ q8) * H_PRIME; for (q9 = 1; q9 < 128; ++q9) { if (q9 == 10  q9 == 13) continue; m9 = (m8 ^ q9) ^ HASH_LEN; if (m9 == 1) m9 = 2; m = py_mod(m9, 1001); if (m != 1000) continue; d9 = (d8 ^ q9) ^ HASH_LEN; d = py_mod(d9, 1001); if (d != 500) continue; c9 = (c8 ^ q9) ^ HASH_LEN; c = py_mod(c9, 1001); if (c != 100) continue; l9 = (l8 ^ q9) ^ HASH_LEN; l = py_mod(l9, 1001); if (l != 50) continue; x9 = (x8 ^ q9) ^ HASH_LEN; x = py_mod(x9, 1001); if (x != 10) continue; v9 = (v8 ^ q9) ^ HASH_LEN; v = py_mod(v9, 1001); if (v != 5) continue; i9 = (i8 ^ q9) ^ HASH_LEN; i = py_mod(i9, 1001); if (i != 1) continue; ps[iret].m128i_i16[0] = q3; ps[iret].m128i_i16[1] = q4; ps[iret].m128i_i16[2] = q5; ps[iret].m128i_i16[3] = q6; ps[iret].m128i_i16[4] = q7; ps[iret].m128i_i16[5] = q8; ps[iret].m128i_i16[6] = q9; ++iret; } } } } } } } return iret; }
This is incredible. You must be impossibly stubborn to go through such a tedious work. :)
 wazoox
Yes wazoox, that was tedious. Yet the simple, if tedious, unrolling of the py_hash function sped up the code by a factor of four. Down to an estimated running time of 62,860 years now. Still a lot more work to do before we can break out the champagne.
Tactical vs Strategic Optimizations
Rule 1: Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is.
Rule 2: Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
 Rob Pike
When in doubt, use brute force.
 Ken Thompson
There are many microoptimizations we could apply to speed up this inner loop. For example, we shall see later how to apply the Intel AVX instructions to vectorize this code:
Though AVX does indeed give a minor speed boost  as do many other tweaks yet to be discussed  to achieve an order of magnitude performance boost we need to think strategically, look at the bigger picture. What overall strategies should we apply to speed up this searcher? Specifically, can we eliminate the innermost loop somehow? How? Find out in the next exciting episode.m3 = (m2 ^ q3) * H_PRIME; d3 = (d2 ^ q3) * H_PRIME; c3 = (c2 ^ q3) * H_PRIME; l3 = (l2 ^ q3) * H_PRIME; x3 = (x2 ^ q3) * H_PRIME; v3 = (v2 ^ q3) * H_PRIME; i3 = (i2 ^ q3) * H_PRIME;
References
 The 10**21 Problem (Part 2)
 The 10**21 Problem (Part 3)
 The 10**21 Problem (Part 4)
 Golf: Magic Formula for Roman Numerals
 The golf course looks great, my swing feels good, I like my chances (Part I)
Updated 1 May 2014: Updated find1.cpp main() to take commandline arguments specifying a range of values to search for. Updated 31 May 2014: added Grimy's one stroke Perl improvement plus a Perl version of PHP md5 magic formula.


Replies are listed 'Best First'.  

Re: The 10**21 Problem (Part I)
by ambrus (Abbot) on Apr 24, 2014 at 18:24 UTC  
Re: The 10**21 Problem (Part I)
by davido (Cardinal) on Apr 21, 2014 at 19:54 UTC  
Re: The 10**21 Problem (Part I)
by oiskuu (Hermit) on May 05, 2014 at 20:08 UTC  
Re: The 10**21 Problem (Part I)
by Grimy (Pilgrim) on May 30, 2014 at 08:05 UTC  
by eyepopslikeamosquito (Chancellor) on May 30, 2014 at 23:49 UTC 