Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

The golf course looks great, my swing feels good, I like my chances (Part III)

by eyepopslikeamosquito (Canon)
on May 06, 2009 at 07:13 UTC ( #762180=perlmeditation: print w/ replies, xml ) Need Help??

Another generally applicable golfing tip is to study every single built-in function the language has to offer

-- From Part II of this series

Number of PHP functions in the main namespace: 3079

-- Cited at PHP in contrast to Perl

Unlike Ruby and Python, I knew nothing of PHP when this game began. Nothing. Zilch. Nada. And trawling through over 3000 verbose, and often appallingly inconsistent, built-in functions did not make a good first impression. More on this later, but, as a language, PHP is streets behind Perl, Ruby and Python IMHO.

Allow me to state this more plainly: if I had to program in PHP for a living, I would poke my eyes out with a fork. And yet I very much enjoyed playing golf in PHP! Why? Golf, at least for me, is essentially a competition, a highly artificial and stylized skill, more akin to chess than to the craft of writing useful, robust and maintainable computer programs.

Having already golfed this problem in three languages by now, my confidence was rising. I felt surprisingly calm about being a complete PHP ignoramus because I had formed the opinion that far more important than knowing the language is understanding the problem and its algorithms.

Taking the Early Lead in PHP

Tuning my magic formula searcher for PHP brought instant rewards in the form of this straightforward 87 stroker:

<?while($c=ord(fgetc(STDIN)))$t+=$n-2*$n*(+$n<($n=2627%$c/8%8 .E. 7124 +8%$c%5));echo$t?>
This early submission stole the PHP lead from the (presumably Japanese) golfer Hiro Suzuki by two strokes. I was so thrilled at finding my first PHP golfing trick of shortening the leading <?php to <? that I totally missed the routine golfing trick of shortening the trailing ?> to semicolon (;), Eugene's advice of "Can't possibly work, try it anyway" having been temporarily forgotten.

Also of note in this solution is the same exploitation of virtual machine evaluation order that I had used previously in Perl and Ruby, to eliminate the "previous value" variable. To get this to work, I had to use +$n rather than a bald $n. Curiously, this behaviour varied between PHP versions, the + being required only with later PHP versions.

Notice that PHP, like Ruby, but unlike Perl and Python, requires you to deal somehow with the trailing newline annoyance, making magic formulae harder to find, and therefore likely to be longer. So it proved here with the two magic formulae above being a touch longer than the early ones I had found for the other languages. And I found the spaces around the .E. intensely irritating -- though, like Perl, but unlike Ruby and Python, mercifully, I didn't need to quote the E.

Breaking the 80 Barrier

Though delighted to be leading the PHP pack, I was horrified by the length of those two magic formulae. What to do?

It's as easy as 1, 2, 3

It occurred to me that in this PHP solution I could replace those two ugly magic formulae with three prettier ones by fiddling with the expression controlling the while loop like so:

<?while($c=1230%ord(fgetc(STDIN)))$t+=$n-2*$n*(+$n<($n=(41%$c&5).E.$c% +9%4));echo$t?>
Though this required a complete rewrite of my magic formula searcher, it was worth it because it shaved three strokes. Note that, because the while loop expression now eliminates the trailing newline (newline has an ord of 10, so 1230%ord evaluates to zero and terminates the loop), I no longer need to map the newline in the two magic formulae in the loop body. Note too that the intolerable spaces around the E have been eliminated. Yay! Down to 84 and leading by five now. Despite the overall ugliness of the language, unearthing these tactical tricks was making PHP golfing lots of fun.

Applying some further tricks picked up golfing the problem in the other languages, combined with improved magic formulae, and finally remembering good ol' Eugene to find the trailing ; hack, enabled me to whittle this approach all the way down to 75 strokes:

<?while($c=68040%ord(fgetc(STDIN))/2)$t+=$n-2*$n%+$n=9385%$c.E.$c/8?>< +?=$t;
Variety being the spice of life, notice that I also replaced echo$t; with <?=$t; ... though this did not change the golf score one iota. Leading by 14 now. Where to go from here?

The Magic of a Built-in md5 Function

All my moaning about the inelegance of 3000+ built-in functions ceased the instant I stumbled upon the built-in md5 and sha1 functions. These functions were perfect for magic formulae! Why had I not spotted them sooner?

The md5 function, you see, enjoys a fundamental advantage over ord in that you can use all 256 characters in a magic formula, compared to just ten (0-9) for ord (for example, 68040%ord in the magic formula above). To illustrate, in a six character string, there are just 10**6 combinations available for ord, while there are 256**6=2.8*10**14 combinations available for md5.

There is a catch though. To enjoy all 256 characters you must quote the string, which costs you two strokes. Using Eugene's "can't possibly work, try it anyway" approach, however, revealed that PHP seems to treat characters in the ord range 127-255 as "alphabetic". Anyway, you don't need to quote them, making md5 a certain winner over ord in any magic formula race.

To put all this to the proof, I whittled my 75 stroker by three strokes by replacing ord with md5. Here are some of them:

<?while(+$c=md5(XXX.fgetc(STDIN)))$t+=$n-2*$n%+$n=5%$c[2].E.$c%4?><?=$ +t; <?while(+$c=md5(XXXXX.fgetc(STDIN)))$t+=$n-2*$n%$n="2E$c"*1%1999?><?=$ +t; <?while(8^$c=md5(fgetc(STDIN).XXXX))$t+=$n-2*$n%$n="5E$c"*1%4999?><?=$ +t; <?for(;8^$c=md5(fgetc(STDIN).XXXX);$t+=$n-2*$n%$n="5E$c"*1%4999);echo$ +t;
where XXX above is chr(115).chr(205).chr(69), XXXXX is chr(225).chr(246).chr(180).chr(162).chr(188), and XXXX is chr(174).chr(110).chr(204).chr(142).

The observant reader will have noticed the comical "multiply by one" above, as in "5E$c"*1%4999. Why on Earth would I waste two strokes like that? Well, my testing revealed that PHP interprets "5E2" as 500 (i.e. scientific notation) if followed by a multiplies * operator, but as 5 if followed by a modulo % operator! AFAICT, this surprising behaviour is undocumented, an accident of implementation. In case you're interested, Perl consistently interprets "5E2" as 500, whether followed by multiplies * or modulo %. Ruby always interprets "5E2".to_i() as 5 (integer) and "5E2".to_f() as 500 (floating point). Python, as usual, is the strictest of the gang of four languages, always interpreting float("5E2") as 500 (floating point), yet emitting an "invalid literal for int()" runtime error for int("5E2").

Though I swore at PHP's eccentricity here, loud claps of applause could be heard, as described in the next section, when I desperately needed PHP to interpret the "04e9d..." md5 string as four, and not as 4,000,000,000.

Down to 72 strokes now and leading by 17. To go lower, I needed to eliminate the "E" exponentiation. But how?

Eliminating Exponentiation

Alone among the four languages, PHP lacks a ** exponentiation operator. This had proved to be only a minor nuisance so far, forcing me to rely instead on .E. like constructs. Yet with the PHP md5 function, and again alone among the four languages, I finally spied an opportunity to get rid of the dratted stroke-consuming exponentiation. All I needed to do was find a lucky md5 direct mapping of M -> 1000, D -> 500, C -> 100, L -> 50, X -> 10, V -> 5, I -> 1. Alas, back of the envelope calculations indicated that this mapping was infeasible, even with md5. To find such a mapping, I estimated the search program would need to run for hundreds of years!

What to do? Well, remembering the M999D499C99L49X9V4I string used in some of my early Perl table lookup solutions, the M -> 999, D -> 499, C -> 99, L -> 49, X -> 9, V -> 4, I -> 0 mapping looked much more promising because the C and X mappings are reduced by one character in length, plus any of abcdef will now produce the required I -> 0 mapping. The running time of this new and improved mapping was estimated to be less than five years. Getting closer now. Just need to speed up the searcher a bit.

Lacking a super computer, I resorted to plugging an assembly language md5 routine from OpenSSL into my C magic formula searcher and let 'er rip. After dutifully chugging away for six months, the searcher finally unearthed a lucky hit, illustrated by the following test program:

<? $m = chr(111).chr(178).chr(219).chr(246).chr(172).chr(209); echo "M " . md5($m."M") . "\n"; echo "D " . md5($m."D") . "\n"; echo "C " . md5($m."C") . "\n"; echo "L " . md5($m."L") . "\n"; echo "X " . md5($m."X") . "\n"; echo "V " . md5($m."V") . "\n"; echo "I " . md5($m."I") . "\n"; ?>
Running the above test program produces:
M 999ba98d67a9a95e1c8a693fe4db176a D 453851b70cb208b0a6ddee5ad7c96be6 C 99d5a8afc39949541af0f86e8e5ea467 L 49abfffcd140d32e8210986a0336ba39 X 9ea85b3084eb5939c9f74677aa590322 V 04e9dd3542beb93d04d6d225620a814e I d445ba785ca91c9a00b361817abafac5
Notice that applying %1858 to 453851 produces the required 499 D mapping, while leaving the other mappings untouched, thus producing the desired mapping M -> 999, D -> 499, C -> 99, L -> 49, X -> 9, V -> 4, I -> 0, and a corresponding 70 stroke solution:
<?while(A<$c=fgetc(STDIN))$t+=$n-2*$n%$n=md5(XXXXXX.$c)%1858+1?><?=$t;
where XXXXXX above is chr(111).chr(178).chr(219).chr(246).chr(172).chr(209). Success at last!

Proving That a Shorter Solution Exists

Beware of bugs in the above code; I have only proved it correct, not tried it.

-- Donald Knuth

Though I can't currently demonstrate a shorter PHP solution, I can easily prove that one exists. First, note that there are about 180 unquoted characters that may be combined to form input to the md5 function, namely ord 127-255, a-z, A-Z, _, and 0-9. Now, suppose you could find an exact hit, making the %1858 above unnecessary. That would save five strokes. To have a good chance of finding such a lucky hit, you'd need to unearth around 10000 solutions like the one found in six months above; with that many solutions, there's an excellent chance of one of them happening to have the required 499 D mapping (rather than 453851 as above). The ballpark figure of 10000 is got by estimating the probability of a random md5 signature starting with 499[a-f], which is (1/16)*(1/16)*(1/16)*(6/16)=1/10923. Now, extending the search space from six characters in length to eight increases the number of possible solutions by about 180*180=32400, which should be more than enough solutions to produce one lucky hit. That is, increasing the magic formula string by two strokes saves five. QED.

To have a reasonable chance of finding such a solution with my current search program, however, requires a running time of six months times 10000 = 5000 years! Not even my new low cholesterol diet will allow me to live that long. Because the search is highly parallelizable, however, a cheap super computer may soon be within reach of solving this problem. With the recent development of CUDA and OpenCL, a humble PC containing six or so high-end NVIDIA graphics cards may well be able to solve it today. To quote Ton Hospel: it has to be tried, at least.

I'm exhausted now after writing all this down, and it's long enough already, so I'll sign off for now. In the final installment, I'll give my opinions on how the four languages stacked up against each other in this challenge and provide some statistics on how the languages have compared to each other at the codegolf web site, in terms both of code length and popularity.

References

Updated 7-may: Minor clarifications to probabilities in "Proving That a Shorter Solution Exists" section.

Comment on The golf course looks great, my swing feels good, I like my chances (Part III)
Select or Download Code
Re: The golf course looks great, my swing feels good, I like my chances (Part III)
by Roy Johnson (Monsignor) on May 06, 2009 at 18:00 UTC
    After dutifully chugging away for six months
    This is what I'm taking with me from this post. Wow.

    Caution: Contents may have been coded under pressure.
      Wow, no joke. That's quite the dedication to getting some sweet golf code, kudos for recognizing it in the first place!! ++

      Glad you enjoyed it! I found the whole thing surreal/hilarious while it was running, especially when I thought about how I'd go if this game had a one week time limit like the good ol' traditional Perl golf games.

      It didn't run continuously for six months though. The search program could be run in pieces, as indicated by the C source code below, and running four copies of it simultaneously on a quad core machine helped (as I said, the required search is fundamentally highly parallelizable). Note that this C code handles the md5("M".magic-string) case. Small changes allow it to also be used for the md5(magic-string."M") case. Any suggestions for performance improvements to this C code are very welcome.

      Oh, and after running this searcher and saving its stdout, I ran another small PHP program to confirm C/PHP md5 compatibility and to search for possible hits by applying the mod operator.

      /* findmin6k.c For asm version (copy m5_win32.obj from openssl crypto/md5/asm direc +tory) and build with: cl /W3 /O2 findmin6k.c m5_win32.obj (windows) gcc -Wall -O3 -o findmin6k findmin6k.c mx86-elf.o (32-bit Linux +) gcc -Wall -O3 -o findmin6k findmin6k.c md5-x86_64.o (64-bit Linux +) Will likely need a C version to work with NVIDIA graphics card CUDA. For a pure C version, just write a C version of md5_block_asm_data_o +rder() and link with that. For example: cl /W3 /O2 findmin6k.c m5_ssl_le.c cl /W3 /O2 findmin6k.c m5_nsa_le.c For now, assume 32-bit int and little endian (_le). Example run on Unix: nohup nice ./findmin6k 65 66 48 160 >6k-65-1.tmp 2>err1.tmp & nohup nice ./findmin6k 65 66 160 256 >6k-65-2.tmp 2>err2.tmp & */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <time.h> static void test_endian() { int i; char sStrTmp[64]; int iIntSize = (int)sizeof(int); int iLongSize = (int)sizeof(long); union { unsigned long l; char c[sizeof(long)]; } u; printf("sizeof(int)=%d\nsizeof(long)=%d\nsizeof(size_t)=%d\n", iIntSize, iLongSize, (int)sizeof(size_t)); if (iLongSize > 4) { u.l = (0x08070605L << 32) | 0x04030201L; } else { u.l = 0x04030201L; } memset(sStrTmp, 0, sizeof(sStrTmp)); for (i = 0; i < iLongSize; ++i) { sprintf(sStrTmp+i, "%c", u.c[i]+'0'); } printf("byteorder=%s (1234=little-endian, 4321=big-endian)\n", sStr +Tmp); } /* Uncomment next line to use openssl asm version of md5_block_asm_dat +a_order() */ #define MY_ASM 1 #define MD5_LONG unsigned int typedef struct MD5state_st { MD5_LONG A,B,C,D; } MD5_CTX; #ifdef __cplusplus extern "C" { #endif #ifdef MY_ASM void md5_block_asm_data_order(MD5_CTX*, const void*, size_t); #else void md5_block_asm_data_order(MD5_CTX*, const void*); #endif #ifdef __cplusplus } #endif /* -------------------------------------------------------------- */ #define HI_SENTINEL 1000000000 static int my_md5(const unsigned char* inbuf) { int outi = 0; int nib; MD5_CTX mdContext; #ifdef MY_ASM mdContext.A = 0x67452301; mdContext.B = 0xefcdab89; mdContext.C = 0x98badcfe; mdContext.D = 0x10325476; md5_block_asm_data_order(&mdContext, inbuf, 1); #else md5_block_asm_data_order(&mdContext, inbuf); #endif /* 1st byte */ nib = (unsigned char)((mdContext.A >> 4) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)(mdContext.A & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 2nd byte */ nib = (unsigned char)((mdContext.A >> 12) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.A >> 8) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 3rd byte */ nib = (unsigned char)((mdContext.A >> 20) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.A >> 16) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 4th byte */ nib = (unsigned char)((mdContext.A >> 28) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.A >> 24) & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; /* 5th byte */ nib = (unsigned char)((mdContext.B >> 4) & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; nib = (unsigned char)(mdContext.B & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; /* 6th byte */ nib = (unsigned char)((mdContext.B >> 12) & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.B >> 8) & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; /* 7th byte */ nib = (unsigned char)((mdContext.B >> 20) & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.B >> 16) & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; /* 8th byte */ nib = (unsigned char)((mdContext.B >> 28) & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.B >> 24) & 0xF); if (nib > 9) return outi; if (outi > 9999999) return HI_SENTINEL; outi = outi * 10 + nib; #if 0 /* XXX: very unlikely to matter (only diabolicals like 0000000000000 +00999a) */ /* 9th byte */ nib = (unsigned char)((mdContext.C >> 4) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)(mdContext.C & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 10th byte */ nib = (unsigned char)((mdContext.C >> 12) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.C >> 8) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 11th byte */ nib = (unsigned char)((mdContext.C >> 20) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.C >> 16) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 12th byte */ nib = (unsigned char)((mdContext.C >> 28) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.C >> 24) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 13th byte */ nib = (unsigned char)((mdContext.D >> 4) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)(mdContext.D & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 14th byte */ nib = (unsigned char)((mdContext.D >> 12) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.D >> 8) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 15th byte */ nib = (unsigned char)((mdContext.D >> 20) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.D >> 16) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; /* 16th byte */ nib = (unsigned char)((mdContext.D >> 28) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; nib = (unsigned char)((mdContext.D >> 24) & 0xF); if (nib > 9) return outi; outi = outi * 10 + nib; #endif return outi; } #define M_TARG 999 #define D_TARG 499 #define C_TARG 99 #define L_TARG 49 #define X_TARG 9 #define V_TARG 4 #define I_TARG 0 #define SEARCH_WIDTH 6 #define H_LEN (SEARCH_WIDTH+1) #define MD5_CBLOCK 64 static void do_one(int start_val, int end_val, int start_val_2, int en +d_val_2) { unsigned char m_buf[MD5_CBLOCK]; unsigned char d_buf[MD5_CBLOCK]; unsigned char c_buf[MD5_CBLOCK]; unsigned char l_buf[MD5_CBLOCK]; unsigned char x_buf[MD5_CBLOCK]; unsigned char v_buf[MD5_CBLOCK]; unsigned char i_buf[MD5_CBLOCK]; unsigned char n_buf[MD5_CBLOCK]; int nmiss = 0; int nsent = 0; int q1 = 0; int q2 = 0; int q3 = 0; int q4 = 0; int q5 = 0; int q6 = 0; int m_char = 'M'; int d_char = 'D'; int c_char = 'C'; int l_char = 'L'; int x_char = 'X'; int v_char = 'V'; int i_char = 'I'; int n_char = 10; int m5 = 0; int d5 = 0; int c5 = 0; int l5 = 0; int x5 = 0; int v5 = 0; int i5 = 0; int n5 = 0; time_t tstart = time(NULL); clock_t cstart = clock(); time_t tend; clock_t cend; memset(m_buf, 0, MD5_CBLOCK); memset(d_buf, 0, MD5_CBLOCK); memset(c_buf, 0, MD5_CBLOCK); memset(l_buf, 0, MD5_CBLOCK); memset(x_buf, 0, MD5_CBLOCK); memset(v_buf, 0, MD5_CBLOCK); memset(i_buf, 0, MD5_CBLOCK); memset(n_buf, 0, MD5_CBLOCK); m_buf[H_LEN] = 0x80; d_buf[H_LEN] = 0x80; c_buf[H_LEN] = 0x80; l_buf[H_LEN] = 0x80; x_buf[H_LEN] = 0x80; v_buf[H_LEN] = 0x80; i_buf[H_LEN] = 0x80; n_buf[H_LEN] = 0x80; m_buf[MD5_CBLOCK-8] = H_LEN * 8; d_buf[MD5_CBLOCK-8] = H_LEN * 8; c_buf[MD5_CBLOCK-8] = H_LEN * 8; l_buf[MD5_CBLOCK-8] = H_LEN * 8; x_buf[MD5_CBLOCK-8] = H_LEN * 8; v_buf[MD5_CBLOCK-8] = H_LEN * 8; i_buf[MD5_CBLOCK-8] = H_LEN * 8; n_buf[MD5_CBLOCK-8] = H_LEN * 8; m_buf[0] = m_char; d_buf[0] = d_char; c_buf[0] = c_char; l_buf[0] = l_char; x_buf[0] = x_char; v_buf[0] = v_char; i_buf[0] = i_char; n_buf[0] = n_char; for (q1 = start_val; q1 < end_val; ++q1) { if (q1==91 || q1==92 || q1==93 || q1==94 || q1==96 || q1==123 || q1==124 || q1==125 || q1==126) continue; m_buf[1] = q1; d_buf[1] = q1; c_buf[1] = q1; l_buf[1] = q1; x_buf[1] = q1; v_buf[1] = q1; i_buf[1] = q1; n_buf[1] = q1; for (q2 = start_val_2; q2 < end_val_2; ++q2) { if (q2==91 || q2==92 || q2==93 || q2==94 || q2==96 || q2==123 || q2==124 || q2==125 || q2==126 || q2==58 || q2==59 || q2==60 || q2==61 || q2==62 || q2==63 + || q2==64) continue; m_buf[2] = q2; d_buf[2] = q2; c_buf[2] = q2; l_buf[2] = q2; x_buf[2] = q2; v_buf[2] = q2; i_buf[2] = q2; n_buf[2] = q2; fprintf(stderr, "%d %d\n", q1, q2); for (q3 = 48; q3 < 256; ++q3) { if (q3==91 || q3==92 || q3==93 || q3==94 || q3==96 || q3==123 || q3==124 || q3==125 || q3==126 || q3==58 || q3==59 || q3==60 || q3==61 || q3==62 || q3== +63 || q3==64) continue; m_buf[3] = q3; d_buf[3] = q3; c_buf[3] = q3; l_buf[3] = q3; x_buf[3] = q3; v_buf[3] = q3; i_buf[3] = q3; n_buf[3] = q3; for (q4 = 48; q4 < 256; ++q4) { if (q4==91 || q4==92 || q4==93 || q4==94 || q4==96 || q4==123 || q4==124 || q4==125 || q4==126 || q4==58 || q4==59 || q4==60 || q4==61 || q4==62 || q4 +==63 || q4==64) continue; m_buf[4] = q4; d_buf[4] = q4; c_buf[4] = q4; l_buf[4] = q4; x_buf[4] = q4; v_buf[4] = q4; i_buf[4] = q4; n_buf[4] = q4; for (q5 = 48; q5 < 256; ++q5) { if (q5==91 || q5==92 || q5==93 || q5==94 || q5==96 || q5==123 || q5==124 || q5==125 || q5==126 || q5==58 || q5==59 || q5==60 || q5==61 || q5==62 || +q5==63 || q5==64) continue; m_buf[5] = q5; d_buf[5] = q5; c_buf[5] = q5; l_buf[5] = q5; x_buf[5] = q5; v_buf[5] = q5; i_buf[5] = q5; n_buf[5] = q5; for (q6 = 48; q6 < 256; ++q6) { if (q6==91 || q6==92 || q6==93 || q6==94 || q6==96 || q6==123 || q6==124 || q6==125 || q6==126 || q6==58 || q6==59 || q6==60 || q6==61 || q6==62 | +| q6==63 || q6==64) continue; m_buf[6] = q6; d_buf[6] = q6; c_buf[6] = q6; l_buf[6] = q6; x_buf[6] = q6; v_buf[6] = q6; i_buf[6] = q6; n_buf[6] = q6; nmiss = 0; m5 = my_md5(m_buf); if (m5 == 0) continue; if (m5 != M_TARG) { if (m5 <= M_TARG+M_TARG) continue; ++nmiss; } d5 = my_md5(d_buf); if (d5 == 0) continue; if (d5 != D_TARG) { if (d5 <= M_TARG+D_TARG) continue; ++nmiss; } c5 = my_md5(c_buf); if (c5 == 0) continue; if (c5 != C_TARG) { if (c5 <= M_TARG+C_TARG) continue; ++nmiss; } if (nmiss > 2) continue; l5 = my_md5(l_buf); if (l5 == 0) continue; if (l5 != L_TARG) { if (l5 <= M_TARG+L_TARG) continue; ++nmiss; } if (nmiss > 2) continue; x5 = my_md5(x_buf); if (x5 != X_TARG) { if (x5 <= M_TARG+X_TARG) continue; ++nmiss; } if (nmiss > 2) continue; v5 = my_md5(v_buf); if (v5 != V_TARG) { if (v5 <= M_TARG+V_TARG) continue; ++nmiss; } if (nmiss > 2) continue; i5 = my_md5(i_buf); if (i5 != I_TARG) { if (i5 <= M_TARG+I_TARG) continue; ++nmiss; } if (nmiss > 2) continue; n5 = my_md5(n_buf); nsent = 0; if (m5 == HI_SENTINEL) { m5 += 1000; ++nsent; } if (d5 == HI_SENTINEL) { d5 += 500; ++nsent; } if (c5 == HI_SENTINEL) { c5 += 100; ++nsent; } if (l5 == HI_SENTINEL) { l5 += 50; ++nsent; } if (x5 == HI_SENTINEL) { x5 += 10; ++nsent; } if (v5 == HI_SENTINEL) { v5 += 5; ++nsent; } if (i5 == HI_SENTINEL) { i5 += 1; ++nsent; } if (n5 == HI_SENTINEL) { ++nsent; } printf("N %d nsent=%d: %d %d %d %d %d %d: %d %d %d %d %d + %d %d %d\n", nmiss, nsent, q1, q2, q3, q4, q5, q6, m5, d5, c5, l5, +x5, v5, i5, n5); fflush(stdout); if (m5==d5 || m5==c5 || m5==l5 || m5==x5 || m5==v5 || m5 +==i5) continue; if (d5==c5 || d5==l5 || d5==x5 || d5==v5 || d5==i5) cont +inue; if (c5==l5 || c5==x5 || c5==v5 || c5==i5) continue; if (l5==x5 || l5==v5 || l5==i5) continue; if (x5==v5 || x5==i5) continue; if (v5==i5) continue; printf("N %d: %d %d %d %d %d %d; %d %d %d %d %d %d %d %d +\n", nmiss, q1, q2, q3, q4, q5, q6, m5, d5, c5, l5, x5, v5, + i5, n5); fflush(stdout); } } } } } } tend = time(NULL); cend = clock(); printf("(wall clock time:%ld secs, cpu time:%.2f units)\n", (long) (difftime(tend, tstart)+0.5), (double) (cend-cstart) / (double)CLOCKS_PER_SEC); } int main(int argc, char* argv[]) { int start_val = 65; int end_val = 256; int start_val_2 = 48; int end_val_2 = 256; if (argc > 2) { start_val = atoi(argv[1]); end_val = atoi(argv[2]); if (argc == 5) { start_val_2 = atoi(argv[3]); end_val_2 = atoi(argv[4]); } } printf("start: %d %d, %d %d\n", start_val, end_val, start_val_2, end +_val_2); test_endian(); fflush(stdout); do_one(start_val, end_val, start_val_2, end_val_2); return 0; }

      Ditto! When I looked for magic formulae, etc., I would give up after 20 minutes!
Re: The golf course looks great, my swing feels good, I like my chances (Part III)
by wazoox (Prior) on May 06, 2009 at 21:54 UTC
    This is incredible. You must be impossibly stubborn to go through such a tedious work :) Kudos, master.

      Noone has described me as stubborn before. :-) I think in golf, you can't afford to be too stubborn -- you must change tack when an approach is obviously not working. Actually, I didn't find writing the search program tedious because I love programming and was delighted to learn more about the md5 algorithm, which this game forced me to do. Running it wasn't especially tedious either, the whole process being semi-automated. What I did find strange was that I got into the habit of checking the searcher results every morning in much the same way as problem gamblers check their lotto tickets at our local newsagency. Just like them, most mornings I was pissed off because I didn't win, i.e. no md5 hits were found. And when a lucky hit was finally found, I guess I celebrated just like a gambling addict who just had a win.

      I think people who know me would describe me as competitive, pedantic, a bit OCD, and with a tendency to get addicted to things, such as, er, golf. Oh, and I love a difficult/"impossible" challenge, which is why I enjoyed the md5 challenge more than any other in this game because you know a solution is there yet it is so hard to find it (and I still haven't found it).

      Traditionally, the three virtues of a Perl programmer are: laziness, impatience, and hubris. I'd say hubris definitely applies to golfers, but not the other two. As for the three virtues of a golfer, I suggest:

      • Competitiveness
      • Deviousness
      • Tenacity

Re: The golf course looks great, my swing feels good, I like my chances (Part III)
by eyepopslikeamosquito (Canon) on May 28, 2009 at 11:29 UTC

    I could see how the hallvabo-provoked improved Python algorithm could improve my PHP score ... if only I could find an improved magic formula, one with a distinct newline mapping. Unfortunately, my original 70 stroke solution mapped newline to zero, which clashes with I -> 0. I'm happy to report that I did manage to find a very lucky 68 stroker, with a newline -> 10 mapping, as described in detail below.

    Running the following PHP program:

    $m = chr(205).chr(141).chr(207).chr(128).chr(206).chr(142); echo "M " . md5("M".$m) . "\n"; echo "D " . md5("D".$m) . "\n"; echo "C " . md5("C".$m) . "\n"; echo "L " . md5("L".$m) . "\n"; echo "X " . md5("X".$m) . "\n"; echo "V " . md5("V".$m) . "\n"; echo "I " . md5("I".$m) . "\n"; echo "n " . md5("\n".$m) . "\n";
    produces:
    M 999cb80306cbebadbd2d929ba8ed579a D 472969775e593db37081e391c072a85c C 99f4ea53c723dc31fe41bee9bd3f1fcd L 154047519861cb3d7c35ba7d283ce80e X 9fba084f9fdf4d9cd0e480e32b6d26d4 V 004b6563916984013ca50d0ee8add0e7 I daba8b73d682dc51d3c6dfa5ed770c4d n 10a89bcbe90dda3d6bb0e5ee5b8fbe21
    Notice that M -> 999, C -> 99, X -> 9, V -> 4 and I -> 0 are all direct hits. And, crucial for the new algorithm, newline has a distinct value (10) thus ensuring it can be distinguished from the roman letters. Now, applying 472969775 % 2119 produces the desired D -> 499 mapping. So far, so good. Just one more miracle is required: a L -> 49 mapping with the same modulo (2119) as the D mapping.

    Pop quiz: what should 154047519861 % 2119 produce? Perl, Python and Ruby are unanimous on that score: the correct answer is 157. Surprisingly, however, running the following PHP test program:

    $L_num = 154047519861 % 2119; $L_str = "154047519861" % 2119; echo "L_num : $L_num\n"; echo "L_str : $L_str\n";
    produces:
    L_num : -1324 L_str : 49
    The curious thing is that both answers are wrong. The miraculous thing is that the string "answer", $L_str above, produces the desired L -> 49 mapping! So I'm grateful for this (presumably 32-bit related) PHP bug because this lucky hit finally allowed me to break the 70 barrier:
    <?while(11^$n=md5(fgetc(STDIN).XXXXXX)%2119+1)$t+=$n-2*$t%$n?><?=$t;
    where XXXXXX above is chr(205).chr(141).chr(207).chr(128).chr(206).chr(142). 68 strokes!

    Though I admit I never expected to break the 70 barrier, I now believe that a score in the low 60s is a certainty. Here's why.

    Suppose you could find a direct hit: M -> 1000, D -> 500, C -> 100, L -> 50, X -> 10, V -> 5, I -> 1, newline -> 0. You could then replace the:

    while(11^$n=md5(fgetc(STDIN).XXXXXX)%2119+1)
    claptrap with simply:
    while($n=md5(fgetc(STDIN).XXXXXX)*1)
    a saving of eight strokes. The comical "*1" above is to work around a PHP idiosyncrasy, to ensure that "1e3" is interpreted as 1000 and not 1.

    How long would the magic string need to be to find such a solution? The probability of matching three specific characters followed by [a-f] is (1/16)*(1/16)*(1/16)*(6/16) = 1/10923, which is the rough odds of matching M, D and C: "1e3", "500", "5e2", "100", "1e2". Matching 50 and 10 is more likely: (1/16)*(1/16)*(6/16) = 1/689. Matching five and one is more likely still: (1/16) * (6/16) = 1/42. Finally, matching zero is only: 6/16 = 3/8. So the, admittedly very rough, odds of such a miraculously lucky hit is (1/10923)**3 * (1/689)**2 * (1/42)**2 * (3/8), which comes to the order of one in 10**21. Since there are 180 characters available, a ten character magic string can produce 180**10 = 10**22 distinct combinations, making such a lucky hit a virtual certainty. I therefore declare that a 64 stroke solution is certain and shorter ones possible.

    This problem, however, is now no longer one of golf, but of computation. Even with super computers, 10**21 combinations is a daunting challenge and I expect you'd need to set up a cooperative grid computing project to find it. Such a project, of course, is far less worthy than SETI@home or Einstein@Home and would be a horrible waste of CPU cycles. :)

    Update 2012: it seems you don't need a cooperative grid computing project anymore, you can just rent the power in the cloud for around $2 per hour. See also Cracking Passwords in The Cloud: Amazon's new EC2 GPU Instances.

      Though I'd long stopped searching for fresh ideas to try in this game, a little while after this ToastyX-provoked epiphany, I could not help but wonder if the string bitwise & PHP golfing trick might somehow be similarly applied to the older Roman to Decimal game.

      I started with an inspection of my previous best PHP solution:

      <?while(11^$n=md5(fgetc(STDIN).XXXXXX)%2119+1)$t+=$n-2*$t%$n?><?=$t;
      There's a lot to dislike about this code. Much nicer is:
      <?while(+$n=md5(fgetc(STDIN).XXXXXXXXXX))$t+=$n-2*$t%$n?><?=$t;
      While such a solution undoubtedly exists (as argued last time), I don't have a powerful enough computer to find it. The reason such a long (ten character) magic string is likely to be required is because hitting the longer Roman numbers is so improbable; that is, the probability of a random MD5 digest happening to start with 100[a-f], 500[a-f] or 1000[a-f] is too low.

      Can good ol' string bitwise & help here? To find out, I started by running a little Perl program:

      for my $i (48..57,97..102) { printf " %08b (ord %3d %c)\n", $i, $i, $i; }
      to display the bit patterns of all characters that may appear in a MD5 digest:
      00110000 (ord 48, "0") 00110001 (ord 49, "1") 00110010 (ord 50, "2") 00110011 (ord 51, "3") 00110100 (ord 52, "4") 00110101 (ord 53, "5") 00110110 (ord 54, "6") 00110111 (ord 55, "7") 00111000 (ord 56, "8") 00111001 (ord 57, "9") 01100001 (ord 97, "a") 01100010 (ord 98, "b") 01100011 (ord 99, "c") 01100100 (ord 100, "d") 01100101 (ord 101, "e") 01100110 (ord 102, "f")
      Hmmm, though we clearly cannot bitwise & from [a-f] to [0-9] or vice versa, we can bitwise & all digits to zero. For example:
      00111001 (ord 57, "9" character) & 01110000 (ord 112, "p" character) = 00110000 (ord 48, "0" character)
      Does that help? Yes! You see, applying &p to the second, third and fourth characters of the MD5 digest transforms any digit in these character positions to zero, making it much more likely to hit the problematic 100, 500 and 1000 Roman numbers.

      What about the first character? A simple brute force search program showed that the best we can do here is &u, which transforms [0-9] like so:

      0 0 1 1 2 0 3 1 4 4 5 5 6 4 7 5 8 0 9 1
      thus making a starting 1 or 5 more likely.

      Putting these two together, a &uppp bitwise string operation truncates an MD5 digest string to four characters as follows:

      [0145a`de] [0`] [0`] [0`]
      The bitwise &uppp operation therefore reduces the possible numbers in a MD5 digest from millions to just thirteen, namely: 0, 1, 4, 5, 10, 40, 50, 100, 400, 500, 1000, 4000, 5000.

      The next step was to update my old C search program with an adjusted my_md5 function as follows:

      static int my_md5(const unsigned char* inbuf) { int outi = 0; int nib; MD5_CTX mdContext; #ifdef MY_ASM mdContext.A = 0x67452301; mdContext.B = 0xefcdab89; mdContext.C = 0x98badcfe; mdContext.D = 0x10325476; md5_block_asm_data_order(&mdContext, inbuf, 1); #else md5_block_asm_data_order(&mdContext, inbuf); #endif /* 1st byte */ nib = (unsigned char)((mdContext.A >> 4) & 0xF); if (nib > 9) return outi; nib &= 5; // <-- added this line outi = outi * 10 + nib; nib = (unsigned char)(mdContext.A & 0xF); if (nib > 9) return outi; nib = 0; // <-- added this line outi = outi * 10 + nib; /* 2nd byte */ nib = (unsigned char)((mdContext.A >> 12) & 0xF); if (nib > 9) return outi; nib = 0; // <-- added this line outi = outi * 10 + nib; nib = (unsigned char)((mdContext.A >> 8) & 0xF); if (nib > 9) return outi; nib = 0; // <-- added this line outi = outi * 10 + nib; return outi; // <-- only care about first four chars now }
      After adjusting M_TARG and related #defines, running this modified search program unearthed a number of five character magic string hits, but none shorter. From the many different five character solutions, I naturally chose one consisting of all alphabetic characters, namely PQcUv.

      Because PHP and Perl have essentially the same string bitwise operator behavior, I was able to test these ideas in Perl. You can see more clearly how all this works, by running the following Perl test program:

      use Digest::MD5 qw(md5_hex); for my $r (M, D, C, L, X, V, I, "\n") { my $hd = md5_hex( $r . PQcUv ); print "$r $hd ", $hd & uppp, "\n"; }
      which produces:
      M 993332d3c4fa8b3839761ca4dd480f7b 1000 D 503c3b61b971c24100e97c3297882b22 500` C 112da378c970c5a0ff7769acd85095c7 100` L 58dbff2e42141165a1e04bbc629df030 50`` X 90a86e4f0b96ada8fb44bcb43d16f25e 10`0 V 7c9f70b74e8f19325e8dc62d31a9dd87 5`0` I 9bb0bbb1977969abd73e5f8121e6cff7 1``0 c697df2fcf84f10272369ee4482e5c1c a000
      Note that the ` characters above, the result of applying &p to [a-f], terminate the number when it's used in an arithmetic operation. Note too that for PHP we require the newline case to evaluate to zero (as a000 does above) to terminate the controlling while loop (this is not required in Perl because /./g removes the trailing newline).

      Here's an example Perl solution to this game:

      use Digest::MD5 md5_hex; $\+=$n-2*$n%($n=uppp&md5_hex$_.PQcUv)for<>=~/./g;print
      Were md5 a built-in Perl function, this would probably be a winning Perl golf entry. Note, by the way, that it may be possible to replace md5 with Perl's built-in crypt function and be competitive in this game.

      As you might expect, having a built-in md5 function made this new approach a winner in PHP, as demonstrated by the following 63 stroker:

      <?while(+$n=md5(fgetc(STDIN).PQcUv)&uppp)$t+=$n-2*$t%$n?><?=$t;
      Curiously, this turned out to have exactly the same length as the (theoretical) long magic string solution postulated earlier:
      <?while(+$n=md5(fgetc(STDIN).XXXXXXXXXX))$t+=$n-2*$t%$n?><?=$t;

      The shortest known PHP solution to this game is thus reduced from 68 to 63 strokes, and without requiring a super computer.

      A general golfing lesson to take from this little episode is that if you get stuck on one golf game, try another. You may be able to apply tricks learnt in the new game to the one you were previously stuck on.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://762180]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (5)
As of 2014-12-27 11:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (177 votes), past polls