|Perl: the Markov chain saw|
The golf course looks great, my swing feels good, I like my chances (Part I)by eyepopslikeamosquito (Chancellor)
|on Apr 25, 2009 at 07:08 UTC||Need Help??|
Competing in a modern codegolf tournament can be a lonely experience. You play alone. You analyze alone. After an occasional breakthrough, you celebrate alone, your hard-won solution a closely guarded secret. There is no nineteenth hole, no place to share your solutions with your buddies over a beer.
Accordingly, and given the codegolf Roman-to-Decimal course has been open for more than two years now, I've decided to share my solutions to that game here. You'll get to look over my shoulder as I describe the circumstances and thought processes behind these solutions. I hope you'll find my journey through this competition both interesting and instructive, especially so for the serious, aspiring golfer.
Rules of the Game
In this game, you must convert a Roman numeral to its integer value. The input numeral, provided on stdin followed by a single newline, is guaranteed to be in the range I to MMMCMXCIX (1 to 3999). For example, given IV on stdin, your program should print 4 to stdout.
Test Driven Golf
I always like to start a golf game by writing a comprehensive test program. That saves time in the long run, enabling you to easily verify the many and varied solutions you must concoct to be competitive at golf. So, just like any journeyman test-first developer, I began this game by writing such a test program. A little searching on the CPAN quickly uncovered the Roman module, which I used to write the following test program:
Since this tests all valid input values (1..3999), any solution that passes this test program should be correct. Unfortunately, there are usually holes (no pun intended) in the test programs used by the codegolf web site because, in their fully automated flavour of golf, the test program must complete in less than four seconds, so it's usually not feasible to exhaustively test all possible inputs. Their test program for this game, however, is a pretty good one because it includes test cases for all six of the nastier edge cases (IV, IX, XL, XC, CD, CM), plus eight random ones, making it improbable for an incorrect solution to be accepted.
In any golf game, I try to get a complete working solution as quickly as possible. This method of golfing boosts morale in that you always have a working solution -- and enjoy frequent positive feedback each time you shorten it.
I'd just finished playing the Fonality Golf Challenge, so I began this game simply by inverting Ton's winning HART magic formula from the Fonality game:
86 strokes. Hmmm. The leading score posted at this early stage was a remarkable 60 strokes by perl internals hacker robin. This hot start made it immediately clear that Ton's astonishing and magical decimal-to-roman formula was going nowhere in this new game, where you must convert the other way. That was a relief to me because I didn't relish a repeat of the ill feeling surrounding the Perl Golf Ethics thread.
Since I'd already downloaded the CPAN Roman module, I next took a look at the algorithm it employed, namely:
Though obviously not golfed, I was impressed by this function, especially this line:
because this clever trick enables the algorithm to make one pass through the input string, one character at a time.
My prior golfing experience told me that any algorithm that essentially just traverses a string one character at a time is usually very competitive. And this very common operation of traversing a string one character at a time should be golf-able in just about any language -- assuming the language designers Huffman-coded common operations, as Larry Wall did.
The primary tactical problems to be solved with this algorithm are how to shorten the code surrounding $last_digit and how to perform the roman2arabic lookup.
Regexes always win (Mtv's law of golf)
Golfing this promising algorithm proved straightforward once I remembered Mtv's law and used a regex with side-effects ($') for the lookup table:
Notice that this solution uses the well-known golfing trick, invented by Dutch golfing maestro Eugene van der Pijll, of using $\ as the accumulator, allowing you to print it with a bald print, rather than print$a say, thus saving two strokes.
Getting to 68 with very little effort made this first approach look very promising. Indeed, this simple algorithm formed the basis of the winning solutions in all four languages (Perl, Python, Ruby, PHP), albeit with many twisty tactical turns along the way.
An Interesting Diversion
In golf, it's desirable to have multiple irons in the fire; that is, to have multiple working solutions using different approaches. Why? Because it's often possible to produce a solution shorter than all of them by combining the best ideas from the various approaches. A related lesson I took from this game is the value of writing solutions in all four languages (Perl, Python, Ruby, PHP); the act of doing that often uncovered algorithmic and tactical improvements that could be transferred from one language to another.
So, to broaden possible solutions, I played around with a tr (aka y) transform reminiscent of the Fonality game:
70 strokes. But I had to work very hard. I spent far too long fiddling with this approach because I found it fascinating. My golfing instincts were telling me, however, that this approach was far less promising than the previous simpler one, so I reluctantly gave it up at this early stage and never came back to it.
Since symbolic references were so effective in the Fonality game, I naturally tried a similar approach early in this game. Using xor, a common golfing technique, the required $I, $V, $X, $L, $C, $D, $M variables can be built fairly easily, as illustrated by the following test program, showv.pl:
Running echo hello|perl showv.pl displays:
So far, so good. However, I could find no remotely competitive solution using this approach. For example, though this one does work:
it's 87 strokes in length. Ouch. So I also abandoned this interesting approach early in this game. This shows how fickle golf can be: a winning approach in the Fonality game proved to be a complete dud in this very similar one.
Update: Much later I found an improved 74 stroke symref-based solution:
Functional approaches never win
Much as I enjoy functional programming, this style rarely wins golf tournaments. For fun, I tried a couple of functional solutions, such as:
Though there are surely shorter functional-style solutions (update: see, for example, this one found much later), I quickly became discouraged and gave up on this approach quite early.
Into the Top Ten
This classic Eugene van der Pijll quote is the single best piece of advice I could give to any aspiring golfer. Remembering Eugene's aphorism helped me time and time again during this game, especially later when I started golfing in languages I hardly knew (Python, Ruby and PHP).
Reverting to my original 68 stroke solution:
that $} "previous value" variable was starting to get really annoying. It would be nice to get rid of it somehow. But how? I wonder if it's possible to exploit perl's evaluation order somehow. Well, follow Eugene's advice and just try it!
Yes! No more $}. Down to 64 and on the leaderboard now! Woo hoo! At this point, I allowed myself a few punches in the air with my fist and even jumped up from my desk and ran madly around the room celebrating like a soccer player who just kicked a goal. As for why this solution works, I'll leave that as an exercise for the keen student of perl internals.
The top ten, after four weeks of play:
After the euphoria had died down, I realized I was now completely stuck.
What's the rush?
Then it dawned on me: what's the rush? With this new form of golf, there's no need to risk golf-induced eye damage, as `/anick unfortunately suffered, in a desperate attempt to shave one more stroke before the game closes. You can simply, Zen-like, just keep chipping away at your solution at your own pace. So I just kept whittling away whenever I had some spare time.
Some months later, I stumbled upon an unusual <=> spaceship operator solution:
Notice that, unlike earlier algorithms, this one requires one more iteration than the string length. For this purpose, I utilized the trailing newline, simply changing <>=~/./g to <>=~/./sg initially, at the cost of a single stroke. Later, I got that stroke back by noticing that $/=\1; combined with <> is one stroke shorter.
62 strokes! Only three strokes from the lead now, with "bearstearns" the new leader on 59.
After further months drifted by, I whittled two more strokes:
60 strokes! I felt a bit embarrassed at not finding this simple trick sooner. Not to worry, no rush. Only one from the lead and getting excited now.
For some time now I was aware that changing the table lookup string from:
saved three strokes -- if you could combine it with an expression you had to perform anyway to add one to $' (as was achieved in Re^2: Golf: Magic Formula for Roman Numerals). The trouble is, I could not find an economical way to add one to the shortened lookup string in this game. The best I found is the following 60 stroker:
As indicated at Golf: Magic Formula for Roman Numerals, I was also playing around with replacing the lookup table with a magic formula. At first, I struggled to fit the magic formulae I had found into a short solution to this course, but eventually wound up with a flood of 60 strokers, such as:
I now had many different 60 stroke solutions, but the lead was at 59!
Never give up
Alas, I couldn't improve any of these solutions and I've still got no idea what the leading golfers on 59 were doing.
In utter frustration, I took off for a long two hour walk to clear my head. After about an hour of walking, in desperation, I tried a completely different approach. I remember walking along calculating its golf score in my head, again and again, and kept coming up with 58! Surely, it couldn't be. Usually, when I have these "epiphanies", there's something I've overlooked and the solution fails dismally when I actually test it. With trembling fingers, I typed in the 58 stroke solution:
1, 2, 3, ... 99, 100, ... 3999. Success, all tests pass! Time for another celebration.
For the first time in my life, I'm leading a Perl golf tournament! This solution, requiring no magic formulae at all, is not that hard to find and I remember feeling quite surprised at the time that noone else had found it after more than a year of play.
Magic formulae always win (update to Mtv's law)
When I started writing Python, Ruby and PHP solutions to this problem, I fell back to the magic formula approach, mainly due to my ignorance of general golfing tricks in these languages. Curiously, I was then able to apply the tricks learnt there to improve the Perl magic formula based solutions and so beat the 58 stroke regex-based solution described above.
In the next installment, I'll describe how I golfed this problem in Python and Ruby.
References Added Later
Updated 26-apr 2009: Minor improvements to wording. March 2016: Added Larry Wall quote re Huffman coding.