Were it not for this chance argument during a lunchtime walk with my workmate and good friend, Yorkey, I probably would have given up the Roman to Decimal golf game after a few weeks. After all, I was completely stuck at that time with my Perl solution and had no fresh ideas to try out. However, Yorkey's stubborn refusal to change his point of view provoked me into proving my point to him by playing this golf in languages I hardly knew, namely Ruby, Python and PHP.
I decided to start with Ruby because I at least had a passing familiarity with that language after audreyt stayed for a few days at my home while my wife was out of town. During her stay, we went through the library section of the Pickaxe book together because Audrey felt it would make a great model for documenting the Perl 6 libraries. Unfortunately, the absent-minded Audrey left one of her earrings behind on our bedside table and I assumed it belonged to my wife, so just left it there. On her return, my wife saw the earring sitting on her bedside table and freaked. I had previously told my wife that Perl hacker "autrijus" was coming to stay for a few days while she was away. Luckily, when I hastily explained that autrijus had become audrey, my wife judged it unlikely that I would invent such a story and quickly calmed down. :-)
After a full month of play, the Perl leader was robin on 60 strokes, with Ruby languishing far behind on 73. So I naturally thought it "impossible" for Ruby to overtake Perl in this game -- and ludicrous to suggest that I might be able to beat my Perl solution in Ruby. My expectations were much lower than that; I simply wanted to be competitive in Ruby (anywhere in the 70s would be fine) so as to shut Yorkey up.
Taking the Lead in Ruby
Since I'd already worked out some basic magic formulae by that time, I naturally started converting these to work with Ruby. Unfortunately, in addition to mapping M -> 1000, D -> 500, C -> 100, L -> 50, X -> 10, V -> 5, I -> 1, I needed to further map the trailing newline to zero because I could find no short way of removing it in Ruby. In Perl, the trailing newline was easily removed via /./g. This extra newline mapping invalidated most of the Perl magic formulae I had previously found, so I had to adjust my magic formula searcher and start searching all over again.
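For reference, the un-golfed algorithm behind all of these solutions can be sketched in a few lines of Python 3. This is only an illustration, not any of the submitted entries: the names VALUES and roman_to_decimal are made up here, and a plain lookup table stands in for the magic formula. The table includes the newline -> 0 entry described above, so a line read with its trailing newline still converts correctly.

```python
# Plain lookup table; "\n" maps to zero so an unstripped input line still works.
VALUES = {"M": 1000, "D": 500, "C": 100, "L": 50, "X": 10, "V": 5, "I": 1, "\n": 0}

def roman_to_decimal(line):
    """Convert a Roman numeral (optionally newline-terminated) to an integer."""
    total = 0
    prev = 0  # value of the previous character
    for ch in line:
        n = VALUES[ch]
        total += n
        if prev < n:
            # The previous numeral was subtractive (e.g. the I in IX):
            # it was added once already, so take it off twice.
            total -= 2 * prev
        prev = n
    return total

print(roman_to_decimal("MCMXCIX\n"))  # 1999
print(roman_to_decimal("XIV"))        # 14
```

The golfed solutions replace the table with an arithmetic expression on the character code, which is what the magic formula search is for.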
I won't bore you here with the gory details of my magic formula searcher, written in C, for speed. To get a feel for what these searching programs look like, take a look at my simple one, described in Golf: Magic Formula for Roman Numerals, or Ton's much more complex one.
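To give a rough feel for what such a searcher does, here is a minimal brute-force sketch in Python 3. It is not the C program mentioned above. It assumes one particular formula shape, m % ord(c) % 7, and the target exponents from the Python formula quoted later in this article; the names TARGET and search are hypothetical.

```python
# Target exponents: 3, 2, 1, 0 for M, C, X, I (powers of ten),
# and 6, 5, 4 for D, L, V.
TARGET = {"M": 3, "D": 6, "C": 2, "L": 5, "X": 1, "V": 4, "I": 0}

def search(limit):
    """Return every m below limit where m % ord(c) % 7 hits all seven targets."""
    hits = []
    for m in range(1, limit):
        # all() stops at the first miss, so most candidates are rejected quickly.
        if all(m % ord(c) % 7 == k for c, k in TARGET.items()):
            hits.append(m)
    return hits

found = search(210000)
print(205558 in found)  # True
```

A real searcher would try many formula shapes and rank the hits by how many characters they cost in the target language, which is where a compiled language pays off.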
My new and improved search program found a Ruby-friendly magic formula easily enough, and I was flabbergasted when my first Ruby approach, despite using the nine character each_byte method, was equal leader on 73 strokes!
As you can see, this is just a straightforward translation of the original algorithm I was using in Perl, albeit with a magic formula replacing the Perl regex-based lookup table. As I was quick to point out to Yorkey, I didn't need to be a Ruby expert to do this, just needed to know the core parts of the language and, more importantly, find a good algorithm.
I started with each_byte only because I couldn't get the shorter getc function to work. For example, this attempt:
failed to compile with "undefined local variable or method `c'". Following Eugene's advice of "Can't possibly work, try it anyway", I changed c to C (uppercase variables are constants in Ruby):
and it worked! The screen was littered with "warning: already initialized constant C" messages (written to stderr), but these don't matter to codegolf, which only cares about what is written to stdout. Combining this with a well-known Ruby golfing trick, replacing the three-char 238 with the two-char ?ascii-char-with-ord-of-238, shortened my solution to 65 strokes. As you might expect, I felt elated at leading the Ruby experts by eight strokes and, more importantly, at forcing Yorkey to eat his words.
Choking on my breakfast cereal
Complacency is dangerous in golf and I had become complacent. If anyone had told me at this time that I could reduce my Ruby solution from 65 strokes all the way down to 53, I would have declared them insane.
After basking for months in my newly acquired Ruby fame, I almost choked on my breakfast cereal when I checked the codegolf leaderboard one morning and noticed that Python golfing god, Mark Byers, had posted a 59 stroke Ruby solution. This was intolerable! Back to work.
After experimenting some more with Ruby's evaluation order, I came up with a weird spaceship operator 60 stroke solution:
I've left the 238 above for readability, but my submitted solution naturally used the ?ascii-char-with-ord-of-238 dirty trick mentioned earlier. This solution introduces another dirty Ruby golfing trick, namely using a .* "method call" for "multiplies" rather than *(...), thus saving a stroke by eliminating the parens. You can try this trick routinely when golfing in Ruby whenever you need to change operator precedence -- though it doesn't always work, Ruby's parsing being pretty quirky, in my experience. By the way, it was this Ruby solution that inspired my weird 62 stroke Perl spaceship operator solution, mentioned in the previous article, an example of transferring ideas from one language to another. Often the hard part in golf is generating new ideas to try, and using multiple languages is a fertile source of fresh ideas.
Alas, I couldn't improve this solution further, so switched to Python, hoping to take revenge on the "Python golfing god" there.
Python Baby Steps
As you might expect by now, my first Python attempt was the same ol' same ol':
89 strokes! This solution bears a close resemblance to the earlier Ruby ones. Notice that Python, like Perl, but unlike Ruby, does not need to map the trailing newline because the Python raw_input function removes it.
Two further strokes were whittled easily enough with:
Notice too that in Python, alone among the four languages, assignment is not an operator. This proved a chronic nuisance in this game because I couldn't see any opportunity to exploit evaluation order to eliminate the "previous value" variable (p in the Python solution above).
Another generally applicable golfing tip is to study every single built-in function the language has to offer, especially the short ones. When I did that, the Python hash function caught my eye. I wonder if it could be used in a magic formula? Well, it seems to have better properties for this purpose than ord and is only one stroke longer. Definitely worth a try. It did indeed improve things:
... but only by one stroke. 86 strokes now, but still a gaping eight strokes behind the Python golfing god.
Going for the Outright Lead
Necessity is the mother of invention
The Python solutions are different to the Ruby and Perl ones in that you have to either map the hash/ord functions or assign their result to a variable, as in x=84169%ord(c), because all the magic formulae seen so far use the character twice. It therefore occurred to me that a magic formula using each character in the input string only once would be a big saving in Python. How to find such a formula? I had no idea, but I played around one afternoon, just trying stuff, and stumbled on a gem:
By way of explanation, notice that the magic formula 205558%ord(r)%7 maps M -> 3, D -> 6, C -> 2, L -> 5, X -> 1, V -> 4, I -> 0 as shown in the following table:
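The mapping can be checked directly. The following Python 3 sketch prints each numeral with its character code and the two successive remainders; the names MAGIC and exponent are illustrative and do not come from the golfed solution.

```python
MAGIC = 205558

def exponent(ch):
    """Apply the formula 205558 % ord(ch) % 7 to one Roman numeral character."""
    return MAGIC % ord(ch) % 7

for ch in "MDCLXVI":
    code = ord(ch)
    print(ch, code, MAGIC % code, exponent(ch))

# Output:
# M 77 45 3
# D 68 62 6
# C 67 2 2
# L 76 54 5
# X 88 78 1
# V 86 18 4
# I 73 63 0
```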
Generally, formulae that map M -> 3, C -> 2, X -> 1 and I -> 0 are highly effective because applying "%NNNN", where NNNN > 1000, does not mangle the already matching 10**m, so instead of requiring seven lucky hits, you now need only three (D, L and V).
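As a sketch of that last step, the Python 3 snippet below brute-forces moduli NNNN for which 10**4, 10**5 and 10**6 reduce to 5, 50 and 500 while 1, 10, 100 and 1000 are left alone. The exponents 4, 5 and 6 for V, L and D are taken from the mapping above; the names WANTED, good_moduli and value are hypothetical, and the default modulus in value is simply one of the hits, not necessarily the one used in the submitted solution.

```python
# Exponent -> required value after reduction modulo NNNN.
WANTED = {0: 1, 1: 10, 2: 100, 3: 1000, 4: 5, 5: 50, 6: 500}

def good_moduli(lo, hi):
    """Return every n in [lo, hi) with 10**k % n == WANTED[k] for all k."""
    return [n for n in range(lo, hi)
            if all(10 ** k % n == v for k, v in WANTED.items())]

moduli = good_moduli(1001, 100000)
print(moduli)  # [1999, 9995]

def value(ch, modulus=9995):
    """Decimal value of one Roman numeral character, using arithmetic only."""
    return 10 ** (205558 % ord(ch) % 7) % modulus

print([value(c) for c in "MDCLXVI"])  # [1000, 500, 100, 50, 10, 5, 1]
```

Only the three "lucky hits" have to be found by search: any modulus above 1000 leaves the four powers of ten up to 1000 untouched automatically.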
Combining this new formula with the same modulo trick I used to move my Perl solution from 62 to 60 strokes reduced my Python solution to 78 strokes and tied for the lead with Mark Byers!
Code Golf is 10% strategy, 90% tactics
Actually, I've found many different 78 stroke Python solutions, but none shorter. Here are some more variations in the middle line:
The last one is noteworthy in that it uses a different mapping, namely M -> 2000, D -> 1000, C -> 200, L -> 100, X -> 20, V -> 10, I -> 2. Also noteworthy is that, because it divides by two (n/2), it also works with a:
initialization. This observation will allow us later to exploit a Ruby built-in variable ($.), which is initialized to one. Note that this second alternative mapping is available, without penalty, in Ruby and Python, but not Perl and PHP, for various complicated tactical reasons. These are the sort of tactical tricks that are crucial when fighting for the lead in golf.
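The reason the starting value stops mattering can be shown with a small Python 3 sketch. It is an un-golfed illustration under stated assumptions: the names DOUBLED and roman_doubled are made up, a lookup table replaces the magic formula, and Python 3's // plays the role of the integer n/2 in the original Python 2 code. Because every doubled value is even, a total started at one stays odd and the final halving discards the extra one.

```python
# The alternative mapping: every value is twice the usual one.
DOUBLED = {"M": 2000, "D": 1000, "C": 200, "L": 100, "X": 20, "V": 10, "I": 2}

def roman_doubled(numeral, start):
    """Sum doubled values from the given start, then halve with integer division."""
    total = start
    prev = 0
    for ch in numeral:
        n = DOUBLED[ch]
        total += n
        if prev < n:
            total -= 2 * prev  # undo and subtract a subtractive numeral
        prev = n
    return total // 2

print(roman_doubled("MCMXCIX", 0))  # 1999
print(roman_doubled("MCMXCIX", 1))  # 1999
```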
Incredibly, applying what I learnt in my Python diversion to Ruby, plus yet another dirty Ruby trick (using the Perl-inspired Ruby built-in variable $. to eliminate the t=0), enabled me to reduce my Ruby solution from 60 strokes all the way down to 53 and so steal the outright lead from "primo":
Success is never final -- Winston Churchill
Of course, I can't prove that I've found the optimal magic formula. It's also likely that further language or algorithmic golfing improvements will be found, especially given my relative inexperience in Ruby and Python.
In the next installment of this series, I'll show off my PHP solutions.
Leaderboards, end of April 2009
All languages (281 entries):
Perl (69 entries):
Ruby (86 entries):
Python (87 entries):
PHP (62 entries):
One month later, the leaderboard changes are: