After my recent "discovery" of the PGA-TRAM algorithm, I couldn't resist coding it up in a variety of languages. I'm doing this as a rosetta code node because I find it fun; hopefully, it'll provide some interesting insights into a variety of programming languages. Please note that this meditation is not about golf, but about implementing an arbitrary, specific algorithm in the most "natural" way in a variety of languages.

Here's the spec:

Write a function, roman_to_dec, to convert a "modern" Roman Numeral to its decimal equivalent. The function takes a single string argument and returns an integer. The string argument may be assumed to always contain a valid Roman Numeral in the range 1-3999. Error handling is not required. For example, roman_to_dec("XLII") should return 42.


Let's get started with a Perl version:

use strict; use warnings; use List::Util qw(reduce); { my %rtoa = ( M=>1000, D=>500, C=>100, L=>50, X=>10, V=>5, I=>1 ); sub roman_to_dec { reduce { $a+$b-$a%$b*2 } map { $rtoa{$_} } split//, uc(shift) } } my @testdata = ( "XLII", "LXIX", "mi" ); for my $r (@testdata) { print "$r: ", roman_to_dec($r), "\n"; }

Running this program produces:

XLII: 42 LXIX: 69 mi: 1001

To me, this is the most natural way to express the PGA-TRAM algorithm in Perl. It uses a variety of common Perl idioms:

  • A hash (rtoa) for the lookup table.
  • Lexical scope to "data hide" the rtoa hash.
  • A Perlish functional "pipeline": uc -> split -> map -> reduce.
Please feel free to disagree with my assessment and respond with your own preferred Perl solution.

For those folks unfamiliar or uncomfortable with reduce, note that it can be eliminated like this:

sub roman_to_dec { my $t = 0; for my $n ( map { $rtoa{$_} } split//, uc(shift) ) { $t += $n-$t%$n*2; } return $t; }
Though I don't mind that at all, I do prefer the reduce version.


As you might expect, my "most natural" Python 2 solution looks very similar to the Perl one above:

def roman_to_dec(r, __rtoa = dict(M=1000, D=500, C=100, L=50, X=10, V= +5, I=1) ): return reduce( lambda t,n: t+n-t%n*2, (__rtoa[c] for c in r.upper()) + ) testdata = [ "XLII", "LXIX", "mi" ] for r in testdata: print r, ":", roman_to_dec(r)
That this solution is essentially identical to the Perl one shows how similar these two languages are. Indeed, I'm fond of claiming that Perl, Python and Ruby are "essentially equivalent". :) Since Python lacks Perl's simple lexical scoping mechanism, I chose to "data hide" the rtoa hash by making it a default function argument. Note that Python default function arguments are initialized once only. Though I could have employed the Python map function (with a lambda), I chose instead to use a lazily evaluated Python generator list comprehension, (__rtoa[c] for c in r.upper()), because that seemed more "naturally Pythonic" to me.

Update: a Python 3 version (just had to import reduce and use parens with print):

from functools import reduce def roman_to_dec(r, __rtoa = dict(M=1000, D=500, C=100, L=50, X=10, V= +5, I=1) ): return reduce( lambda t,n: t+n-t%n*2, (__rtoa[c] for c in r.upper()) + ) testdata = [ "XLII", "LXIX", "mi" ] for r in testdata: print( r, ":", roman_to_dec(r) )

Plus some much later (2020) discussion of Perl/Python solutions: Re^5: A short whishlist of Perl5 improvements leaping to Perl7


Here's my Haskell solution, using GHC:

{-# OPTIONS_GHC -fglasgow-exts -Wall #-} import Data.Char (toUpper) import Data.List (concat, intersperse) rtoa :: Char -> Int rtoa 'M' = 1000 rtoa 'D' = 500 rtoa 'C' = 100 rtoa 'L' = 50 rtoa 'X' = 10 rtoa 'V' = 5 rtoa 'I' = 1 rtoa r = error $ "Invalid rtoa char:" ++ show r urtoa :: Char -> Int urtoa = rtoa . toUpper roman_to_dec :: String -> Int roman_to_dec = foldl1 (\t n -> t+n-t`mod`n*2) . map urtoa myshow :: (String -> Int) -> String -> String myshow fn val = val ++ ": " ++ (show (fn val)) testdata :: [String] testdata = [ "XLII", "LXIX", "mi" ] main :: IO () main = do putStrLn $ (concat $ intersperse "\n" (map (\c -> (myshow roman_to_d +ec c)) testdata))
Instead of using a rtoa hash, natural in Perl and Python, a rtoa function using simple pattern matching seemed the most Haskelly way of expressing this algorithm.


Finally, my ANSI C++ solution:

#include <cctype> #include <iostream> #include <string> #include <vector> #include <numeric> using namespace std; const int romtab[] = { 0,0,0,0,0,0, 0, 0, 0,0, // 00-09 0,0,0,0,0,0, 0, 0, 0,0, // 10-19 0,0,0,0,0,0, 0, 0, 0,0, // 20-29 0,0,0,0,0,0, 0, 0, 0,0, // 30-39 0,0,0,0,0,0, 0, 0, 0,0, // 40-49 0,0,0,0,0,0, 0, 0, 0,0, // 50-59 0,0,0,0,0,0, 0, 100,500,0, // 60-69 (C:67,D:68) 0,0,0,1,0,0,50,1000, 0,0, // 70-79 (I:73,L:76,M:77) 0,0,0,0,0,0, 5, 0, 10,0 // 80-89 (V:86,X:88) }; inline int rtoa(int c) { return romtab[c]; } inline int accfn(int t, char c) { return t+rtoa(toupper(c))-t%rtoa(toupper(c))*2; } int roman_to_dec(const string& s) { return accumulate(s.begin(), s.end(), 0, accfn); } int main(int argc, char* argv[]) { vector<string> testdata; testdata.push_back("XLII"); testdata.push_back("LXIX"); testdata.push_back("mi"); for (vector<string>::const_iterator iter = testdata.begin(); iter != testdata.end(); ++iter) { cout << *iter << ": " << roman_to_dec(*iter) << '\n'; } return 0; }
The Perl reduce function is called "accumulate" in ANSI C++. Implementing the lookup with the admittedly long-winded romtab[] table felt like natural C++ to me because it seeks high performance. Indeed, if you peek inside <ctype.h> you will likely see toupper() implemented in a similar vein. This makes the rtoa(toupper(c)) above ridiculously fast, costing only two (constant) array lookups, not requiring even a single C function call! Also of note in this solution is that I didn't bother with map, instead simply applying the ultra fast rtoa(toupper(c)) combo twice in the accumulator function.

Update: As indicated below, we can eliminate the call toupper(), while eliminating any memory faults on invalid input, by adjusting romtab as shown below:

// Note: there are less than 256 initializers in this table; // the uninitialized ones are guaranteed by ANSI C to be // initialized to zero. const int romtab[256] = { 0,0,0,0,0,0, 0, 0, 0, 0, // 00- 09 0,0,0,0,0,0, 0, 0, 0, 0, // 10- 19 0,0,0,0,0,0, 0, 0, 0, 0, // 20- 29 0,0,0,0,0,0, 0, 0, 0, 0, // 30- 39 0,0,0,0,0,0, 0, 0, 0, 0, // 40- 49 0,0,0,0,0,0, 0, 0, 0, 0, // 50- 59 0,0,0,0,0,0, 0, 100, 500, 0, // 60- 69 0,0,0,1,0,0, 50,1000, 0, 0, // 70- 79 0,0,0,0,0,0, 5, 0, 10, 0, // 80- 89 0,0,0,0,0,0, 0, 0, 0, 100, // 90- 99 500,0,0,0,0,1, 0, 0, 50,1000, // 100-109 0,0,0,0,0,0, 0, 0, 5, 0, // 110-119 10,0,0,0,0,0, 0, 0, 0, 0 // 120-129 }; // Return the arabic number for a roman letter c. // Return zero if the roman letter c is invalid. inline int urtoa(int c) { return romtab[c]; } inline int accfn(int t, char c) { return t+urtoa(c)-t%urtoa(c)*2; }

I'd love to see some sample implementations in other languages, especially Perl 6. So please feel free to respond away. :)

Other Rosetta Code Nodes

Replies are listed 'Best First'.
Re: Rosetta PGA-TRAM
by moritz (Cardinal) on Jun 13, 2009 at 12:31 UTC
    I wrote a Perl 6 version (transcoded from the Perl 5 version), which uses the following improvements:
    • Instead of another scope for %rtoa, I used a state variable (also available in perl-5.10)
    • I wrote the "pipeline" as a set of method calls, which means they are read in the order that they are executed
    • Use of a formal parameter
    • say instead of print (also available in 5.10)
    • (Updated) use hyphens instead of underscores as word separator in the sub name. It's a matter of taste, but I like it better that way.

    Here's the code, which works with today's Rakudo:

    sub roman-to-dec($x) { state %rtoa = M=>1000, D=>500, C=>100, L=>50, X=>10, V=>5, I=>1; $x.uc.split('').map({ %rtoa{$_} }).reduce: { $^a+$^b-$a%$b*2 } } my @testdata = <XLII LXIX mi>; for @testdata -> $r { say "$r: ", roman-to-dec($r); }
      Why does the block use $^a in one place (the automatic parameter) and $a in another place? Shouldn't that be a different (undeclared) variable? (Likewise for b).

      If you move the uc you can avoid copying the whole string. Not much savings for a small string I suppose, but you break the clean pipeline nature by copying the input first, rather than doing nothing more than passing each character through the pipeline.

      $x.split('').map({ %rtoa{$_.uc} }).reduce: { $^a+$^b-$a%$b*2 }
      Here's a shaken-up version, starting with noticing how the list return of a for loop is similar to the map function:
         sub infix:<ↀ> ($a,$b) { $a+$b-$a%$b*2 }
         [ↀ] do for $x.split('') { %rtoa{$_.uc} }
      and an excuse to show off the cool form of the reduction metaoperator.

      P.S.: used pre instead of code because Perl Monks still doesn't like Unicode. code escapes out the entities, and the form isn't submitted in UTF-8 so various things are automatically encoded. Nasty.

        Why does the block use $^a in one place (the automatic parameter) and $a in another place? Shouldn't that be a different (undeclared) variable? (Likewise for b).

        No, the ^ twigil is only necessary in the first occurrence. That was introduced because things like this:

        my $block = { my $v = %hash{$^key}; say "The lookup {%hash{$^key}} yields $v"; };

        Would complain about the closure inside the string getting no argument, because $^key was interpreted as a formal parameter to the inner-most closure, which in this case was the one inside the string.

        Or more general, you couldn't refer to outer lexicals that happened to be self-declaring formal parameters.

        So it was decided that after $^foo occurred once, you could refer to it as $foo to, disambiguating it in inner blocks.

      I notice your use of the indirect object form at the very end to avoid parens around (just) brackets. But didn't do so in the other place, because it's not at the end. Moving one paren doesn't make it much better. But you could use feed notation and lose the parens in both places:
      $result= ( $x.split('') ==> map { %rtoa{$_.uc} } ==> reduce { $^a+$^b +-$a%$b*2 } );
      but you still wind up with one paren after a brace. Or if you reversed the written order, so "everything to the right" is indeed the argument to the listop, I think this works:
      reduce {$^a+$^b-$a%$b*2 } map { %rtoa{$_.uc} } $x.split('');
      which makes me think that there is merit in doing it that way as originally presented. listops are naturally written with the processing sequence in the opposite order. Fighting it means adding parens, no matter how you try.
Re: Rosetta PGA-TRAM
by citromatik (Curate) on Jun 13, 2009 at 13:19 UTC

    In OCaml

    let explode str = let rec aux pos acc = if pos < 0 then acc else aux (pred pos) (str.[pos] :: acc) in aux ((String.length str)-1) [] let d2a = function | 'M' -> 1000 | 'D' -> 500 | 'C' -> 100 | 'L' -> 50 | 'X' -> 10 | 'V' -> 5 | 'I' -> 1 | _ -> failwith "Incorrect character" let r2d str = List.fold_left (fun acc x -> let x' = d2a x in acc + x' - acc mod +x' * 2) 0 (explode (String.uppercase str));;
    • explode just converts a string into a list of its characters. The future standard library for OCaml, named Batteries included will include such a function by default.
    • The functional nature of OCaml allows a natural implementation of the algorithm


Re: Rosetta PGA-TRAM
by Arunbear (Prior) on Jun 13, 2009 at 18:35 UTC
    A Scala version (basically similar in form to the Perl/Python versions):
    val rtoa = Map('M' -> 1000, 'D' -> 500, 'C' -> 100, 'L' -> 50, 'X' -> +10, 'V' -> 5, 'I' -> 1) def romanToDec(roman: String) : Int = => rtoa(c.toUpperCase)).reduceLeft((x, y) => x + y - x + % y * 2) val testData = List("XLII", "LXIX", "mi") for(r <- testData) println(r + ": " + romanToDec(r))
Re: Rosetta PGA-TRAM
by eyepopslikeamosquito (Bishop) on Jun 14, 2009 at 01:11 UTC

    Shortly after posting, I noticed an obvious improvement to my C++ solution. If I'm going to use a table, I might as well eliminate the toupper() call by mapping the lower case letters also. Also, increasing the table size to 256 eliminates any possible nasty memory faults if passed an invalid roman letter, while further guaranteeing that zero will be returned in that case.

    // Note: there are less than 256 initializers in this table; // the uninitialized ones are guaranteed by ANSI C to be // initialized to zero. const int romtab[256] = { 0,0,0,0,0,0, 0, 0, 0, 0, // 00- 09 0,0,0,0,0,0, 0, 0, 0, 0, // 10- 19 0,0,0,0,0,0, 0, 0, 0, 0, // 20- 29 0,0,0,0,0,0, 0, 0, 0, 0, // 30- 39 0,0,0,0,0,0, 0, 0, 0, 0, // 40- 49 0,0,0,0,0,0, 0, 0, 0, 0, // 50- 59 0,0,0,0,0,0, 0, 100, 500, 0, // 60- 69 0,0,0,1,0,0, 50,1000, 0, 0, // 70- 79 0,0,0,0,0,0, 5, 0, 10, 0, // 80- 89 0,0,0,0,0,0, 0, 0, 0, 100, // 90- 99 500,0,0,0,0,1, 0, 0, 50,1000, // 100-109 0,0,0,0,0,0, 0, 0, 5, 0, // 110-119 10,0,0,0,0,0, 0, 0, 0, 0 // 120-129 }; // Return the arabic number for a roman letter c. // Return zero if the roman letter c is invalid. inline int urtoa(int c) { return romtab[c]; } inline int accfn(int t, char c) { return t+urtoa(c)-t%urtoa(c)*2; }

Re: Rosetta PGA-TRAM
by afoken (Canon) on Jun 13, 2009 at 21:20 UTC


    if (!Array.prototype.push) { /* IE < 5.5 lacks Array.push() */ Array.prototype.push=function(arg) { /* this is only the minimum required for map, push should +accept more than one parameter */ this[this.length]=arg; return this.length; } } /*** Bootstrap Perl ;-) ***/ Array.prototype.foreach=function(callback) { /* * callback is called with a single argument, the current a +rray element * callback return value is ignored * returns nothing */ for (var i=0; i<this.length; i++) { callback(this[i]); } }; { /* * callback is called with a single argument, the current a +rray element * callback returns an array of zero or more values * returns a new array of all callback return values */ var rv=new Array(); this.foreach(function(x) { callback(x).foreach(function(y) { rv.push(y); }); }); return rv; }; /*** List::Utils ***/ Array.prototype.reduce=function(callback) { /* * callback is called with two arguments, its last return v +alue (initially the first array element) and the next array e +lement * callback returns a single value used for the next iterat +ion * callback is not called when array contains less than two + elements */ if (this.length==0) return null; var rv=this[0]; for (var i=1; i<this.length; i++) { rv=callback(rv,this[i]); } return rv; }; /*** The real code ***/ roman_to_dec=(function(){ var rtoa={ M:1000, D:500, C:100, L:50, X:10, V:5, I:1 }; return function(str) { return str.toUpperCase().split('').map(function(_){ return [ rtoa[_] ]; }).reduce(function(a,b) { return a+b-a%b*2; }); }; })(); function main() { var testdata=["XLII", "LXIX", "mi"]; alert({ return [ _ + ": " + roman_to_dec(_) + "\n" ]; }).join("") ); };

    Some notes:

    • There are no lists in Javascript, you have to use Arrays.
    • The { ... }; notation adds new methods to ALL arrays, even after they were created. This is compareable to package Array; sub xxx { ... } -- except that Perl arrays aren't objects.
    • this is the current object. In the new Array methods, it is the array itself.
    • Javascript has anonymous functions, and you can / have to use them surprisingly often, because Javascript doesn't have code blocks.
    • You don't have anything like local in Javascript, so the implicit $_ used in Perl's map and foreach and $a and $b in reduce have to be passed explicitly to the callback.
    • The variable %rtoa is only available to roman_to_dec in the original perl code, and the same applies to my Javascript code. But because there are no code blocks, you need an anonymous function to create a private scope for rtoa. And that function needs to return the function that implements roman_to_dec, because otherwise there would be no way to access the roman_to_dec function. Finally, the anonymous function is called once with no arguments(), returning the implementation.


    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Hmmm, foreach, map, and reduce seem to be ideas that are too good to be left out from Javascript, as I found out lately. has all the details. Unfortunately, those features require JS 1.6 or even 1.8, despite naming 1.5 in the URL. The 1.5 in the URL is not completely wrong, as the site gives copy-and-paste-able replacement implementations in Javascript 1.5, so you can also use the methods in older browsers.

      foreach was camelCased to forEach, Perl's grep is named filter. There is also a reduceRight method that works from right to left.

      The calling conventions pass the current element, the index into the array, and the array object itself to the callback. While one could construct examples where the extra parameters could be useful, I think this "tastes" bad: No callback should need to access (and modify) the original array while iterating over the array. No callback should need to know the current index into the array. If it does, you are probably reinventing push, pop, shift, unshift, or splice.

      The new methods allow specifying a different object for this inside the callback, which can be useful sometimes. If you don't specify an object, the array object is used.

      Someone also recognised that map could be useful outside array contexts and constructed this piece of code. It just hurts my eyes.

      var a ="Hello World", function(x) { return x +.charCodeAt(0); }) // a now equals [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100]


      This code works "by accident", because can iterate over everything that behaves like an array, not only over real Arrays. The call method available on all Function objects allows to call a method of one object with this set to another object (the first argument passed to call). In this case, this is set to an instance of a String object, over which the method iterates.


      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Rosetta PGA-TRAM
by merlponk (Scribe) on Jun 15, 2009 at 10:13 UTC
    Well, I'm not only new to Perlmonks, but even newer to Lisp, as I just started to learn it. So, since a lot of different solutions have been posted, I thought I post my Common Lisp version (there's probably a better way to do it and I hope there isn't a bug):
    (defun roman_to_dec (roman) (if (not (equal roman "")) (let ((rtoa '((#\M . 1000) (#\D . 500) (#\C . 100) (#\L . 50) (#\X . 10) (#\V . 5) (#\I . 1)))) (reduce #'(lambda (a b) (- (+ a b) (* (mod a b) 2))) (map 'list #'(lambda (c) (cdr (assoc c rtoa))) (string-upcase roman))))))
    Update: Ok, the
    (if (not (equal roman ""))
    could probably be removed, because no error checking is necessary regarding to the description, and it is halfhearted to boot.
Re: Rosetta PGA-TRAM
by BrowserUk (Pope) on Jun 16, 2009 at 17:56 UTC

    I rarely define such subroutines without also defining the inverse. And once you have the inverse it seems like the most natural thing to do is use it to set up a straight forward table-driven version of the required routine:

    #! perl -sw use 5.010; use strict; use constant _1to9 => [ '', qw[ I II III IV V VI VII VIII IX ] ] +; use constant _10to90 => [ '', qw[ X XX XXX XL L LX LXX LXXX XC ] ] +; use constant _100to900 => [ '', qw[ C CC CCC CD D DC DCC DCCC CM ] ] +; sub dec2roman { my $dec = shift; die "Bad input' unless $dec =~ m[^[0-9]+$] and $dec < 4e3; my @decDigits = split '', $dec; my $roman = ''; $roman .= 'M' x shift @decDigits if @decDigits == 4; $roman .= _100to900->[ shift @decDigits ] if @decDigits == 3; $roman .= _10to90-> [ shift @decDigits ] if @decDigits == 2; $roman .= _1to9-> [ shift @decDigits ] if @decDigits == 1; return $roman; } sub roman2dec { state %roman2dec; %roman2dec = map{ dec2roman( $_ ), $_ } 1 .. 3999 unless %roman2de +c; my $roman = uc shift; return $roman2dec{ $roman } if exists $roman2dec{ $roman }; die "Input '$roman' out of range (1..3999 (roman) inclusive)"; } for my $r ( qw[ XLII LXIX mi MMMCMLXXXVIII ] ) { printf "%15s : %d\n", $r, roman2dec($r); }

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Rosetta PGA-TRAM
by whakka (Hermit) on Jun 16, 2009 at 17:06 UTC
    Pretty straightforward to translate to a Java version:
    import java.util.HashMap; public class pgaTram { private static HashMap<Character, Integer> rtoa = loadRTOA(); public static void main(String[] args) { String[] testdata = { "XLII", "LXIX", "mi" }; for (String t : testdata) { System.out.println(t + ": " + romanToDec(t)); } } public static int romanToDec(String s) { int t = 0; for (char c : s.toCharArray()) { int n = rtoa.get(Character.toUpperCase(c)); t += n - t % n * 2; } return t; } private static HashMap<Character, Integer> loadRTOA() { HashMap<Character, Integer> rtoa = new HashMap<Character, Inte +ger>(); rtoa.put('M', 1000); rtoa.put('D', 500); rtoa.put('C', 100); rtoa.put('L', 50); rtoa.put('X', 10); rtoa.put('V', 5); rtoa.put('I', 1); return rtoa; } }
Re: Rosetta PGA-TRAM
by John M. Dlugosz (Monsignor) on Jun 17, 2009 at 22:02 UTC
    Many years ago, I updated/rewrote the "standard" Roman Numeral module to handle Unicode characters. This was an early attempt to exercise the Unicode stuff in Perl 5. As I recall, there were a few interesting things including Unicode code points specifically for roman numerals.Ⅸ for example is a single character. Also, more loops around the "CD" will further multiply: ↈ is 100,000.
Re: Rosetta PGA-TRAM
by Arunbear (Prior) on Feb 11, 2011 at 17:46 UTC
    This is an implementation with lots of parentheses :)
    #lang scheme (define (sum-of-ints li [sum 0]) (if [< (length li) 2] (+ sum (apply + li)) (let ([f (first li)] [s (second li)]) (if [> s f] (sum-of-ints (cddr li) (+ sum (- s f))) (sum-of-ints (cdr li) (+ sum f)))))) (define roman-to-dec '()) (set! roman-to-dec (let ([rtoa #hash((#\M . 1000) (#\D . 500) (#\C . 100) (#\L . 50) (#\X . 10) (#\V . 5) (#\I . 1))]) (lambda (roman) (sum-of-ints (map (lambda (x) (hash-ref rtoa x)) (string->list (string-upcase roman))))))) (define test-data '("XLII" "LXIX" "mi" "MCMLXXXV")) (for ([r test-data]) (printf "~a ~a~n" r (roman-to-dec r)))
    +% mzscheme XLII 42 LXIX 69 mi 1001 MCMLXXXV 1985
    Like the perl version, it uses a lexical scope to hide the hash.

    It however uses a method of combining the numbers that is closer to how we'd do it in our heads i.e. following the Subtractive principle

Re: Rosetta PGA-TRAM
by afoken (Canon) on Aug 04, 2011 at 17:18 UTC

    M alias MUMPS:

    All jokes you could invent about naming a programming language after a viral disease can't even come close to how sick MUMPS is.

    On the other hand, I see several features in MUMPS that were re-invented decades later. MUMPS has trees that can be used as multi-level associative arrays, and unlike Perl's hases, the associative arrays are even sorted. MUMPS has regular expressions, string eval, exception handling, locking, local, post-conditions, automatic converting from string to number and back, and some other features that you would not expect from a language designed to run on a PDP-7. The most important feature are "globals", structured variables available in all programs, stored on disk. MUMPS fans call that feature a database. Modern languages would perhaps call them persistent super-globals. Don't confuse them with ordinary global variables that are kept separately for each session in memory.

    I see MUMPS as wild mix of Perl, DBM files, home computer BASIC, a macro assembler, and a big heap of punch cards. (Actually, MUMPS does not use punch cards, but "modern" VT420 terminals or VT420 emulations.)

    The code below runs on MSM (Micronetics Standard M) Version 4.4.1. I think it could run on other MUMPS implementations with minor changes.

    Note that this example uses quite modern constructs (for MUMPS), like user defined functions (returning a value) instead of procedures (returning no value, but possibly modifying global variables), arguments for functions (instead of using global variables), private variables (instead, as you may have guessed, using global variables). I just can't get used to stuffing everything into global variables. And my co-workers can't get used to using local variables. "We use global variables since three decades, and we never had problems with them." (Except for those "rare" cases once a month when global variables were accidentally overwritten.)

    The code also is quite verbose for a MUMPS program. Most code I see at work has each and every line stuffed to the maximum allowed (because those ancient PDP machines executed code faster when it was stuffed into a single line), variable names tend to be as short as possible (why use all eight significant characters when you can use just two or three?), and comments are used as a poor replacement for SVN. I think my co-workers could write functionally equivalent code with half of the lines.

    ROSETTA ;Rosetta PGA-TRAM Example; [ 08/04/2011 4:39 PM ] S TESTDATA="XLII,LXIX,mi" F I=1:1 S R=$P(TESTDATA,",",I) Q:R="" D .W R,": ",$$ROM2DEC(R),! Q REDUCE(CALLBACK,LIST) ; N (CALLBACK,LIST) I $O(LIST(""))="" Q 0 I $O(LIST(1))="" Q LIST(1) S A=LIST(1) F I=2:1:$O(LIST(""),-1) D .S B=LIST(I) .S @("A=$$"_CALLBACK_"(A,B)") Q A HELPER(A,B) ; N (A,B) Q A+B-(A#B*2) ROM2DEC(X) ; N (X) S RTOA("M")=1000,RTOA("D")=500,RTOA("C")=100,RTOA("L")=50 S RTOA("X")=10,RTOA("V")=5,RTOA("I")=1 S X=$TR(X,"abcdefghijklmnopqrstuvwxyz","ABCDEFGHIJKLMNOPQRSTUV +WXYZ") F I=1:1:$L(X) S LIST(I)=RTOA($E(X,I)) Q $$REDUCE("HELPER",.LIST)

    Explaining every aspect would take days, so I have to omit many details.

    • MUMPS is generally interpreted, MSM uses some tricks (pre-compiling) to speed up the interpreter, but that is completely transparent to the programmer.
    • Commands and build-in functions can be abbreviated to just one or two letters, and they typically are. MUMPS code written only with unabbreviated names exists only in educational books.
    • White space is relevant, because the parser had to be simple. Each and every command is followed by a single space separating command token and arguments token. The arguments are followed by one or more space character. So, a command without arguments is followed by two spaces, not just one.
    • A line starts with an optional numeric or alpanumeric label, followed by whitespace (typically a TAB), followed by zero or more commands, optionally followed by a comment starting with a semicolon.
    • Blocks are not available. The scope of FOR, IF, and ELSE is limited to a single line. Recent versions of MUMPS allow pseudo-blocks, indented with dots, but they are actually anonymous subroutines called by an argument-less DO command. The important difference is the behaviour of the QUIT command. QUIT in the FOR line aborts the for loop, QUIT in the pseudo-block just leaves the anonymous subroutine and thus starts the next iteration.
    • All expressions are evaluated strictly from left to right, so WRITE 1+2*3 writes 9, not 7. If you want a different evaluation order, you have to use brackets: WRITE 1+(2*3) writes 7, as expected.
    • While MUMPS has associative arrays, it lacks regular arrays. You can either use a character-delimited string (limited to 255 or 511 characters) and the $PIECE function (think of it as split when used on the RHS and as a combination of split and join when used on the LHS), or you can use an associative array with numeric keys and the $ORDER function (keys, each).

    The program ROSETTA starts with the main routine, the first line contains the program name as a label and a comment, no executable code (by convention). Note that the timestamp is automatically updated by the editor (poor man's SVN). The second line assigns the three test cases to the string TESTDATA. The third line is a for loop, starting with I=1, incrementing I by 1 in every loop, with no upper limit. Inside the loop, R is set to the I-th piece of TESTDATA using the $PIECE function (pieces are separated with commas). Still inside the loop, the loop is aborted using the QUIT command if R is empty (because no more pieces are available), else the anonymous subroutine starting (and ending) in the next line is executed for each iteration. The fourth line writes the value of R, a colon and a space, the result of the ROM2DEC function invoked with the argument R, and a newline. The fifth line aborts the program.

    The REDUCE label defines a function or procedure with arguments. Arguments are handled similar to Javascript in that each argument becomes a variable. All arguments are passed by value, unless the caller explicitly passes a variable by reference (prefixing its name with a dot, see last line). To access trees, you have to pass them by reference, else you see only the value of the tree's root element. I prefer not to have MUMPS code after a function name, hence the empty comment. The next line calls the NEW command to make all variables (except for the on-disk globals, see top of this posting) except those in brackets invisible. Returning from the function will destroy this new set of variables, and the old variables will be visible again. As I don't need any other variables, I just keep the arguments. Line three checks for the first key of LIST using the $ORDER function, if it returns an empty string (end-of-list), the list must be empty and REDUCE returns 0. (Note that MUMPS has a kind of undef, but it is most times a special case and causes errors. You can not simply return an undefined value in MUMPS.) The next IF command checks for the key following the key 1 in the LIST, if end-of-list occured, REDUCE returns the first list element. (MUMPS has no conventions for the index of the first array element, but 1 is common, so I choose a 1-based array.) The next lines are quite boring, except for two constructs: $ORDER(LIST(""),-1) is a "new" feature, it returns the last key of LIST ($ORDER(LIST("")) returns the first key). And @() is the indirection operator, the little brother of string eval. The line S @("A=$$"_CALLBACK_"(A,B)") is expanded at runtime to SET A=$$callback_value(A,B), inside the ROSETTA program, CALLBACK always contains "HELPER", so the line is expanded to SET A=$$HELPER(A,B). Note that due to the minimalistic parser and strict left-to-right evaluation, the entire argument must be given using the indirection operator, S A=@("$$"_CALLBACK_"(A,B)") does not work.

    The HELPER label defined a second function, it is the equivalent of the code block passed to List::Util::reduce(). Again, NEW is used to generate a new, temporary set of variables (so that the variables inside REDUCE are not overwritten), and as in REDUCE, the QUIT is used to return the value of an expression. Note that brackets are required due to the left-to-right evaluation order. # is the modulo operator.

    The last function is ROM2DEC, with the usual prolog to generate a new, clean set of variables. RTOA is used as an associative array. Note that you have to assign each element separately, there is no shortcut notation. The longest line is the MUMPS equivalent to $x=~tr/[a-z]/[A-Z];, poor man's lc() uc(). The for loop in the next line starts with I=1, increments by 1, and terminates automatically after reaching $LENGTH(X). Inside the loop, a list representing the value of each roman digit in X is built. $EXTRACT(X,I) is equivalent to substr($X,$I,1). The for loop replaces the Perl fragment map { $rtoa{$_} } split//. ROM2DEC ends with a tail call to REDUCE, calculating the value of X from the digit values in LIST.

    And this is how you run ROSETTA from the programmer prompt:

    >D ^ROSETTA XLII: 42 LXIX: 69 mi: 1001 >

    You can also call the various functions inside the program:


    Yes, MUMPS programs may have more than one enty point. There is no difference between a program and a library. You get used to that, like you get used to many other ugly tricks. Most notably that variables survive a program exit and are available for the next program.


    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Rosetta PGA-TRAM
by Arunbear (Prior) on May 21, 2012 at 14:37 UTC
    A nearly verbatim translation to Ruby:
    $rtoa = { :M=>1000, :D=>500, :C=>100, :L=>50, :X=>10, :V=>5, :I=>1 } def roman_to_dec(roman) roman.split(//).map { |c| $rtoa[c.upcase.to_sym] }.reduce { |t, n | +t+n-t%n*2 } end %w[ XLII LXIX mi ].each { |r| puts roman_to_dec r }