saintmike has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks,

does anybody know of a module solving the classic compiler class task of translating an arithmetic expression into RPN (reverse polish notation)? Basically, I have something like

"2*(somevar+other) + max(this, that)"
and want it to be transformed into
("2", "somevar", "other", "ADD", "MULT", "this", "that", "MAX", "ADD")
There's a couple of related modules on CPAN, like Parse::RPN or Math::RPN, but neither one does the trick.

Any pointers to code snippets would be great, before I roll my own.

Replies are listed 'Best First'.
Re: Generic RPN Translator available?
by stvn (Monsignor) on Jan 19, 2005 at 14:48 UTC
    Any pointers to code snippets would be great, before I roll my own.

    The simplest way I know to convert Infix to RPN is to use stacks. You will need two stacks, one for operators, and the other a container for your RPN formated expression. Then the algorithm basically goes something like this:

    my $token_count = 0; while (@tokens) { my $token = shift @tokens; # if its an opening paran then ... if ($token eq '(') { push(@operator_stack, $token); } # if it's an operator then ... elsif (is_operator($token)) { push(@operator_stack, $token); } # if its a closing paran then ... elsif ($token eq ')') { # empty the operator stack until # we find the opening paren while (@operator_stack && $operator_stack[-1] ne '(') { # and push everything onto the # operator stack push @program_stack, (pop @operator_stack); } # then pop the opening paren off pop @operator_stack; } # if its not an operator, or a paran ... else { push @program_stack, $token; $token_count++; # if we have 2 or more tokens on the # program stack (since out last operator) # we need to get an operator .... if ($token_count >= 2){ # unless the top of our operator stack # is a opening paran, in which case # we have a sub-expression which needs # to be handled first next if $operator_stack[-1] eq '('; # otherwise grab the operator, and ... push @program_stack, (pop @operator_stack); # reset the token count $token_count = 1; } } } # and lastly, empty whats left of your # operator stack onto your program stack push @program_stack, (pop @operator_stack) while (@operator_stack);

    I got this algorithm from a old book on FORTH, and they had this table in it, which I found helpful in visually explaining the algorithm.

     source string  | operator stack | program stack
        A+(B*C-D)/E |                |                
         +(B*C-D)/E |                | A
          (B*C-D)/E | +              | A                                      
           B*C-D)/E | +(             | A      
            *C-D)/E | +(             | AB
             C-D)/E | +(*            | AB                    
              -D)/E | +(*            | ABC           
               D)/E | +(-            | ABC*
                )/E | +(-            | ABC*D                     
                 /E | +              | ABC*D-            
                  E | +/             | ABC*D-                         
                    | +/             | ABC*D-E                                       
                    |                | ABC*D-E/+      

    Of course, this code and algorithm somewhat relies on properly nested parens, and will not handle the 'max()' part of your expression, but it's something to start with.

    The more complex way to do it (and both more flexible and robust) is to build a parse tree out of your expression, then you can get infix, postfix, prefix, whatever-fix order you want just by traversing the tree in different ways. I actually show examples of this conversion-through-traversal in the docs for my module Tree::Binary (more specifically the examples are in Tree::Binary::Visitor::InOrderTraversal, Tree::Binary::Visitor::PreOrderTraversal and Tree::Binary::Visitor::PostOrderTraversal in their SYNOPSIS sections).

    The algorithm for building a parse tree is not entirely unlike the stack version above, except its more complex and you are not building a program stack, but a parse tree. The operator stack is still a very useful tool in helping you decide to go up or down the tree. Unfortunately I don't have any example code for this, and besides, it is very dependent upon how you structure your tree anyway. I am sure though that a quick google search on building parse trees will provide you with some insight.

Re: Generic RPN Translator available?
by Jasper (Chaplain) on Jan 19, 2005 at 10:02 UTC You'll find plenty of code samples here ;)
Re: Generic RPN Translator available?
by Anonymous Monk on Jan 19, 2005 at 09:39 UTC
    Well, I'd use Parse::RecDescent, but that still requires you to write a grammar, and some rewrite rules.
Re: Generic RPN Translator available?
by BrowserUk (Patriarch) on Jan 20, 2005 at 09:34 UTC

    This might form a useful starting point. It seems reasonably accurate, flexible and efficient.

    Update: Improved error handling a little and added -XLATE option.

    Update2: Apparently there is something wrong with this code. Would someone care to tell me what that is?

    #! perl -slw use strict; use List::Util qw[ reduce ]; $a=$b; our $XLATE ||= 0; sub nestedOk{ index( $_[ 0 ], '(' ) <= index( $_[ 0 ], ')' ) and 0 == reduce{ $a + ( $b eq '(' ) - ( $b eq ')' ) } 0, split'[^()]*', $_[ 0 ] } my $re_var = qr[ [a-zA-Z]\w* ]x; my $re_subex = qr[ \{\d+\} ]x; my $re_func = qr[ $re_var $re_subex ]x; my $re_num = qr[ -? \d+ (?: \. \d+ )? (?: [Ee] [+-]? \d+ )? ]x; my $re_term = qr[ $re_func | $re_subex | $re_var | $re_num ]x; my $re_op = qr[[,%+*/^-]]; my %ops = ( qw[ % MOD + ADD * MULT / DIV ^ POW - SUBT ] ); sub exp2rpn { my( $exp, $aStack, $aBits ) = @_; die "Unbalanced parens: '$exp'" unless nestedOk $exp; { my( $left, $op, $right, $rest ) = $exp =~ m[ ^ ( $re_term )? ( $re_op )? ( $re_term ) ( .* ) $ ]x or die "malformed (sub)expression '$exp'"; for ( $left, $right ) { next unless $_; if( my( $func, $subex ) = m[^ ( $re_var )? \{ ( \d+ ) \} $ +]x ) { exp2rpn( $aBits->[ $subex ], $aStack, $aBits ); push @$aStack, $func if $func } else{ push( @$aStack, $_ ); } } push @$aStack, $XLATE ? $ops{ $op } : $op if $op and $op ne ','; $exp = $rest, redo if $rest; } return $aStack; } sub parseExp { local $_ = $_[ 0 ]; s[\s+][]g; my( $bit, @bits ) = 0; s[\( ( [^()]+ ) \)]{ push @bits, $1; "{${ \( $bit++ ) }}"; }ex while m[[()]]; my $toplvl = $_; return @{ exp2rpn $toplvl, [], \@bits }; } die "No expression given\n" unless @ARGV; print join', ', parseExp $ARGV[ 0 ]; __END__ P:\test>423305-2 "2*(somevar+other) + max(this, that)" 2, somevar, other, +, *, this, that, max, + P:\test>423305-2 "5^((-2+x)*sin(p+4)/fred)" 5, -2, x, +, p, 4, +, sin, *, fred, /, ^ P:\test>423305-2 "A+(B*C-D)/E" A, B, C, *, D, -, +, E, / P:\test>423305-2 "max( a, b, c, d ) * atan( pi*4, -1 )" a, b, c, d, max, pi, 4, *, -1, atan, * P:\test>423305 -XLATE "5^((-2+x)*sin(p+4)/fred)" 5, -2, x, ADD, p, 4, ADD, sin, MULT, fred, DIV, POW

    You may need to tweak the named regex at the top to match your definition of variable names, function names and acceptable numeric forms.

    Functions are assumed to have fixed numbers of arguments and know how to unstack the correct number. I don't know how (or if) functions with variable numbers of arguments are handle in RPN?

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
      I don't know how (or if) functions with variable numbers of arguments are handle in RPN?

      As a previous life's HP-48 owner, number of arguments is pushed onto the stack after them. Since functions know how to unstack their args, in this case this is a two steps operation

      HTH, Dominique
      My two favorites:
      If the only tool you have is a hammer, you will see every problem as a nail. --Abraham Maslow
      Bien faire, et le faire savoir...

        Update: Looks like I spoke too soon:
        sin, a, b
        instead of the correct
        sin, ab
        Hmmm ...

        Actually, I was looking for exactly the solution browserUK provided. Outstanding job, browserUK! It even works for nested function calls like

        "max(min(a,b-c), c, min(d-f,e) ) * atan( pi*4, -1 )"
        which gets transformed to
        a, b, c, -, min, c, d, f, -, e, min, max, pi, 4, *, -1, atan, *
        Now, anyone up for a Parse::Descent grammar, just for kicks :) ?

        It had to be either that, or a sentinel. So, the following expression => RPN would be correct?

        P:\test>423305 "max( a, b, c, d ) * atan( pi*4, -1 )" a, b, c, d, 4, max, pi, 4, *, -1, 2, atan, *

        Modified code

        Examine what is said, not who speaks.
        Silence betokens consent.
        Love the truth but pardon error.