saintmike has asked for the
wisdom of the Perl Monks concerning the following question:
Esteemed Monks,
does anybody know of a module solving the classic compiler class task of translating an arithmetic expression into RPN (reverse polish notation)? Basically, I have something like
"2*(somevar+other) + max(this, that)"
and want it to be transformed into
("2", "somevar", "other", "ADD", "MULT", "this", "that", "MAX", "ADD")
There's a couple of related modules on CPAN, like Parse::RPN or Math::RPN, but neither one does the trick.
Any pointers to code snippets would be great, before I roll my own.
Re: Generic RPN Translator available? by Anonymous Monk on Jan 19, 2005 at 09:39 UTC 
Well, I'd use Parse::RecDescent, but that still requires you to write a grammar, and some rewrite rules.  [reply] 
Re: Generic RPN Translator available? by Jasper (Chaplain) on Jan 19, 2005 at 10:02 UTC 
http://perlgolf.sourceforge.net/TPR/0/5a/
You'll find plenty of code samples here ;)  [reply] 
Re: Generic RPN Translator available? by stvn (Monsignor) on Jan 19, 2005 at 14:48 UTC 
Any pointers to code snippets would be great, before I roll my own.
The simplest way I know to convert Infix to RPN is to use stacks. You will need two stacks, one for operators, and the other a container for your RPN formated expression. Then the algorithm basically goes something like this:
my $token_count = 0;
while (@tokens) {
my $token = shift @tokens;
# if its an opening paran then ...
if ($token eq '(') {
push(@operator_stack, $token);
}
# if it's an operator then ...
elsif (is_operator($token)) {
push(@operator_stack, $token);
}
# if its a closing paran then ...
elsif ($token eq ')') {
# empty the operator stack until
# we find the opening paren
while (@operator_stack && $operator_stack[1] ne '(') {
# and push everything onto the
# operator stack
push @program_stack, (pop @operator_stack);
}
# then pop the opening paren off
pop @operator_stack;
}
# if its not an operator, or a paran ...
else {
push @program_stack, $token;
$token_count++;
# if we have 2 or more tokens on the
# program stack (since out last operator)
# we need to get an operator ....
if ($token_count >= 2){
# unless the top of our operator stack
# is a opening paran, in which case
# we have a subexpression which needs
# to be handled first
next if $operator_stack[1] eq '(';
# otherwise grab the operator, and ...
push @program_stack, (pop @operator_stack);
# reset the token count
$token_count = 1;
}
}
}
# and lastly, empty whats left of your
# operator stack onto your program stack
push @program_stack, (pop @operator_stack) while (@operator_stack);
I got this algorithm from a old book on FORTH, and they had this table in it, which I found helpful in visually explaining the algorithm.
source string  operator stack  program stack
++
A+(B*CD)/E  
+(B*CD)/E   A
(B*CD)/E  +  A
B*CD)/E  +(  A
*CD)/E  +(  AB
CD)/E  +(*  AB
D)/E  +(*  ABC
D)/E  +(  ABC*
)/E  +(  ABC*D
/E  +  ABC*D
E  +/  ABC*D
 +/  ABC*DE
  ABC*DE/+
Of course, this code and algorithm somewhat relies on properly nested parens, and will not handle the 'max()' part of your expression, but it's something to start with.
The more complex way to do it (and both more flexible and robust) is to build a parse tree out of your expression, then you can get infix, postfix, prefix, whateverfix order you want just by traversing the tree in different ways. I actually show examples of this conversionthroughtraversal in the docs for my module Tree::Binary (more specifically the examples are in Tree::Binary::Visitor::InOrderTraversal, Tree::Binary::Visitor::PreOrderTraversal and Tree::Binary::Visitor::PostOrderTraversal in their SYNOPSIS sections).
The algorithm for building a parse tree is not entirely unlike the stack version above, except its more complex and you are not building a program stack, but a parse tree. The operator stack is still a very useful tool in helping you decide to go up or down the tree. Unfortunately I don't have any example code for this, and besides, it is very dependent upon how you structure your tree anyway. I am sure though that a quick google search on building parse trees will provide you with some insight.
 [reply] [d/l] 
Re: Generic RPN Translator available? by BrowserUk (Pope) on Jan 20, 2005 at 09:34 UTC 
This might form a useful starting point. It seems reasonably accurate, flexible and efficient.
Update: Improved error handling a little and added XLATE option.
Update2: Apparently there is something wrong with this code. Would someone care to tell me what that is?
#! perl slw
use strict;
use List::Util qw[ reduce ]; $a=$b;
our $XLATE = 0;
sub nestedOk{
index( $_[ 0 ], '(' ) <= index( $_[ 0 ], ')' ) and
0 == reduce{
$a + ( $b eq '(' )  ( $b eq ')' )
} 0, split'[^()]*', $_[ 0 ]
}
my $re_var = qr[ [azAZ]\w* ]x;
my $re_subex = qr[ \{\d+\} ]x;
my $re_func = qr[ $re_var $re_subex ]x;
my $re_num = qr[ ? \d+ (?: \. \d+ )? (?: [Ee] [+]? \d+ )? ]x;
my $re_term = qr[ $re_func  $re_subex  $re_var  $re_num ]x;
my $re_op = qr[[,%+*/^]];
my %ops = ( qw[ % MOD + ADD * MULT / DIV ^ POW  SUBT ] );
sub exp2rpn {
my( $exp, $aStack, $aBits ) = @_;
die "Unbalanced parens: '$exp'" unless nestedOk $exp;
{
my( $left, $op, $right, $rest ) = $exp =~ m[
^ ( $re_term )? ( $re_op )? ( $re_term ) ( .* ) $
]x or die "malformed (sub)expression '$exp'";
for ( $left, $right ) {
next unless $_;
if( my( $func, $subex ) = m[^ ( $re_var )? \{ ( \d+ ) \} $
+]x ) {
exp2rpn( $aBits>[ $subex ], $aStack, $aBits );
push @$aStack, $func if $func
}
else{
push( @$aStack, $_ );
}
}
push @$aStack, $XLATE ? $ops{ $op } : $op
if $op and $op ne ',';
$exp = $rest, redo if $rest;
}
return $aStack;
}
sub parseExp {
local $_ = $_[ 0 ];
s[\s+][]g;
my( $bit, @bits ) = 0;
s[\( ( [^()]+ ) \)]{ push @bits, $1; "{${ \( $bit++ ) }}"; }ex
while m[[()]];
my $toplvl = $_;
return @{ exp2rpn $toplvl, [], \@bits };
}
die "No expression given\n" unless @ARGV;
print join', ', parseExp $ARGV[ 0 ];
__END__
P:\test>4233052 "2*(somevar+other) + max(this, that)"
2, somevar, other, +, *, this, that, max, +
P:\test>4233052 "5^((2+x)*sin(p+4)/fred)"
5, 2, x, +, p, 4, +, sin, *, fred, /, ^
P:\test>4233052 "A+(B*CD)/E"
A, B, C, *, D, , +, E, /
P:\test>4233052 "max( a, b, c, d ) * atan( pi*4, 1 )"
a, b, c, d, max, pi, 4, *, 1, atan, *
P:\test>423305 XLATE "5^((2+x)*sin(p+4)/fred)"
5, 2, x, ADD, p, 4, ADD, sin, MULT, fred, DIV, POW
You may need to tweak the named regex at the top to match your definition of variable names, function names and acceptable numeric forms.
Functions are assumed to have fixed numbers of arguments and know how to unstack the correct number. I don't know how (or if) functions with variable numbers of arguments are handle in RPN?
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
 [reply] [d/l] 

 [reply] 

P:\test>423305 "max( a, b, c, d ) * atan( pi*4, 1 )"
a, b, c, d, 4, max, pi, 4, *, 1, 2, atan, *
Modified code
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
 [reply] [d/l] [select] 


Update: Looks like I spoke too soon:
"sin(ab)"
yields
sin, a, b
instead of the correct
sin, ab
Hmmm ...
Actually, I was looking for exactly the solution browserUK provided. Outstanding job, browserUK! It even works for nested function calls like
"max(min(a,bc), c, min(df,e) ) * atan( pi*4, 1 )"
which gets transformed to
a, b, c, , min, c, d, f, , e, min, max, pi, 4, *, 1, atan, *
Now, anyone up for a Parse::Descent grammar, just for kicks :) ?
 [reply] [d/l] [select] 




