Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Seeking Advice: Writing a parser

by Flame (Deacon)
on Feb 08, 2002 at 04:27 UTC ( #144030=perlquestion: print w/replies, xml ) Need Help??
Flame has asked for the wisdom of the Perl Monks concerning the following question:

Well, bored out of my mind earlier today, I decided that I wanted to reach into the realm of math for a while. Thus was born the concept behind Math::ParseEquasion (Temp-name until I can come up with something better)

Now that I have the idea though, I'm stuck with a few serious problems. Most significant is that I have no clue how to write a parser. So to start with, can anyone suggest any particularly good parsing modules I should examine?

Now, basically this is the way I'm aiming to have it work:

my $equasion = new Math::Parser('tan(x^4)+2sin(x)');

That little bit should have parsed it into a tree resembling this:
-'tan' |-'x' |-'^' |-'4' -'+' -'2' -'*' -'sin' |-'x'

Hopefully that explained my goals well enough.

To wrap up, I ask that if anyone has any advice or suggestions, please post them. Although I did a search for something similar to this, and didn't find it, that doesn't mean it's not there, so if someone has already written something to do this, please say so.


"Weird things happen, get used to it."

Flame ~ Lead Programmer: GMS

Replies are listed 'Best First'.
Re: Seeking Advice: Writing a parser
by Zaxo (Archbishop) on Feb 08, 2002 at 05:14 UTC

    Math::Expr does exactly what you want. Release date is 2001-10-01, so it's fairly up to date.

             Math::Expr - Parses mathematical expressions
    SYNOPSIS use Math::Expr; SetOppDB(new Math::Expr::OpperationDB('<DBFileName>')); $e=Parse("a+4*b-d/log(s)+f(d,e)"); DESCRIPTION Parses mathematical expressions into a tree structure. The expressions may contain integers, real numbers, alphanumeric variable names, alphanumeric function names and most other characters might be used as operators...

    From the pod. That's no reason not to write your own as an exercise :)

    After Compline,

Re: Seeking Advice: Writing a parser
by jryan (Vicar) on Feb 08, 2002 at 05:07 UTC

    Something like this needs a recursive parser. Aside from suggesting Parse::RecDescent or the example in Japhy's book, I'll give a few suggestions on how I would approach this.

    First off, you are correct in storing it a tree-type data structure. However, don't use a normal hash; I recommend Tie::IxHash to keep an ordered hash since you will need to reorder later.

    Next, you will need to figure out a way of determining whether a term is top level, and separate those terms from each other. Simply splitting on /[+-]/ wont be enough; you'll need to inchworm through in order and determine context. Next, you'll have to separate those terms which are /[*\/]/, and mark those as separate parts of one term. Finally, you'll need to deal with leftovers, such as exponentiation, function calls, etc.

    Once you have the top level terms in the tree, you'll have descend into each node and repeat the process (here is where the recursion comes in.) You'll need to do it again, and again, and again, until everything is parsed to its lowest level. Finally, you'll have to work backwards from the bottom to perform the calculations.

    Not as easy as you thought, was it? That's why there is CPAN :)

Re: Seeking Advice: Writing a parser
by BrentDax (Hermit) on Feb 08, 2002 at 09:18 UTC
    I highly recommend you use Parse::RecDescent. It has an <autotree> directive which, although it doesn't work exactly as you describe, builds a tree that's probably better for your purposes:
    + / \ / \ / \ tan * | / \ ^ 2 sin / \ | x 4 x
    The advantage of this tree is that you can just do a postorder traversal with a number stack to run it. Of course, if you really like the tree design you describe above, you can also just provide the tree-building actions yourself instead of using <autotree>.

    --Brent Dax
    There is no sig.

Re: Seeking Advice: Writing a parser
by Dominus (Parson) on Feb 08, 2002 at 22:52 UTC
    One problem you will need to think about that I think nobody else has pointed out yet is that mathematics, as conventionally written, is ambiguous. For example, if you get an expression like b(x+1), it may not be clear whether the intent is to multiply the quantity b by x+1 or to give x+1 as an argument to the function b. Mathematicians use two things to determine which case holds: One is information from the surrounding mathematical context; if b has been previously described as being a function, then you know the second interpretation is correct. Making use of contextual details is tricky in a parser. The other thing the mathematicians look at is subtle typographical distinctions, which aren't available to you here.

    Computer languages get around this by requiring explicit multiplication symbols: The expression above becomes b*(x+1) if multiplication is intended. Your example above indicates that you want implicit multiplication, and that can be very tricky. If you see something like y = mx, how will you know how to parse it? Is mx one variable, or is it the product of m and x? These are the sorts of side problems that you'll have to solve to build a parser that works the way you want it to.

    You might want to consider first writing a parser for a simpler and less ambiguous language---say, arithmetic expressions involving only numerals, with explicit multiplication. Once you have some experience solving the simple problem, you can go back and embellish it to handle more complex expressions.

    Mark Dominus
    Perl Paraphernalia

      Just to clarify your first point; for the benefit of those w/o an education from the American public schools ;-) it's PEMDAS!

      1. Parantheses (I would imagine trig functions go here)
      2. Exponents (I would imagine logarythms go here)
      3. Multiplication, Division
      4. Addition, Subtraction

      For the second part you can be smart. Declare your parser as not lazy, then if mx has not yet been seen but m and x have there is implicit multiplication. On the gripping hand say m, x, and mx are defined. I'd opt for mx and force explicit multiplcation for m and x.

      perl -pe "s/\b;([st])/'\1/mg"

    Re: Seeking Advice: Writing a parser
    by jackdied (Monk) on Feb 08, 2002 at 05:09 UTC
      Update : jryan hit submit two seconds before I did, and with a more verbose post. read that Parse::RecDescent is an easy to use little parser. It includes examples of describing small grammars. If that isn't enough, do a google search on 'BNF Grammars' and check out the university comp sci pages.
    Re: Seeking Advice: Writing a parser
    by Flame (Deacon) on Feb 08, 2002 at 04:29 UTC
      Note: This module is being written to lead into another module I'm considering, Math::Solver.

      "Weird things happen, get used to it."

      Flame ~ Lead Programmer: GMS

    Re: Seeking Advice: Writing a parser
    by mattr (Curate) on Feb 08, 2002 at 15:30 UTC
      Well I can't access Math::Expr through so full throttle ahead! But please call it Math::ParseEquation instead.

      Also I think I might have a different preference for the way you order things in your tree. That said, perhaps the problem could be redefined by designing a converter from the above infix notation to Reverse-Polish (postfix) Notation? Some linx: <a href=">1 2 3

      Move SIG!

        Math::Expr at Maybe the site was down, or you typed it wrong.
        Also I did
        perl -MCPAN -e 'install Math::Expr'
        and it found it and installed it...
        Snazzy tagline here
          This is wierd. I have been totally unable to reach any links to modules inside, using different computers, for the past week. People are posting links I can't see, and I can't search through their search form.

          I wonder if it is something to do with a configuration they did and me not being in the right hemisphere?

          I get:

          Forbidden You don't have permission to access /search on this server. ---------------------------------------------------------------------- +---------- Apache/1.3.22 Server at Port 80
        Math::ParseEquation was the name I originally planned to use, since no one has suggested I change it, I guess I'm sticking with it.

        "Weird things happen, get used to it."

        Flame ~ Lead Programmer: GMS

    Log In?

    What's my password?
    Create A New User
    Node Status?
    node history
    Node Type: perlquestion [id://144030]
    Approved by root
    and the rats come out to play...

    How do I use this? | Other CB clients
    Other Users?
    Others cooling their heels in the Monastery: (4)
    As of 2017-02-27 00:54 GMT
    Find Nodes?
      Voting Booth?
      Before electricity was invented, what was the Electric Eel called?

      Results (376 votes). Check out past polls.