
Re: Regular Expressions and atomic weights

by polypompholyx (Chaplain)
on Jul 25, 2005 at 19:31 UTC (#477947)

in reply to Regular Expressions and atomic weights

I wrote a calculator module that does exactly this for chemical formula strings. It's my pet wheel-reinvention, but the RMM (relative molecular mass) part has actually been very useful (I'm a biochemistry lecturer). I would post the code, but it's rather large: just look at the module in the tarball. It's actually an extension to a more general calculator, but you'll probably find the Parse::RecDescent grammar useful. As other posters have said, a regex cannot parse general chemical formulae, because they are inherently nested (it's the same reason regexes can't be used to parse HTML in anything but the ugliest hacks). Some general things to consider are:
  • Do you need the grammar to understand complicated things like Fe2(SO4)3.9H2O? If the answer to this is "yes", you need a Parse::RecDescent-style (context-free) grammar: regexes will not work.
  • Does it need to understand common shorthands like Et, Me, Ph and Ac?
  • Does it need to understand H, T, D and the hideous nomenclatural mess of the transactinides?
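To make the nesting point concrete, here is a minimal sketch of a hand-written recursive-descent parser. It is not the Parse::RecDescent grammar from the tarball, which isn't reproduced here. The names parse_formula, parse_part and add_counts, and the contents of the %abbrev table, are made up for illustration. It assumes Perl 5.10 or later (for the // operator), treats Ac as acetyl rather than actinium, and makes no attempt at D, T or three-letter systematic symbols.

```perl
use strict;
use warnings;

# Hypothetical shorthand table (an assumption, not from the original module).
# Note the clash: "Ac" is read as acetyl here, never as actinium.
my %abbrev = (
    Me => { C => 1, H => 3 },
    Et => { C => 2, H => 5 },
    Ph => { C => 6, H => 5 },
    Ac => { C => 2, H => 3, O => 1 },
);

# Add $times copies of the counts in %$from into %$into.
sub add_counts {
    my ($into, $from, $times) = @_;
    $into->{$_} += $from->{$_} * $times for keys %$from;
}

# part := ( ( '(' part ')' | symbol ) count? )*
# Takes a reference to the formula string; the scan position lives in pos().
sub parse_part {
    my ($ref) = @_;
    my %counts;
    while (1) {
        my $unit;
        if ( $$ref =~ /\G\(/gc ) {
            $unit = parse_part($ref);          # recursion handles the nesting
            $$ref =~ /\G\)/gc
                or die "Missing ')' at offset " . ( pos($$ref) // 0 ) . "\n";
        }
        elsif ( $$ref =~ /\G([A-Z][a-z]?)/gc ) {
            $unit = $abbrev{$1} || { $1 => 1 };
        }
        else {
            last;                              # nothing more belongs to this part
        }
        my $n = $$ref =~ /\G(\d+)/gc ? $1 : 1; # optional trailing count
        add_counts( \%counts, $unit, $n );
    }
    return \%counts;
}

# formula := part ( '.' count? part )*   e.g. the ".9H2O" of a hydrate
sub parse_formula {
    my ($text) = @_;
    my %total;
    my $mult = 1;
    while (1) {
        add_counts( \%total, parse_part( \$text ), $mult );
        last unless $text =~ /\G\./gc;
        $mult = $text =~ /\G(\d+)/gc ? $1 : 1;
    }
    ( pos($text) // 0 ) == length($text)
        or die "Unexpected character at offset " . ( pos($text) // 0 ) . "\n";
    return \%total;
}

my $c = parse_formula('Fe2(SO4)3.9H2O');
print join( ' ', map { "$_=$c->{$_}" } sort keys %$c ), "\n";   # Fe=2 H=18 O=21 S=3
```

The recursive call inside parse_part is the part a single non-recursive regex has no equivalent for: each '(' opens a fresh %counts that is multiplied and merged only once its matching ')' has been seen.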
You may find it easiest to think of the formulae as objects: each chemical element is a tiny hash-based object, so parsing 'H' would return something along the lines of bless { 'H' => 1 }, $class. You can then think of CuSO4 literally as Cu + S + 4*O, and use overloaded add and multiply method calls on the objects. My code does something gnarly to generate a sort of assembler for the world's slowest virtual machine: I wouldn't recommend cutting-and-pasting it! Calculating the RMM is then a simple matter of walking through the object's innards with a while (my ($elem, $count) = each %$self ) loop and using a %rmm hash of $element => $rmm pairs. Hope this helps.
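A rough sketch of that object idea, for anyone who wants something to poke at. This is not the code from the module: the package name Formula, its method names, and the handful of approximate atomic weights in %rmm are all assumptions made for the example.

```perl
package Formula;
use strict;
use warnings;
use overload
    '+' => \&add,
    '*' => \&multiply;

# Approximate atomic weights, just enough for the demo below.
my %rmm = ( H => 1.008, C => 12.011, O => 15.999, S => 32.06, Cu => 63.546 );

# A formula is a blessed hash of element => count.
sub new {
    my ( $class, %counts ) = @_;
    return bless {%counts}, $class;
}

# Formula + Formula: sum the counts element by element into a new object.
sub add {
    my ( $self, $other ) = @_;
    my %sum = %$self;
    $sum{$_} += $other->{$_} for keys %$other;
    return ref($self)->new(%sum);
}

# Formula * number (either way round): scale every count into a new object.
sub multiply {
    my ( $self, $n ) = @_;
    return ref($self)->new( map { ( $_ => $self->{$_} * $n ) } keys %$self );
}

# Walk the object's innards and total count * atomic weight.
sub rmm {
    my ($self) = @_;
    my $total = 0;
    while ( my ( $elem, $count ) = each %$self ) {
        defined $rmm{$elem} or die "No atomic weight for $elem\n";
        $total += $count * $rmm{$elem};
    }
    return $total;
}

package main;

my ( $Cu, $S, $O ) = map { Formula->new( $_ => 1 ) } qw(Cu S O);
my $CuSO4 = $Cu + $S + 4 * $O;       # literally Cu + S + 4*O
printf "%.2f\n", $CuSO4->rmm;        # prints 159.60
```

Both operators return new objects rather than modifying their operands, so $O is still a single oxygen after 4 * $O. A parser can then build its result simply by adding and multiplying these objects as it recognises each piece of the formula.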

Replies are listed 'Best First'.
Re^2: Regular Expressions and atomic weights
by ikegami (Pope) on Jul 26, 2005 at 00:19 UTC

    For fun, a regexp solution. It would have been much simpler if $compound didn't require an accumulator and wasn't reentrant. (Either is ok. Both makes a mess.) That's the reason behind the whole symtab business.

    What follows is a simpler solution **that doesn't work**. It prints "The weight of Pb(CO3)2 is 384." (instead of 327) because $rv_group gets clobbered.

      Thanks everyone, I've gained a lot of wisdom about this sort of subject and a solution to my current problem.
