PerlMonks  

Re: Re: Regexps for Parsing Brackets in Chemical Formulae

by Elgon (Curate)
on Nov 03, 2001 at 22:20 UTC (node #123063)


in reply to Re: Regexps for Parsing Brackets in Chemical Formulae
in thread Regexps for Parsing Brackets in Chemical Formulae

Many thanks to Chipmunk and other folks,

I'll go away and play with these suggestions, which seem quite groovy (insofar as I can tell, which ain't that far!). The reason for all of this is loosely related to my final-year project but not actually part of it (the project is in PHP): my tutor wrote a routine to do this kind of thing in some other language, which took him ages, and I'm trying to introduce him to the power of Perl (and, by extension, PerlMonks).

In the virtual bar of PM, I owe you all a pint.

Elgon

"Without evil there can be no good, so it must be good to be evil sometimes."
--Satan, South Park: Bigger, Longer, Uncut.


Re: Re: Re: Regexps for Parsing Brackets in Chemical Formulae
by stefp (Vicar) on Nov 04, 2001 at 08:12 UTC
    You were close. That should do it:
        use strict;
        use warnings;
        my %count;

        # Added gratuitous parentheses to test nested formulae.
        $_ = 'Mo(P(H)3)4(CO)(NH2C2(H)5)';

        # At each iteration, expand the subformula that starts at the rightmost
        # left parenthesis; quit when no parentheses remain.
        # (Assumes balanced parentheses: an unmatched '(' would loop forever.)
        s/(.*)\((.*?)\)(\d*)/$1 . $2 x ($3 ? $3 : 1)/e while m/\(/;

        # Count each element symbol and its optional count, deleting it as we go.
        s/([A-Z](?:[a-z])?)(\d*)/ $count{$1} += $2 ? $2 : 1; '' /eg;

        printf "%-2s %3d\n", $_, $count{$_} for sort keys %count;
    It prints:
        C    3
        H   19
        Mo   1
        N    1
        O    1
        P    4

    -- stefp
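For comparison, here is a minimal sketch of a different technique (not stefp's): instead of expanding each parenthesised group into repeated text, it keeps a stack of per-group counts and multiplies a group's counts into its parent when it reads the closing parenthesis and its multiplier. The sub name count_atoms is hypothetical. It assumes well-formed, properly capitalised formulae with symbols of one to three letters, and Perl 5.10 or later for the // operator; it dies on text it cannot tokenise or on unbalanced parentheses. One benefit is that a large multiplier such as (CH2)1000 never builds a long intermediate string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count atoms in a formula using a
# stack of hashes, one per open parenthesis, instead of expanding the text.
sub count_atoms {
    my ($formula) = @_;
    my @stack = ({});    # $stack[0] holds the counts for the whole formula

    # Each token is an element symbol with an optional count, a '(',
    # or a ')' with an optional multiplier. \G anchors each match where the
    # previous one ended; /c keeps pos() when the final match fails.
    while ($formula =~ /\G(?:([A-Z][a-z]{0,2})(\d*)|(\()|\)(\d*))/gc) {
        if (defined $1) {                  # element, e.g. 'H2'
            $stack[-1]{$1} += length($2) ? $2 : 1;
        }
        elsif (defined $3) {               # '(' opens a new group
            push @stack, {};
        }
        else {                             # ')' closes the innermost group
            my $mult = length($4) ? $4 : 1;
            die "unbalanced ')' in $formula\n" if @stack < 2;
            my $group = pop @stack;
            $stack[-1]{$_} += $group->{$_} * $mult for keys %$group;
        }
    }

    # Everything must have been consumed, and every '(' must have been closed.
    my $pos = pos($formula) // 0;
    die "cannot parse '$formula' at offset $pos\n" if $pos != length $formula;
    die "unbalanced '(' in $formula\n" if @stack != 1;
    return %{ $stack[0] };
}

my %count = count_atoms('Mo(P(H)3)4(CO)(NH2C2(H)5)');
printf "%-2s %3d\n", $_, $count{$_} for sort keys %count;
```

On stefp's test formula it should report the same six counts as the two-substitution version (C 3, H 19, Mo 1, N 1, O 1, P 4).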

      Stefp,

      Many thanks! One minor alteration, to take account of the fact that under certain nomenclatures some artificial elements have three-letter symbols (such as the systematic placeholder Uun for element 110) rather than one- or two-letter ones:

      s/([A-Z](?:[a-z]{0,2})?)(\d*)/ $count{$1} += $2 ? $2 : 1; '' /eg;

      Otherwise, perfect!

      Ta, Elgon.

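A quick end-to-end check of Elgon's amended pattern, as a sketch: it runs stefp's two substitutions unchanged apart from the element pattern, on a made-up test formula that uses the old systematic symbols Uuu (element 111) and Uub (element 112) alongside an ordinary two-letter symbol.

```perl
use strict;
use warnings;

# Made-up test formula using the old systematic symbols Uuu (element 111)
# and Uub (element 112) next to an ordinary two-letter symbol, Cl.
my %count;
$_ = 'Uuu2(UubCl3)2';

# stefp's first step: expand the group that starts at the rightmost '('
# until no parentheses remain (assumes they are balanced).
s/(.*)\((.*?)\)(\d*)/$1 . $2 x ($3 ? $3 : 1)/e while m/\(/;

# stefp's second step with Elgon's pattern: a capital followed by up to
# two lowercase letters, then an optional count.
s/([A-Z](?:[a-z]{0,2})?)(\d*)/ $count{$1} += $2 ? $2 : 1; '' /eg;

printf "%-3s %3d\n", $_, $count{$_} for sort keys %count;
```

It should report Cl 6, Uub 2 and Uuu 2. Because {0,2} is greedy, this relies on correct capitalisation: each symbol must start with a capital, and any lowercase letters (up to two) that follow are taken as part of that symbol.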

        Not quite perfect:

           s/([A-Z][a-z]{0,2})(\d*)/ $count{$1} += $2 ? $2 : 1; '' /eg;

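        is cleaner. The (?:) was an unneeded leftover in my code, and once you added the {0,2} quantifier, the trailing ? became redundant, because {0,2} already allows zero letters. Note that {2}? cannot replace {0,2}: a ? after a quantifier only makes it non-greedy, so [a-z]{2}? still demands exactly two letters.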

        Strangely, for the golfers, {,2} is not supported, although one would expect it to be, since {2,} is. (Perl 5.34 and later do accept {,n} as shorthand for {0,n}.)

        -- stefp
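A small sketch to check how {0,2} and the non-greedy {2}? actually behave on element symbols of one, two and three letters; the symbol list is arbitrary and chosen only for illustration.

```perl
use strict;
use warnings;

# Which quantifier accepts symbols of one, two and three letters?
#   [a-z]{0,2} : zero to two lowercase letters (greedy)
#   [a-z]{2}?  : the non-greedy form of {2}, which still needs exactly two
my @symbols = qw(H Mo Uun);
my %matches;
for my $sym (@symbols) {
    $matches{'{0,2}'}{$sym} = $sym =~ /^[A-Z][a-z]{0,2}$/ ? 1 : 0;
    $matches{'{2}?'}{$sym}  = $sym =~ /^[A-Z][a-z]{2}?$/  ? 1 : 0;
}

for my $q ('{0,2}', '{2}?') {
    print $q, ': ', join(' ', map { "$_=$matches{$q}{$_}" } @symbols), "\n";
}
```

It prints "{0,2}: H=1 Mo=1 Uun=1" and "{2}?: H=0 Mo=0 Uun=1", so only {0,2} accepts all three symbol lengths.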
