http://www.perlmonks.org?node_id=123063


in reply to Re: Regexps for Parsing Brackets in Chemical Formulae
in thread Regexps for Parsing Brackets in Chemical Formulae

Many thanks to Chipmunk and other folks,

I'll go away and play with these suggestions, which seem quite groovy (insofar as I can tell, which ain't that far!). The reason for all of this is sort of related to my final-year project but not actually included in it (the project is in PHP): my tutor wrote a routine to do this kind of thing in some other language, which took him ages, and I'm trying to introduce him to the power of Perl (and, by extension, PerlMonks).

In the virtual bar of pm I owe you all a pint.

Elgon

"Without evil there can be no good, so it must be good to be evil sometimes."
--Satan, South Park: Bigger, Longer, Uncut.

Re: Re: Re: Regexps for Parsing Brackets in Chemical Formulae
by stefp (Vicar) on Nov 04, 2001 at 08:12 UTC
    You were close. That should do it:
    use strict;
    my %count;
    # added gratuitous parentheses for embedded formula testing sake.
    $_='Mo(P(H)3)4(CO)(NH2C2(H)5)';
    # at each iteration, expand the subformula opened by the rightmost left parenthesis.
    # quit when no parentheses remain
    s/(.*)\((.*?)\)(\d*)/$1 . $2 x ($3 ? $3 : 1) /e while m/\(/;
    s/([A-Z](?:[a-z])?)(\d*)/ $count{$1} += $2 ? $2 : 1 ;''/eg;
    printf "%-2s %3d\n", $_, $count{$_} for sort keys %count;
    It prints:
    C    3
    H   19
    Mo   1
    N    1
    O    1
    P    4
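    For anyone who wants to reuse the idea, here is a minimal sketch of the same two-step approach (expand the innermost group, then tally symbols) wrapped in a subroutine. The name count_atoms is made up for illustration and is not from the code above. The only deliberate difference is that the expansion loop stops as soon as a substitution fails, so an unbalanced formula should not loop forever.

```perl
use strict;
use warnings;

# count_atoms is a hypothetical helper name, not part of the original post.
sub count_atoms {
    my ($formula) = @_;
    my %count;

    # Step 1: expand the group opened by the rightmost '(' until none can be
    # expanded. The greedy (.*) pushes the match to the last '(' so that
    # inner groups are expanded before the groups that contain them.
    1 while $formula =~ s/(.*)\(([^()]*)\)(\d*)/$1 . $2 x ($3 || 1)/e;

    # Step 2: tally each element symbol, using 1 when no multiplier follows.
    while ($formula =~ /([A-Z][a-z]?)(\d*)/g) {
        $count{$1} += $2 || 1;
    }
    return \%count;
}

my $atoms = count_atoms('Mo(P(H)3)4(CO)(NH2C2(H)5)');
printf "%-2s %3d\n", $_, $atoms->{$_} for sort keys %$atoms;
```

    Returning a hash reference rather than filling a global %count keeps the routine easy to call on several formulae in a row.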

    -- stefp

      Stefp,

      Muchas gracias - one minor alteration to take account of the fact that certain artificial elements have, under certain nomenclatures, three letters rather than one or two...

      s/([A-Z](?:[a-z]{0,2})?)(\d*)/  $count{$1} += $2 ? $2  : 1 ;''/eg;

      Otherwise, perfect!

      Ta, Elgon.


        Not quite perfect:

           s/([A-Z][a-z]{0,2})(\d*)/ $count{$1} += $2 ? $2 : 1; '' /eg;

        is cleaner. The (?:) was an unneeded leftover in my code, and once you added the {0,2} quantifier, the ? quantifier became redundant. Note that {2}? is not a substitute for {0,2}: it is the non-greedy form of {2} and still matches exactly two characters.
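        A quick way to check how these quantifiers behave is to list the symbols each pattern picks out of a test string. This is only a sketch: the symbols helper and the sample formula 'UuoH2NaCl' are invented for the comparison and do not come from the thread.

```perl
use strict;
use warnings;

# symbols() is a hypothetical helper: it lists the element symbols that a
# given pattern finds in a formula, joined with commas.
sub symbols {
    my ($re, $formula) = @_;
    my @found;
    push @found, $1 while $formula =~ /($re)/g;
    return join ',', @found;
}

my $formula = 'UuoH2NaCl';   # "Uuo" is an old three-letter placeholder symbol

my $verbose = qr/[A-Z](?:[a-z]{0,2})?/;  # with the extra (?:...)? wrapper
my $clean   = qr/[A-Z][a-z]{0,2}/;       # wrapper and trailing ? dropped
my $lazy    = qr/[A-Z][a-z]{2}?/;        # {2}? still demands exactly two

print "verbose: ", symbols($verbose, $formula), "\n";
print "clean:   ", symbols($clean,   $formula), "\n";
print "lazy:    ", symbols($lazy,    $formula), "\n";
```

        On this input the first two patterns both give Uuo,H,Na,Cl, while the {2}? pattern finds only Uuo, because it cannot match a symbol with fewer than two lowercase letters.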

        Strangely for the golfers, {,2} is not supported, although one would expect it to be, since {2,} is.

        -- stefp