PerlMonks  

How to avoid an alphabet and integer next to it in a string?

by piscean (Acolyte)
on Mar 21, 2014 at 17:48 UTC (#1079310=perlquestion)
piscean has asked for the wisdom of the Perl Monks concerning the following question:

I'm a newbie trying to write code that calculates molecular weight while leaving out the hydrogen atoms. The input is a molecular formula such as C6H5OH. I want to skip the H and the integer next to it when calculating. I tried doing that, but I failed. If any more input is needed to answer the question, let me know. Please help. Thanks in advance.

Re: How to avoid an alphabet and integer next to it in a string?
by hippo (Curate) on Mar 21, 2014 at 17:53 UTC
    I tried doing that, but I failed.

    What did you try? How did it fail?

    Hard to tell, but is all you are looking for this?

    my $formula = 'C6H5OH';
    $formula =~ s/H\d//g;
    print "$formula\n";

    Update: As runrig suggests below the more general s/H\d*//g; may be more appropriate to your needs.

      Yes, this is what I was looking for. Thanks! I tried avoiding H in C6H9. It turned out to calculate C69 giving me a wrong result. Of course, I was foolish enough to try this.
        my $molform = <STDIN>;
        $molform =~ s/[^a-zA-G0-9]//g;
        my $molmass = new Chemistry::MolecularMass;
        my $mass = $molmass->calc_mass("$molform");
      Oops! This is giving me wrong output too. Hope the code I posted below gives an idea of what I wanted.
        What about this?
        $formula =~ s/H\d*//g;

      It does help, but what about the element Hg? When I enter Hg, I get zero as output.
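One possible way around the Hg problem, offered as a minimal sketch rather than as anything posted at this point in the thread: strip an H only when it is not followed by a lowercase letter, so that two-letter symbols beginning with H (Hg, He, Hf, Ho, Hs) are left alone. The helper name strip_hydrogen is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove hydrogen (H plus any count) from a formula,
# leaving two-letter symbols that start with H (Hg, He, Hf, Ho, Hs) alone.
sub strip_hydrogen {
    my ($formula) = @_;
    # (?![a-z]) is a negative lookahead: the H must not be followed
    # by a lowercase letter, so "Hg" is not treated as hydrogen.
    (my $stripped = $formula) =~ s/H(?![a-z])\d*//g;
    return $stripped;
}

print strip_hydrogen($_), "\n" for qw(C6H5OH C6H9 Hg HgH2);
```

With these four inputs the sketch prints C6O, C6, Hg and Hg, so mercury survives while plain hydrogen and its count are dropped.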

Re: How to avoid an alphabet and integer next to it in a string?
by kcott (Abbot) on Mar 21, 2014 at 20:01 UTC

    G'day piscean,

    Welcome to the monastery.

    You can do that like this. The tests include one- and two-letter symbols with and without numbers. I'll leave you to replace my rough atomic weights with more precise ones.

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my %weight = (C => 12, O => 16, Cl => 35.5);

    my %tests = (
        phenol => ['C6H5OH', 6 * 12 + 1 * 16],
        chloroform => ['CHCl3', 1 * 12 + 3 * 35.5],
    );

    for my $compound (keys %tests) {
        print '-' x 40;
        print "Compound: $compound";

        my $formula = $tests{$compound}[0];
        print "Formula: $formula";

        my $calculated = 0;

        $formula =~ s{([A-Z][a-z]?)(\d*)}{
            exists $weight{$1} and $calculated += $weight{$1} * ($2 || 1)
        }eg;

        print "Expected: $tests{$compound}[1]";
        print "Calculated: $calculated";
    }

    Output:

    ----------------------------------------
    Compound: phenol
    Formula: C6H5OH
    Expected: 88
    Calculated: 88
    ----------------------------------------
    Compound: chloroform
    Formula: CHCl3
    Expected: 118.5
    Calculated: 118.5

    -- Ken

      Thanks Ken! It was really helpful :) I've used Chemistry::MolecularMass module to have precise atomic weights.

        I haven't used Chemistry::MolecularMass previously (in fact, I wasn't aware of its existence until now); however, looking at its documentation, it would appear another (completely untested) solution would be:

        use Chemistry::MolecularMass;

        my $mm = Chemistry::MolecularMass::->new();
        $mm->replace_elements(H => 0);
        my $no_H_mass = $mm->calc_mass($your_formula);

        -- Ken

Re: How to avoid an alphabet and integer next to it in a string?
by AnomalousMonk (Abbot) on Mar 21, 2014 at 20:09 UTC

    Maybe something like:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $Hn = qr{ H (?! [[:lower:]]) \d* }xms;
     my $not_Hn = qr{ (?! $Hn) }xms;
     ;;
     use constant FORMULA => 'HC6H5OHHg2HeBr3H';
     ;;
     my $s = FORMULA;
     print qq{'$s'};
     $s =~ s{ $Hn }''xmsg;
     print qq{'$s'};
     ;;
     $s = FORMULA;
     my @elements = $s =~ m{ $not_Hn [[:upper:]] [[:lower:]]? \d* }xmsg;
     printf qq{'$_' } for @elements;
    "
    'HC6H5OHHg2HeBr3H'
    'C6OHg2HeBr3'
    'C6' 'O' 'Hg2' 'He' 'Br3'
      Thanks! This helped too :)
Re: How to avoid an alphabet and integer next to it in a string?
by VincentK (Beadle) on Mar 21, 2014 at 21:01 UTC
    Hi piscean.

    I don't know much about chemistry, but I think I found a library that will help you.

    Chemistry::File::Formula - Molecular formula reader/formatter

    http://search.cpan.org/~itub/Chemistry-Mol-0.37/File/Formula.pm

    I plugged the library into a basic script and it seems to parse the elements out in the way you need.

    From here you should be able to walk each key/value pair in the formula hash and base your calculation on the elements you need.

    I hope this helps.

    use strict;
    use warnings;
    use Data::Dumper;
    use Chemistry::File::Formula;

    while(<DATA>)
    {
        chomp;
        my %formula = Chemistry::File::Formula->parse_formula("$_");

        print "-" x 16, "\n";
        print Dumper \%formula;
        print "-" x 16, "\n";
    }

    __DATA__
    C6H5OH
    C6H9
    Hg

    Output:
    C:\monks\calc_mass>perl calc_molecularmass.pl
    ----------------
    $VAR1 = {
              'H' => 6,
              'O' => 1,
              'C' => 6
            };
    ----------------
    ----------------
    $VAR1 = {
              'H' => 9,
              'C' => 6
            };
    ----------------
    ----------------
    $VAR1 = {
              'Hg' => 1
            };
    ----------------

    C:\monks\calc_mass>
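As a rough sketch of the "walk each key/value pair" step, assuming the element counts arrive as a plain hash shaped like the Dumper output above: the helper name mass_without_hydrogen is hypothetical, the counts are typed in by hand rather than taken from Chemistry::File::Formula, and the atomic weights are approximate placeholders to be replaced with precise values.

```perl
use strict;
use warnings;

# Rough atomic weights, for illustration only (assumed values).
my %weight = (C => 12, O => 16, Hg => 200.6);

# Hypothetical helper: sum weight * count over every element except H.
sub mass_without_hydrogen {
    my (%count) = @_;
    my $mass = 0;
    for my $element (keys %count) {
        next if $element eq 'H';    # skip hydrogen only; 'Hg' is a separate key
        die "No weight for $element\n" unless exists $weight{$element};
        $mass += $weight{$element} * $count{$element};
    }
    return $mass;
}

# Element counts written by hand in the shape of the Dumper output above.
print mass_without_hydrogen(H => 6, O => 1, C => 6), "\n";   # C6H5OH
print mass_without_hydrogen(H => 9, C => 6), "\n";           # C6H9
print mass_without_hydrogen(Hg => 1), "\n";                  # Hg
```

Because the formula has already been split into element keys, there is no regex to get wrong here: 'H' and 'Hg' are distinct hash keys, so skipping one cannot affect the other.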

Node Type: perlquestion [id://1079310]
Approved by moritz
Front-paged by kcott