PerlMonks  

Getting the value from XML file

by hellohello1 (Sexton)
on Apr 15, 2014 at 07:48 UTC ( [id://1082317]=perlquestion )

hellohello1 has asked for the wisdom of the Perl Monks concerning the following question:

I have XML files from which I need to extract certain values, so I wrote code that simply searches for the relevant text strings and extracts the values accordingly.

The problem is, how do I print the value of the formal_charge tag (e.g. -1)?

Example of my XML file:
<weight>18.998403205</weight>
<name>fluoride</name>
<smiles>[F-]</smiles>
<accession>W00662</accession>
<experimental_properties>
  <property>
    <kind>water_solubility</kind>
    <value>0.00169 mg/mL at 25 °C</value>
    <source/>
  </property>
<predicted_properties>
  <property>
    <kind>formal_charge</kind>
    <value>-1</value>
    <source>ChemAxon</source>
  </property>
The values I want are:

weight (18.9984),

name (fluoride),

accession (W00662),

formal_charge (-1)

My code:
sub load_files() {
    #get a list of all files in directory; ignore all files beginning with a . and other sub directories
    opendir(my $dh, $dirname) or die "can't opendir $dirname: $!";
    my @files = grep (/^[^\.]/ && -f "$dirname/$_", readdir($dh)); #only keep those not beginning with '.' and are files
    @files = sort(@files); #sort lexically, 'B' comes before 'a', so that output list is always in same order
    closedir $dh;
    my $numfiles = 0;
    foreach my $file (@files) { #loop through the files
        $numfiles++;
        my $accefound = 0;
        my $namefound = 0;
        my $monofound = 0;
        my $chargefound =0;
        open(my $file_fh, "< $dirname/$file") or die("$$: Error: failed to open file $dirname/$file. $!\n");
        while(<$file_fh>) { #read each line of file
            if (/(<weight>)(.+)(<\/weight>)/ && !$monofound) { #if first encounter with the tag
                $monofound = $2;
                $monofound =~ s/^\s+//; #trim leading whitespace of string
                $monofound =~ s/\s+$//; #trim trailing whitespace of string
            }
            elsif (/(<name>)(.+)(<\/name>)/ && !$namefound) { #if first encounter with the tag
                $namefound = $2;
                $namefound =~ s/^\s+//; #trim leading whitespace of string
                $namefound =~ s/\s+$//; #trim trailing whitespace of string
            }
            elsif (/(<accession>)(.+)(<\/accession>)/ && !$accefound) { #if first encounter with the tag (the tag might not be unique)
                $accefound = $2;
                $accefound =~ s/^\s+//; #trim leading whitespace of string
                $accefound =~ s/\s+$//; #trim trailing whitespace of string
            }
            elsif (/(<formal_charge>)(.+)(<\/formal_charge>)/ && !$chargefound) { #if first encounter with the tag
                $chargefound = $2;
                $chargefound =~ s/^\s+//; #trim leading whitespace of string
                $chargefound =~ s/\s+$//; #trim trailing whitespace of string
            }
        }
        print "$monofound\t$namefound\t$accefound\t$chargefound\n";
        close($file_fh) or die("$$: Error: failed to close file $dirname/$file. $!\n");
    }
}
main();
What I got is:
_OUTPUT DATA_ 18.998403205 fluoride W00662 0
The charge value comes out as "0" instead of -1. I know I should be matching the "value" tag, but the file contains many "value" tags, so how do I match this particular one rather than incorrectly matching some other value tag?
<property>
  <kind>formal_charge</kind>
  <value>-1</value>
  <source>ChemAxon</source>
</property>
I hope there is no need to involve a module, and that simply searching for the relevant match string is sufficient?
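For readers wondering what a module-free version might look like, here is a minimal sketch using only core Perl. It remembers the most recent <kind> text and accepts a <value> only when that kind is formal_charge. The helper name extract_fields and the sample data layout are assumptions made for illustration, and a regex scan like this only copes with simple <tag>text</tag> pairs that sit on one line; it is not a substitute for a real XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): scan XML text line by
# line, track the last <kind> seen, and take a <value> only for the kind
# we care about. Only the first occurrence of each field is kept.
sub extract_fields {
    my ($fh) = @_;
    my %found;
    my $kind = '';
    while (my $line = <$fh>) {
        # Visit every simple <tag>text</tag> pair on the line, in order.
        while ($line =~ m{<(\w+)>\s*([^<]*?)\s*</\1>}g) {
            my ($tag, $text) = ($1, $2);
            if ($tag eq 'kind') {
                $kind = $text;
            }
            elsif ($tag eq 'value') {
                $found{formal_charge} //= $text if $kind eq 'formal_charge';
            }
            elsif ($tag =~ /^(?:weight|name|accession)$/) {
                $found{$tag} //= $text;
            }
        }
    }
    return \%found;
}

# Demo on an in-memory copy of the fragment from the question.
my $sample = <<'XML';
<weight>18.998403205</weight>
<name>fluoride</name>
<accession>W00662</accession>
<property>
  <kind>water_solubility</kind>
  <value>0.00169 mg/mL at 25 C</value>
  <source/>
</property>
<property>
  <kind>formal_charge</kind>
  <value>-1</value>
  <source>ChemAxon</source>
</property>
XML

open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $f = extract_fields($fh);
close $fh;
print join("\t", map { $f->{$_} // '' } qw(weight name accession formal_charge)), "\n";
```

To use it on real files, pass a filehandle opened on each file instead of the in-memory one. If the tags can span several lines or carry attributes, this approach breaks down and a parser is the safer route.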

Replies are listed 'Best First'.
Re: Getting the value from XML file
by nikosv (Deacon) on Apr 15, 2014 at 09:05 UTC
    Use XPath to simplify your life! As you haven't posted the full XML file, I assume that it looks something like this:
    <root>
      <predicted_properties>
        <property>
          <kind>water_solubility</kind>
          <value>0.00169 mg/mL at 25 C</value>
          <source/>
        </property>
      </predicted_properties>
      <predicted_properties>
        <property>
          <kind>formal_charge</kind>
          <value>-1</value>
          <source>ChemAxon</source>
        </property>
      </predicted_properties>
    </root>

    The code looks for the 'kind' node and then for its sibling node called 'value'

    use XML::Twig::XPath;
    my $twig= XML::Twig::XPath->new();
    $twig->parsefile('myfile.xml');
    print $twig->findvalue('/root/predicted_properties/property/kind[text()="formal_charge"]/following-sibling::value');

    prints -1

    One suggestion to simplify your XML structure would be to make the property name an attribute of kind, i.e.:

    <kind name='formal_charge'>-1</kind>

Re: Getting the value from XML file
by Discipulus (Canon) on Apr 15, 2014 at 08:08 UTC
Re: Getting the value from XML file
by Anonymous Monk on Apr 15, 2014 at 08:11 UTC
Re: Getting the value from XML file
by hellohello1 (Sexton) on Apr 21, 2014 at 06:00 UTC
    Hey, thanks for the replies! I have been reading the links provided and I have a small problem. I installed XML::Twig using the Perl Package Manager, and when I tried adding this line:
    use XML::Twig::XPath;
    I got an error which says:
    Can't locate XML/Twig/XPath.ppm in @INC (@INC contains: lib C:/Strawberry/perl/site/lib C:/Strawberry/perl/vendor/lib C:/Strawberrt/perl/lib.)
    I tried checking from the command prompt whether it is installed properly, and everything seems to work, except that I could not find an XML/Twig folder in the paths listed in the error. How do I solve this?

      Stick with plain old XML::Twig

      Also, "Can't locate XML/Twig/XPath.ppm" looks an awful lot like nonsense
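A minimal sketch of what "plain old XML::Twig" could look like for this data, without the XPath subclass. It registers a handler on each property element and compares the text of its kind child. The helper name twig_fields and the <root> wrapper around the sample are assumptions (the full file was never posted), so adjust the element names to the real document.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: parse one XML string and return the wanted fields.
sub twig_fields {
    my ($xml) = @_;
    my %found;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called for each <property> once it has been fully parsed.
            property => sub {
                my ($t, $prop) = @_;
                if ($prop->first_child_text('kind') eq 'formal_charge') {
                    $found{formal_charge} //= $prop->first_child_text('value');
                }
            },
            # Keep only the first occurrence of each simple field.
            weight    => sub { $found{weight}    //= $_[1]->text },
            name      => sub { $found{name}      //= $_[1]->text },
            accession => sub { $found{accession} //= $_[1]->text },
        },
    );
    $twig->parse($xml);
    return \%found;
}

# Demo; the <root> element is assumed, not taken from the original file.
my $sample = <<'XML';
<root>
  <weight>18.998403205</weight>
  <name>fluoride</name>
  <accession>W00662</accession>
  <predicted_properties>
    <property>
      <kind>formal_charge</kind>
      <value>-1</value>
      <source>ChemAxon</source>
    </property>
  </predicted_properties>
</root>
XML

my $f = twig_fields($sample);
print join("\t", map { $f->{$_} // '' } qw(weight name accession formal_charge)), "\n";
```

For files on disk, `$twig->parsefile($path)` can replace `$twig->parse($xml)`. The handler approach avoids guessing the full path from the root, which matters here because the real nesting is unknown.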

        "Stick with plain old XML::Twig"

        Do I need to install a module for that? I assume that I need to install the XML::Twig module through ppm in order to use XML::Twig.

        Initially I wrote

        use XML::Twig;
        and I got the same error as mentioned in the first post.
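When a "Can't locate ... in @INC" error shows up even though the module seems to be installed, it can help to check which perl is actually running and where it searches. The following is a small diagnostic sketch using only core Perl. The helper name module_status is made up for illustration, and the script only reports what it finds; it does not change any paths.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load a module by name and report the result.
sub module_status {
    my ($module) = @_;
    (my $path = "$module.pm") =~ s{::}{/}g;    # XML::Twig -> XML/Twig.pm
    my $ok  = eval { require $path; 1 };
    my $err = $ok ? '' : $@;
    return {
        loaded => $ok ? 1 : 0,
        path   => $path,
        file   => $INC{$path},    # where it was loaded from, if it was
        error  => $err,
    };
}

# Which interpreter is this, and where does it look for modules?
print "Running perl: $^X\n";
print "Module search path (\@INC):\n";
print "  $_\n" for @INC;

for my $module (qw(XML::Twig XML::Twig::XPath)) {
    my $s = module_status($module);
    if ($s->{loaded}) {
        print "$module loaded from $s->{file}\n";
    }
    else {
        print "$module could not be loaded ($s->{path}): $s->{error}";
    }
}
```

If the interpreter printed on the first line is not the one the module was installed into, the module will not be found, which is one common cause of this kind of error when more than one Perl is installed.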

Re: Getting the value from XML file
by hellohello1 (Sexton) on Apr 23, 2014 at 09:08 UTC
    Hey thanks! Got it! :)

    I ended up uninstalling the other Perl, because I still could not figure out how to change the path as mentioned in Anonymous Monk's reply (noob me!), and then installed XML::Twig with cpan from cmd.

    Ok. Will start working on it and read all the links first :) I'll be back if I have any problems :)

    Thanks so much guys!

Node Type: perlquestion [id://1082317]
Approved by mtmcc