PerlMonks  

Parsing an XML file

by zing (Beadle)
on Aug 24, 2012 at 11:19 UTC ( #989488=perlquestion )
zing has asked for the wisdom of the Perl Monks concerning the following question:

Hi all. I have this XML file: http://www.genome.jp/kegg-bin/download?entry=pon01100&format=kgml

I tried parsing it with XML::Simple like this:

    use strict;
    use XML::Simple;
    use Data::Dumper;

    my $xml_hash = XMLin('pon01100.xml');
    print Dumper($xml_hash);

It dumps the output to the console as expected:

    $VAR1 = {
      'reaction' => {
        'rn:R07892' => {
          'substrate' => { 'name' => 'cpd:C16331', 'id' => '3038' },
          'type' => 'reversible',
          'id' => '983',
          'product' => { 'name' => 'cpd:C16332', 'id' => '2201' }
        },
        'rn:R02687' => {
          'substrate' => { 'name' => 'cpd:C00641', 'id' => '3607' },
          'type' => 'reversible',
          'id' => '3606',
          'product' => { 'name' => 'cpd:C01885', 'id' => '3608' }
        },
        'rn:R05640' => {
          'substrate' => { 'name' => 'cpd:C01724', 'id' => '2269' },
          'type' => 'reversible',
          'id' => '651',
          'product' => { 'name' => 'cpd:C11455', 'id' => '2270' }
        }
        ......and so on......
For each reaction (the keys named rn:....), I want the substrate "id" and all of the corresponding product "id" values. Some reactions may have two or more products for a substrate, so I need to save the substrate id together with every product id for each reaction. Please help.

Re: Parsing an XML file
by blue_cowdawg (Monsignor) on Aug 24, 2012 at 11:43 UTC
        For each of the reactions, I want to save "substrate id" and all its corresponding "product id". Please help

    foreach my $name ( keys %{ $xml_hash->{reaction} } ) {
        next unless $name =~ m@^rn:@;    # only look at the rn:... reaction keys
        my $substrate_id = $xml_hash->{reaction}->{$name}->{substrate}->{id};
        my $product_id   = $xml_hash->{reaction}->{$name}->{product}->{id};
        # do stuff here...
    }
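
The loop above reads a single product per reaction. For the case the question raises, where a reaction has two or more products, here is a minimal sketch. It assumes the file was loaded with XMLin($file, KeyAttr => [], ForceArray => [qw(reaction substrate product)]), so that reactions, substrates and products always arrive as arrays of hashes and are not folded by name. The sample data is hand-built in that shape so the sketch runs without the file; the first reaction reuses values from the dump in the question, the second is made up to show the multi-product case, and products_by_reaction is a hypothetical helper name.

```perl
use strict;
use warnings;

# Shape assumed from:
#   XMLin($file, KeyAttr => [], ForceArray => [qw(reaction substrate product)])
my $xml_hash = {
    reaction => [
        {   # values taken from the dump in the question
            name      => 'rn:R07892',
            id        => '983',
            type      => 'reversible',
            substrate => [ { name => 'cpd:C16331', id => '3038' } ],
            product   => [ { name => 'cpd:C16332', id => '2201' } ],
        },
        {   # hypothetical reaction, invented to show two products
            name      => 'rn:EXAMPLE',
            id        => '1',
            type      => 'reversible',
            substrate => [ { name => 'cpd:A', id => '10' } ],
            product   => [
                { name => 'cpd:B', id => '20' },
                { name => 'cpd:C', id => '21' },
            ],
        },
    ],
};

# Collect the substrate ids and product ids of every rn:... reaction.
sub products_by_reaction {
    my ($data) = @_;
    my %result;
    for my $reaction ( @{ $data->{reaction} || [] } ) {
        next unless $reaction->{name} =~ /^rn:/;
        $result{ $reaction->{name} } = {
            substrates => [ map { $_->{id} } @{ $reaction->{substrate} || [] } ],
            products   => [ map { $_->{id} } @{ $reaction->{product}   || [] } ],
        };
    }
    return \%result;
}

my $ids = products_by_reaction($xml_hash);
for my $name ( sort keys %$ids ) {
    my $substrates = join ', ', @{ $ids->{$name}{substrates} };
    my $products   = join ', ', @{ $ids->{$name}{products} };
    print "$name: substrate id(s) $substrates; product id(s) $products\n";
}
```

Turning off key folding and forcing arrays keeps the structure the same whether a reaction has one product or several, so the loop needs no special cases.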


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Parsing an XML file
by aitap (Deacon) on Aug 24, 2012 at 12:16 UTC
    Consider using XML::Twig, which can parse large files without consuming a lot of memory:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::Twig;
    use feature 'say';

    my $parser = XML::Twig::->new(
        twig_handlers => {
            # use XPath to describe what you want to handle
            '/pathway/reaction[@name=~/^rn:/]' => \&handle,
        }
    );
    $parser->parsefile("myfile");

    sub handle {
        # handlers are given an XML::Twig object and an XML::Twig::Elt object
        my ($twig, $elt) = @_;
        say $elt->att("name");
        say " substrate[s]: ", map { $_->att("id")." " } $elt->children("substrate");
        say " product[s]: ", map { $_->att("id")." " } $elt->children("product");
    }
    Sorry if my advice was wrong.
Re: Parsing an XML file
by runrig (Abbot) on Aug 24, 2012 at 16:13 UTC
    Example XML, and an example dump of the data structure you want would be helpful. Or do you want to process each 'reaction' as you go and not build the whole data structure? Either way, XML::Rules would likely be of great help here, but I don't have enough information to tell what exactly you want.
Re: Parsing an XML file
by Jenda (Abbot) on Aug 27, 2012 at 17:18 UTC
    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(
        stripspaces => 7,
        rules => {
            substrate => sub { 'substrate' => $_[1]->{id} },
            product   => sub { '@products' => $_[1]->{id} },
            reaction  => sub {
                my %reactions;
                foreach (split / /, $_[1]->{name}) {
                    $reactions{$_} = {
                        substrate => $_[1]->{substrate},
                        products  => $_[1]->{products},
                    };
                }
                return '%reactions' => \%reactions;
            },
            graphics => '',
            entry => sub {
                my @reactions = split ' ', (delete $_[1]->{reaction});
                $_[1]->{reactions} = \@reactions if @reactions;
                return '%entries' => { $_[1]->{id} => $_[1] }
            },
            pathway => 'pass'
        });

    use Data::Dumper;
    print Dumper($parser->parsefile('d:\temp\pon01100.xml'));

    Here's an example of how you could tweak the data structure as it's being read. It gives you a hash containing the name, org, title, image and link from the root tag and a hash of entries and a hash of reactions. The reactions contain only the stuff you said you need, the entries contain the data from the <entry> tag with the list of reactions split and converted into an array.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.
