Parsing an XML file

by zing (Beadle)
on Aug 24, 2012 at 11:19 UTC
zing has asked for the wisdom of the Perl Monks concerning the following question:

Hi all. I have this XML file:

I also tried parsing it with XML::Simple like this:

    use strict;
    use XML::Simple;
    use Data::Dumper;

    my $xml_hash = XMLin('pon01100.xml');
    print Dumper($xml_hash);

It dumps the output to the console as expected:

    $VAR1 = {
      'reaction' => {
        'rn:R07892' => {
          'substrate' => {
            'name' => 'cpd:C16331',
            'id' => '3038'
          },
          'type' => 'reversible',
          'id' => '983',
          'product' => {
            'name' => 'cpd:C16332',
            'id' => '2201'
          }
        },
        'rn:R02687' => {
          'substrate' => {
            'name' => 'cpd:C00641',
            'id' => '3607'
          },
          'type' => 'reversible',
          'id' => '3606',
          'product' => {
            'name' => 'cpd:C01885',
            'id' => '3608'
          }
        },
        'rn:R05640' => {
          'substrate' => {
            'name' => 'cpd:C01724',
            'id' => '2269'
          },
          'type' => 'reversible',
          'id' => '651',
          'product' => {
            'name' => 'cpd:C11455',
            'id' => '2270'
          }
        }......and so on......
For each reaction (named rn:....), I want the substrate "id" and the "id" of every corresponding product. Some reactions have two or more products for a substrate. For each reaction, I want to save the substrate id and all of its product ids. Please help.
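A minimal sketch (not from the thread) of how the structure XML::Simple returns could be walked to collect those ids. It assumes the file is read with `XMLin('pon01100.xml', KeyAttr => { reaction => 'name' })`, so that only `<reaction>` elements are folded by name. Under that assumption a reaction with one `<product>` comes back as a hash ref and one with several as an array ref of hash refs (with the default KeyAttr, several products would instead be folded into a hash keyed by their `name`). The helper `as_list` and the `rn:EXAMPLE` reaction are hypothetical, made up for illustration; the other ids come from the dump above.

```perl
use strict;
use warnings;

# Stand-in for the XMLin result. ASSUMED call:
#   my $xml_hash = XMLin('pon01100.xml', KeyAttr => { reaction => 'name' });
my $xml_hash = {
    reaction => {
        'rn:R07892' => {
            substrate => { name => 'cpd:C16331', id => '3038' },
            product   => { name => 'cpd:C16332', id => '2201' },
        },
        'rn:R02687' => {
            substrate => { name => 'cpd:C00641', id => '3607' },
            product   => { name => 'cpd:C01885', id => '3608' },
        },
        # HYPOTHETICAL reaction with two products, not from the real file
        'rn:EXAMPLE' => {
            substrate => { name => 'cpd:X1', id => '1' },
            product   => [ { name => 'cpd:X2', id => '2' },
                           { name => 'cpd:X3', id => '3' } ],
        },
    },
};

# Hypothetical helper: return a list of element hashes whether XML::Simple
# produced a single hash ref or an array ref of them.
sub as_list {
    my ($node) = @_;
    return ()     unless defined $node;
    return @$node if ref $node eq 'ARRAY';
    return ($node);
}

# reaction name => { substrates => [ids], products => [ids] }
my %ids;
for my $name (keys %{ $xml_hash->{reaction} }) {
    next unless $name =~ /^rn:/;
    my $r = $xml_hash->{reaction}{$name};
    $ids{$name} = {
        substrates => [ map { $_->{id} } as_list($r->{substrate}) ],
        products   => [ map { $_->{id} } as_list($r->{product}) ],
    };
}

for my $name (sort keys %ids) {
    printf "%s: substrate(s) %s -> product(s) %s\n", $name,
        join(' ', @{ $ids{$name}{substrates} }),
        join(' ', @{ $ids{$name}{products} });
}
# prints:
# rn:EXAMPLE: substrate(s) 1 -> product(s) 2 3
# rn:R02687: substrate(s) 3607 -> product(s) 3608
# rn:R07892: substrate(s) 3038 -> product(s) 2201
```

Adding `ForceArray => ['substrate', 'product']` alongside that KeyAttr setting should make XML::Simple return array refs every time, so `as_list` would then only ever see the array case.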

Re: Parsing an XML file
by aitap (Curate) on Aug 24, 2012 at 12:16 UTC
    Consider using XML::Twig, which can parse large files without consuming a lot of memory:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::Twig;
    use feature 'say';

    my $parser = XML::Twig::->new(
        twig_handlers => {
            # use XPath to describe what you want to handle
            '/pathway/reaction[@name=~/^rn:/]' => \&handle,
        }
    );
    $parser->parsefile("myfile");

    sub handle {
        # handlers are given an XML::Twig object and an XML::Twig::Elt object
        my ($twig, $elt) = @_;
        say $elt->att("name");
        say " substrate[s]: ", map { $_->att("id") . " " } $elt->children("substrate");
        say " product[s]: ",   map { $_->att("id") . " " } $elt->children("product");
        # discard what has been parsed so far, so memory stays low on large files
        $twig->purge;
    }
    Sorry if my advice was wrong.
Re: Parsing an XML file
by blue_cowdawg (Monsignor) on Aug 24, 2012 at 11:43 UTC
        For each of the reactions, I want to save "substrate id" and all its corresponding "product id". Please help

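    foreach my $name ( keys %{ $xml_hash->{reaction} } ) {
        next unless $name =~ m@^rn:@;
        my $substrate_id = $xml_hash->{reaction}->{$name}->{substrate}->{id};
        # only right for a single <product>; with several, XML::Simple
        # returns a different structure (see its ForceArray/KeyAttr options)
        my $product_id = $xml_hash->{reaction}->{$name}->{product}->{id};
        # do stuff here...
    }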

    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Parsing an XML file
by Jenda (Abbot) on Aug 27, 2012 at 17:18 UTC
    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(
        stripspaces => 7,
        rules => {
            substrate => sub { 'substrate' => $_[1]->{id} },
            product   => sub { '@products' => $_[1]->{id} },
            reaction  => sub {
                my %reactions;
                foreach (split / /, $_[1]->{name}) {
                    $reactions{$_} = {
                        substrate => $_[1]->{substrate},
                        products  => $_[1]->{products},
                    };
                }
                return '%reactions' => \%reactions;
            },
            graphics => '',
            entry => sub {
                my @reactions = split ' ', (delete $_[1]->{reaction});
                $_[1]->{reactions} = \@reactions if @reactions;
                return '%entries' => { $_[1]->{id} => $_[1] };
            },
            pathway => 'pass',
        }
    );

    use Data::Dumper;
    print Dumper($parser->parsefile('d:\temp\pon01100.xml'));

    Here's an example of how you could tweak the data structure as it's being read. It gives you a hash containing the name, org, title, image and link from the root tag and a hash of entries and a hash of reactions. The reactions contain only the stuff you said you need, the entries contain the data from the <entry> tag with the list of reactions split and converted into an array.
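    A small sketch (not part of the reply) of reading that result back. It assumes parsefile returns a hash ref whose `reactions` key is shaped as described above: `substrate` holding one id and `products` an array ref of ids. The literal `$data` only stands in for the parser's return value and reuses ids from the question's dump. Each reaction is flattened into one (reaction, substrate id, product id) row per product.

```perl
use strict;
use warnings;

# ASSUMED shape of $parser->parsefile(...)'s return value (reactions part only)
my $data = {
    reactions => {
        'rn:R07892' => { substrate => '3038', products => ['2201'] },
        'rn:R05640' => { substrate => '2269', products => ['2270'] },
    },
};

# One [reaction, substrate id, product id] row per product;
# "|| []" covers a reaction that had no <product> element at all.
my @pairs;
for my $name (sort keys %{ $data->{reactions} }) {
    my $r = $data->{reactions}{$name};
    push @pairs, [ $name, $r->{substrate}, $_ ] for @{ $r->{products} || [] };
}

print join("\t", @$_), "\n" for @pairs;
# prints (tab-separated):
# rn:R05640  2269  2270
# rn:R07892  3038  2201
```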

    Enoch was right!
    Enjoy the last years of Rome.

Re: Parsing an XML file
by runrig (Abbot) on Aug 24, 2012 at 16:13 UTC
    A sample of the XML, and an example dump of the data structure you want, would be helpful. Or do you want to process each 'reaction' as you go instead of building the whole data structure? Either way, XML::Rules would likely help a lot here, but I don't have enough information to tell exactly what you want.

Node Type: perlquestion [id://989488]
Approved by Corion