http://www.perlmonks.org?node_id=786827

Angharad has asked for the wisdom of the Perl Monks concerning the following question:

I'm currently learning how to work with XML documents. I've got as far as finding out about XML::Simple and working through some examples. Here is my 'test script' as it is at the moment.
#!/usr/bin/perl
# use module
use XML::Simple;
use Data::Dumper;
$xs = new XML::Simple(keeproot => 1, searchpath => ".", forcearray => 1);
$ref = $xs->XMLin("test.xml");
print Dumper($ref);
And my data dump looks like this.
$VAR1 = {
  'sas_residue_annotation' => [
    {
      'xmlns' => 'http://www.ebi.ac.uk/WSsas/Schema',
      'sources' => [
        {
          'source' => [
            {
              'source_name' => [ '1iho' ],
              'ref_evalue' => [ '5.5e-50' ],
              'ref_overlap' => [ '282' ],
              'ref_identity' => [ '47.20' ],
              'ref_pmid' => [ '11377204' ]
            },
            {
              'source_name' => [ '1mop' ],
              'ref_evalue' => [ '8.3e-38' ],
              'ref_overlap' => [ '264' ],
              'ref_identity' => [ '43.60' ],
              'ref_pmid' => [ '12717031' ]
            },
            {
              'source_name' => [ '1n2b' ],
              'ref_evalue' => [ '8.2e-38' ],
              'ref_overlap' => [ '264' ],
              'ref_identity' => [ '43.60' ],
              'ref_pmid' => [ '12717031' ]
            },
            etc ...
I'm not sure how to go about retrieving the information - how I might, for example, just want to print off all the 'source names' in my file. I've tried following a few examples on this today and haven't been able to figure it out yet. Any tips much appreciated.

Re: request for help on working with XML::Simple
by spatterson (Pilgrim) on Aug 07, 2009 at 16:55 UTC
    I believe this should work. XML::Simple all hinges on nested hash and array references.
    my $sources = $ref->{'sas_residue_annotation'}->{'sources'};
    foreach my $s (@{ $sources }) {
        print $s->{'source_name'}->[0], "\n";
    }

    just another cpan module author
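With `keeproot => 1` and `forcearray => 1`, as in the original test script, every level of the dump is an array reference, so the path needs a `[0]` subscript at each step before reaching the list of `source` records. A minimal sketch, assuming the hand-copied fragment of the dump below stands in for the `$ref` that XMLin() returns (it is not real XMLin() output):

```perl
use strict;
use warnings;

# Hand-copied fragment of the Data::Dumper output; stands in for
# the $ref returned by XMLin() with keeproot => 1, forcearray => 1.
my $ref = {
    sas_residue_annotation => [
        {
            sources => [
                {
                    source => [
                        { source_name => ['1iho'] },
                        { source_name => ['1mop'] },
                    ],
                },
            ],
        },
    ],
};

# forcearray makes each level an array ref, hence the [0] subscripts.
my $sources = $ref->{sas_residue_annotation}[0]{sources}[0]{source};
for my $s (@$sources) {
    # source_name is also wrapped in a one-element array
    print $s->{source_name}[0], "\n";
}
```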
      Thanks for the replies so far. Much appreciated!

      Here is a portion of the XML file as requested.

      <sas_residue_annotation xmlns="http://url/Schema"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://url/Schema WSsas.xsd">
        <sources>
          <source>
            <source_name>1iho</source_name>
            <ref_identity>47.20</ref_identity>
            <ref_overlap>282</ref_overlap>
            <ref_evalue>5.5e-50</ref_evalue>
            <ref_pmid>11377204</ref_pmid>
          </source>
          <source>
            <source_name>1mop</source_name>
            <ref_identity>43.60</ref_identity>
            <ref_overlap>264</ref_overlap>
            <ref_evalue>8.3e-38</ref_evalue>
            <ref_pmid>12717031</ref_pmid>
          </source>
        </sources>
      </sas_residue_annotation>
        Since you are new to XML, allow me to offer a different approach, based on XML::Twig, which, in my opinion, is no more difficult to learn than XML::Simple, and will work for a wider range of XML structures:
        use strict;
        use warnings;
        use XML::Twig;

        my $xmlStr = <<XML;
<sas_residue_annotation xmlns="http://url/Schema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://url/Schema WSsas.xsd">
  <sources>
    <source>
      <source_name>1iho</source_name>
      <ref_identity>47.20</ref_identity>
      <ref_overlap>282</ref_overlap>
      <ref_evalue>5.5e-50</ref_evalue>
      <ref_pmid>11377204</ref_pmid>
    </source>
    <source>
      <source_name>1mop</source_name>
      <ref_identity>43.60</ref_identity>
      <ref_overlap>264</ref_overlap>
      <ref_evalue>8.3e-38</ref_evalue>
      <ref_pmid>12717031</ref_pmid>
    </source>
  </sources>
</sas_residue_annotation>
XML

        my $twig = XML::Twig->new(
            twig_handlers => { source_name => \&source_name }
        );
        $twig->parse($xmlStr);
        exit;

        sub source_name {
            my ($twig, $name) = @_;
            print $name->text(), "\n";
        }

        __END__
        1iho
        1mop
      Well... I'm not getting any errors, but I'm not getting any results from that either, I'm afraid :(

        I find XML::Simple to be very simple if the XML is very simple, but with non-trivial XML I am often surprised by what it produces and, if I persist with it, end up spending a lot of time re-reading the documentation and studying Data::Dumper dumps.
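One common source of that surprise: without `forcearray`, XML::Simple represents an element that occurs once as a single hash, but an element that occurs several times as an array of hashes, so the same access code can break depending on the input. A small helper that accepts either shape is one way to cope. `as_list` is a hypothetical name, and the two structures below are hand-written to mimic those two shapes, not captured from a real XMLin() call:

```perl
use strict;
use warnings;

# Hypothetical helper: return a list whether given an array ref,
# a single hash ref, or undef.
sub as_list {
    my ($thing) = @_;
    return ()      unless defined $thing;
    return @$thing if ref($thing) eq 'ARRAY';
    return ($thing);
}

# Hand-written mimics of output without forcearray:
# one <source> gives a hash, two give an array of hashes.
my $one = { sources => { source => { source_name => '1iho' } } };
my $two = { sources => { source => [ { source_name => '1iho' },
                                     { source_name => '1mop' } ] } };

my @names;
for my $doc ($one, $two) {
    for my $src (as_list($doc->{sources}{source})) {
        push @names, $src->{source_name};
    }
}
print "$_\n" for @names;
```

Setting `forcearray => 1`, as in the original test script, avoids the problem by always producing arrays, at the cost of the extra `[0]` subscripts.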

        Here is an example which shows, step by step as I worked it out, how to access the source_name element...

        use strict;
        use warnings;
        use Data::Dumper;

        my $VAR1 = {
            'sas_residue_annotation' => [ {
                'xmlns'   => 'http://www.ebi.ac.uk/WSsas/Schema',
                'sources' => [ {
                    'source' => [
                        { 'source_name' => [ '1iho' ], 'ref_evalue' => [ '5.5e-50' ], 'ref_overlap' => [ '282' ], 'ref_identity' => [ '47.20' ], 'ref_pmid' => [ '11377204' ] },
                        { 'source_name' => [ '1mop' ], 'ref_evalue' => [ '8.3e-38' ], 'ref_overlap' => [ '264' ], 'ref_identity' => [ '43.60' ], 'ref_pmid' => [ '12717031' ] },
                        { 'source_name' => [ '1n2b' ], 'ref_evalue' => [ '8.2e-38' ], 'ref_overlap' => [ '264' ], 'ref_identity' => [ '43.60' ], 'ref_pmid' => [ '12717031' ] },
                    ],
                }, ],
            }, ],
        };

        print Dumper($VAR1);
        print Dumper($VAR1->{sas_residue_annotation});
        print Dumper($VAR1->{sas_residue_annotation}[0]);
        print Dumper($VAR1->{sas_residue_annotation}[0]{sources});
        print Dumper($VAR1->{sas_residue_annotation}[0]{sources}[0]);
        print Dumper($VAR1->{sas_residue_annotation}[0]{sources}[0]{source});

        foreach my $source (@{$VAR1->{sas_residue_annotation}[0]{sources}[0]{source}}) {
            print "source name: ", join(',', @{$source->{source_name}}), "\n";
        }

        Note that I wrote each successive print after studying the output of the previous one. It is a bit tedious, but for deeply nested structures I find it faster than trying to do it all in my head and then debugging my errors.
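Once the path is known, the hard-coded `[0]` subscripts can be replaced by nested loops, which also copes with a document containing more than one annotation or `<sources>` block, and the same loop can pull out several fields per record. A sketch, assuming a trimmed copy of the `$VAR1` structure above (two records, three fields kept):

```perl
use strict;
use warnings;

# Trimmed copy of the forcearray-style structure from the dump.
my $VAR1 = { sas_residue_annotation => [ { sources => [ { source => [
    { source_name => ['1iho'], ref_evalue => ['5.5e-50'], ref_pmid => ['11377204'] },
    { source_name => ['1mop'], ref_evalue => ['8.3e-38'], ref_pmid => ['12717031'] },
] } ] } ] };

my @rows;
for my $annotation (@{ $VAR1->{sas_residue_annotation} }) {
    for my $sources (@{ $annotation->{sources} }) {
        for my $source (@{ $sources->{source} }) {
            # forcearray wraps each text value in a one-element array
            push @rows, join "\t", map { $source->{$_}[0] }
                qw(source_name ref_evalue ref_pmid);
        }
    }
}
print "$_\n" for @rows;
```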

Re: request for help on working with XML::Simple
by toolic (Bishop) on Aug 07, 2009 at 16:53 UTC
    Please also post an excerpt of the contents of your 'test.xml' file because it will allow us to easily create example code for you.

    Otherwise, perhaps References quick reference will help.
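Two rules from the references documentation do most of the work here: the arrow between two subscripts is optional, so `$ref->{a}->[0]->{b}` and `$ref->{a}[0]{b}` mean the same thing, and `@{ ... }` turns an array reference back into a list for looping. A small self-check on a made-up structure shaped like the one in the question:

```perl
use strict;
use warnings;

# Made-up structure shaped like the forcearray output.
my $ref = { sources => [ { source => [ { source_name => ['1iho'] } ] } ] };

# The arrow between subscripts may be omitted; both spellings agree.
my $long  = $ref->{sources}->[0]->{source}->[0]->{source_name}->[0];
my $short = $ref->{sources}[0]{source}[0]{source_name}[0];

# @{ ... } dereferences an array ref into a list.
my @list = @{ $ref->{sources}[0]{source} };

print "$long $short ", scalar(@list), "\n";
```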

Re: request for help on working with XML::Simple
by ramrod (Curate) on Aug 07, 2009 at 17:49 UTC
    Don't forget XML::LibXML
    use XML::LibXML;

    #Parse XML
    my $template = 'xmldocument.xml';
    my $parser = XML::LibXML->new();
    my $pdoc = $parser->parse_file($template);

    #Remember the namespace
    my $rdoc = XML::LibXML::XPathContext->new($pdoc->documentElement());
    $rdoc->registerNs( ns => 'http://url/Schema' );

    #Find desired node(s)
    my @stuff = $rdoc->findnodes("//ns:source_name");

    #Print results
    for (@stuff) {
        my $data = $_->textContent;
        print "$data\n";
    }
      Thanks for all the suggestions so far. Much appreciated. I only started looking at XML today, so you can imagine how much in the dark I've been feeling. You guys are fab. Thanks. :D Are there any good websites or books I should know about to learn more?