PerlMonks  

Parsing XML

by bayareamonk (Initiate)
on Oct 29, 2013 at 17:27 UTC ( [id://1060188]=perlquestion )

bayareamonk has asked for the wisdom of the Perl Monks concerning the following question:

I'm having an issue parsing this XML file using either XML::LibXML or XML::Simple:
<?xml version="1.0" encoding="utf-8"?>
<GfmDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xsi:schemaLocation="urn:us:org:my:gfmdi:v35GfmDocument353.xsd"
             d1p1:DESVersion="5"
             d1p1:resourceElement="true"
             d1p1:createDate="2013-10-08"
             d1p1:classification="U"
             d1p1:ownerProducer="USA"
             xmlns:d1p1="urn:us:org:ic:ism"
             xmlns="urn:us:org:my:gfmdi:v35">
  <GfmHeader>
    <TopicName>gfm_di.org_tree.usa</TopicName>
    <TopicMessageType>Baseline</TopicMessageType>
    <TopicMessageID>2754</TopicMessageID>
    <AsOfIncrementalMessageID>2754</AsOfIncrementalMessageID>
  </GfmHeader>
  <GFMIEDM35>
    <OBJ_ITEM_OO_TBL>
      <OBJ_ITEM d1p1:classification="U" d1p1:ownerProducer="USA">
        <OBJ_ITEM_ID>72060793789255493</OBJ_ITEM_ID>
        <CAT_CODE>OR</CAT_CODE>
        <NAME_TXT>Oranization 10</NAME_TXT>
        <GFM_OBJ_ITEM_S_DTG>2008-10-01T00:00:00Z</GFM_OBJ_ITEM_S_DTG>
        <GFM_OBJ_ITEM_T_DTG>2999-12-01T00:00:00Z</GFM_OBJ_ITEM_T_DTG>
        <ORG>
          <ORG_ID>72060793789255493</ORG_ID>
          <CAT_CODE>UN</CAT_CODE>
          <GFM_CAT_CODE>NOS</GFM_CAT_CODE>
        </ORG>
      </OBJ_ITEM>
      <OBJ_ITEM d1p1:classification="U" d1p1:ownerProducer="USA">
        <OBJ_ITEM_ID>72060793789255508</OBJ_ITEM_ID>
        <CAT_CODE>OR</CAT_CODE>
        <NAME_TXT>Organization 25</NAME_TXT>
        <GFM_OBJ_ITEM_S_DTG>2008-10-01T00:00:00Z</GFM_OBJ_ITEM_S_DTG>
        <GFM_OBJ_ITEM_T_DTG>2999-12-01T00:00:00Z</GFM_OBJ_ITEM_T_DTG>
        <ORG>
          <ORG_ID>72060793789255508</ORG_ID>
          <CAT_CODE>UN</CAT_CODE>
          <GFM_CAT_CODE>NOS</GFM_CAT_CODE>
        </ORG>
      </OBJ_ITEM>
      <OBJ_ITEM d1p1:classification="U" d1p1:ownerProducer="USA">
        <OBJ_ITEM_ID>72060793789255510</OBJ_ITEM_ID>
        <CAT_CODE>OR</CAT_CODE>
        <NAME_TXT>Organization 50</NAME_TXT>
        <GFM_OBJ_ITEM_S_DTG>2008-10-01T00:00:00Z</GFM_OBJ_ITEM_S_DTG>
        <GFM_OBJ_ITEM_T_DTG>2999-12-01T00:00:00Z</GFM_OBJ_ITEM_T_DTG>
        <ORG>
          <ORG_ID>72060793789255510</ORG_ID>
          <CAT_CODE>UN</CAT_CODE>
          <GFM_CAT_CODE>NOS</GFM_CAT_CODE>
        </ORG>
      </OBJ_ITEM>
    </OBJ_ITEM_OO_TBL>
  </GFMIEDM35>
</GfmDocument>
Here is the code I'm using. It runs without errors, but it does not return anything.
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $filename = "aos_list.xml";
my $parser   = XML::LibXML->new();
my $xmldoc   = $parser->parse_file($filename);

# tried different Xpath - still returns nothing
for my $sample ($xmldoc->findnodes('/GfmDocument/GFMIEDM35/OBJ_ITEM_OO_TBL/OBJ_ITEM')) {
    for my $property ($sample->findnodes('./*')) {
        print $property->nodeName(), ": ", $property->textContent(), "\n";
    }
    print "\n";
}
I'm trying to return:
TopicMessageID
OBJ_ITEM_ID
CAT_CODE
NAME_TXT
ORG_ID
CAT_CODE (from ORG)
GFM_CAT_CODE

Any suggestions?

Replies are listed 'Best First'.
Re: Parsing XML
by runrig (Abbot) on Oct 29, 2013 at 17:55 UTC
    ... xmlns="urn:us:org:my:gfmdi:v35">
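    Your document does not exist in the null namespace: the root element declares a default namespace (the xmlns attribute quoted above), so the unprefixed names in your XPath match nothing. You'll need to register that namespace under a prefix and use the prefix in your XPath, via XML::LibXML::XPathContext. E.g.: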
    my $parser = XML::LibXML->new();
    my $xmldoc = $parser->load_xml(
        location => $file,    # $file holds the path to your XML file
    );
    my $xc = XML::LibXML::XPathContext->new($xmldoc);
    $xc->registerNs('gfm', 'urn:us:org:my:gfmdi:v35');
    for my $sample ($xc->findnodes('/gfm:GfmDocument/gfm:GFMIEDM35/gfm:OBJ_ITEM_OO_TBL/gfm:OBJ_ITEM')) {
        # loop body as in the original post
    }
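    For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same idea. It is not the poster's exact code: the XML is a trimmed copy of the posted document (one OBJ_ITEM, most attributes dropped), the 'gfm' prefix is an arbitrary choice, and the hash key ORG_CAT_CODE is a made-up name to keep the two CAT_CODE values apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;    # also provides XML::LibXML::XPathContext

# Trimmed stand-in for the posted document (assumption: one OBJ_ITEM is enough to illustrate).
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<GfmDocument xmlns="urn:us:org:my:gfmdi:v35">
  <GfmHeader>
    <TopicMessageID>2754</TopicMessageID>
  </GfmHeader>
  <GFMIEDM35>
    <OBJ_ITEM_OO_TBL>
      <OBJ_ITEM>
        <OBJ_ITEM_ID>72060793789255508</OBJ_ITEM_ID>
        <CAT_CODE>OR</CAT_CODE>
        <NAME_TXT>Organization 25</NAME_TXT>
        <ORG>
          <ORG_ID>72060793789255508</ORG_ID>
          <CAT_CODE>UN</CAT_CODE>
          <GFM_CAT_CODE>NOS</GFM_CAT_CODE>
        </ORG>
      </OBJ_ITEM>
    </OBJ_ITEM_OO_TBL>
  </GFMIEDM35>
</GfmDocument>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Unprefixed XPath looks in the null namespace, so this finds no nodes.
my $unprefixed_hits = () = $doc->findnodes('/GfmDocument');

# Bind a prefix of our choosing to the document's default namespace URI.
my $xc = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(gfm => 'urn:us:org:my:gfmdi:v35');

my $topic_id = $xc->findvalue('/gfm:GfmDocument/gfm:GfmHeader/gfm:TopicMessageID');

# Collect the requested fields for each OBJ_ITEM, evaluating relative to that node.
my @records;
for my $item ($xc->findnodes('/gfm:GfmDocument/gfm:GFMIEDM35/gfm:OBJ_ITEM_OO_TBL/gfm:OBJ_ITEM')) {
    push @records, {
        OBJ_ITEM_ID  => $xc->findvalue('gfm:OBJ_ITEM_ID',          $item),
        CAT_CODE     => $xc->findvalue('gfm:CAT_CODE',             $item),
        NAME_TXT     => $xc->findvalue('gfm:NAME_TXT',             $item),
        ORG_ID       => $xc->findvalue('gfm:ORG/gfm:ORG_ID',       $item),
        ORG_CAT_CODE => $xc->findvalue('gfm:ORG/gfm:CAT_CODE',     $item),
        GFM_CAT_CODE => $xc->findvalue('gfm:ORG/gfm:GFM_CAT_CODE', $item),
    };
}

print "unprefixed matches: $unprefixed_hits\n";
print "TopicMessageID: $topic_id\n";
for my $r (@records) {
    print "$_: $r->{$_}\n"
        for qw(OBJ_ITEM_ID CAT_CODE NAME_TXT ORG_ID ORG_CAT_CODE GFM_CAT_CODE);
    print "\n";
}
```

    Note that the relative lookups go through $xc with the item passed as the context node; calling findvalue directly on $item would lose the registered prefix.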
Re: Parsing XML with a Twig
by Discipulus (Canon) on Oct 30, 2013 at 09:14 UTC
    Hello, you had better forget XML::Simple; you may find XML::Twig more useful instead.
    In the TIMTOWTDI spirit I present a simple XML::Twig solution:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => {
            'GfmDocument/GfmHeader/TopicMessageID' => sub { print $_[1]->text, "\n"; },
            '/GfmDocument/GFMIEDM35/OBJ_ITEM_OO_TBL/OBJ_ITEM' => sub {
                my $twig = shift;
                my $obj  = shift;
                print map { "\t" . $obj->first_child_text($_) . "\n" }
                    qw(OBJ_ITEM_ID CAT_CODE NAME_TXT);
                my $org = $obj->first_child('ORG');
                print map { "\t" . $org->first_child_text($_) . "\n" }
                    qw(ORG_ID CAT_CODE GFM_CAT_CODE);
                print "\n\n";
            },
        },
    );
    $twig->parsefile('xmlpmonks.xml') or die;    # build it

    __END__
    #output
    2754
        72060793789255493
        OR
        Oranization 10
        72060793789255493
        UN
        NOS


        72060793789255508
        OR
        Organization 25
        72060793789255508
        UN
        NOS


        72060793789255510
        OR
        Organization 50
        72060793789255510
        UN
        NOS
    hth

    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
