XML parsing help need, please...

by spstansbury (Monk)
on Apr 13, 2010 at 20:13 UTC

spstansbury has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to extract some information from an XML report generated by a trouble ticket system.

I have an "incident" number, and want to extract additional information from that node.

The script looks like this:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use XML::LibXML;
    use XML::LibXML::XPathContext;

    my $file = "./incident.xml";

    my $parser = XML::LibXML->new(XML_LIBXML_RECOVER => 2);
    $parser->recover_silently(1);

    my $dom = $parser->load_xml( location => $file );
    my $root = $dom->getDocumentElement();

    my $xpc = XML::LibXML::XPathContext->new($root);

    ##############################################
    # If we have an incident ticket reference get info from that ticket.

    my $inc_node = $xpc->findnodes('/upload/incident[number/text() = "INC902531"]');
    my $service_restoration_time = $xpc->findvalue( "./u_service_restoration_time", $inc_node );
    my $assignment_group = $xpc->findvalue( ".//assignment_group/\@display_value", $inc_node );
    # etc...

    print $service_restoration_time . "\n" ;
    print $assignment_group . "\n" ;

I'm getting "XPathContext: lost current node at ./query_test.pl line 23" as the error.

The XML file that the script parses looks like this (shortened; the real file has thousands of incident entries). There is no namespace declared, and I'm not sure how to handle that:

    <?xml version="1.0" encoding="UTF-8"?>
    <unload unload_date="2010-03-17 13:02:24">
    <incident action="INSERT_OR_UPDATE">
    <opened_at>2009-12-15 14:29:55</opened_at>
    <closed_at>2009-12-21 05:00:21</closed_at>
    <u_problem_summary/>
    <u_item>none</u_item>
    <business_duration>1970-01-03 05:30:05</business_duration>
    <category>none</category>
    <number>INC902550</number>
    <u_caller_location/>
    <u_service_restoration_time>2009-12-15 23:01:17</u_service_restoration_time>
    <u_close_code>Service Restored</u_close_code>
    <u_service_loss_time>2009-12-15 14:29:55</u_service_loss_time>
    <u_user_issue>Software Issue/Bug</u_user_issue>
    <element_id>92becbd70a0a140b0187537b8516f18c</element_id>
    </unload>

Would greatly appreciate being pointed in the right direction!

Re: XML parsing help need, please...
by ikegami (Patriarch) on Apr 13, 2010 at 20:31 UTC
    findnodes (note the "s") returns a list of nodes (list context) or a NodeList object (scalar context). You want an actual node. If you want to discard all but the first result, you can use a list assignment:
       |         |
       v         v
    my ($inc_node) = $xpc->findnodes('/upload/incident[number/text()="INC902531"]');
    

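    But your XPath is wrong: the root element is unload, not upload, and the sample you posted only contains INC902550, not INC902531. It should be

                                        |                                    |
                                        v                                    v
    my ($inc_node) = $xpc->findnodes('/unload/incident[number/text()="INC902550"]');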
    

      Discarding all but the first works fine, as there should only be one match. Thank you for your help!
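To make the two fixes above concrete, here is a minimal sketch that can be run on its own. It assumes a trimmed-down, well-formed stand-in for the report; the `assignment_group` element and its `display_value` of "Example Group" are made up for illustration (the original script queries that attribute, but the posted sample does not show it). It first shows what `findnodes` returns in scalar context, then uses the list assignment to get a real node to pass as the context node.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Well-formed stand-in for the report. The assignment_group line is
# hypothetical data; the other values come from the posted sample.
my $xml = <<'XML';
<unload unload_date="2010-03-17 13:02:24">
  <incident action="INSERT_OR_UPDATE">
    <number>INC902550</number>
    <u_service_restoration_time>2009-12-15 23:01:17</u_service_restoration_time>
    <assignment_group display_value="Example Group">abc123</assignment_group>
  </incident>
</unload>
XML

my $dom = XML::LibXML->load_xml( string => $xml );
my $xpc = XML::LibXML::XPathContext->new( $dom->documentElement );

my $xpath = '/unload/incident[number/text()="INC902550"]';

# Scalar context: a NodeList object, which is not usable as a context node.
my $list = $xpc->findnodes($xpath);
print "scalar context gives: ", ref($list), "\n";

# List context: the parentheses keep only the first matching node.
my ($inc_node) = $xpc->findnodes($xpath);
die "No incident matched $xpath\n" unless $inc_node;

# Relative expressions are now evaluated against that single node.
my $service_restoration_time =
    $xpc->findvalue( './u_service_restoration_time', $inc_node );
my $assignment_group =
    $xpc->findvalue( './assignment_group/@display_value', $inc_node );

print "$service_restoration_time\n";
print "$assignment_group\n";
```

The `die ... unless $inc_node` guard is an addition of this sketch: if the XPath matches nothing, the list assignment leaves `$inc_node` undefined, and it is easier to stop there than to debug the later lookups.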

Re: XML parsing help need, please...
by ikegami (Patriarch) on Apr 13, 2010 at 20:38 UTC

    There is no namespace declared, not sure how to handle that...

    So the elements belong to the null namespace, which is the one XPath engines (should*) look into for a nodetest with no prefix.

    If all the elements are in the null namespace, you may use XPC as you used it

    my $xpc = XML::LibXML::XPathContext->new($root);
    $xpc->findnodes(...)
    $xpc->findnodes(..., $topic)

    or you may avoid using XPC completely.

    $root->findnodes(...)
    $root->findnodes(..., $topic)

    * In practice, many XPath engines get this wrong. LibXML gets it right.

      Thanks for pointing out the option - I didn't realize that not using XPC would work
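As a rough illustration of the no-XPC route mentioned above, the sketch below calls `findnodes` and `findvalue` directly on nodes. It assumes a shortened, well-formed copy of the posted sample with no namespace declared; the variable names are chosen here for illustration and are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Shortened, well-formed copy of the posted sample (no namespace declared).
my $xml = <<'XML';
<unload unload_date="2010-03-17 13:02:24">
  <incident action="INSERT_OR_UPDATE">
    <closed_at>2009-12-21 05:00:21</closed_at>
    <number>INC902550</number>
    <u_close_code>Service Restored</u_close_code>
  </incident>
</unload>
XML

my $dom  = XML::LibXML->load_xml( string => $xml );
my $root = $dom->documentElement;

# Unprefixed names match elements in the null namespace, so no
# XPathContext (and no namespace registration) is needed here.
my ($incident) = $root->findnodes('/unload/incident[number="INC902550"]');
die "Incident not found\n" unless $incident;

# Calling findvalue on the node evaluates the expression relative to it.
my $closed_at  = $incident->findvalue('./closed_at');
my $close_code = $incident->findvalue('./u_close_code');

print "$closed_at\n";
print "$close_code\n";
```

If the report ever did declare a default namespace, this shortcut would stop matching and an XPathContext with a registered prefix would be the usual way out, but that is outside what the sample shows.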

Re: XML parsing help need, please...
by amedico (Sexton) on Apr 13, 2010 at 21:23 UTC
    In the sample XML, the incident tag isn't closed - could be a problem if it's also the case in the real input (and not just a copy&paste error here).

      The latter - my apologies for the sloppy copying for the example
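Since the original script parses in silent recovery mode, an unclosed tag like the one noted above would not be reported if it were present in the real file. A minimal sketch of a check, assuming a deliberately malformed one-line stand-in for the report: parse once with the default strict settings to see whether the input is well-formed, then compare with a recovering parse.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Same flaw as the pasted sample: <incident> is never closed.
my $bad = '<unload><incident><number>INC902550</number></unload>';

# Default (strict) parse: load_xml throws an exception on malformed input.
my $strict_ok    = eval { XML::LibXML->load_xml( string => $bad ); 1 };
my $strict_error = $strict_ok ? '' : "$@";
print "strict parse failed: $strict_error\n" unless $strict_ok;

# Recovering parse: with recover => 2 the error is suppressed, so a
# truncated or mis-nested document can go unnoticed.
my $recovered = XML::LibXML->load_xml( string => $bad, recover => 2 );
print "recovering parse returned a document\n" if $recovered;
```

Running the strict parse once against the real input is a cheap way to find out whether recovery mode is masking anything.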
