PerlMonks  

XML::XPath - node-to-xpath reverse lookup

by georgeh (Initiate)
on Nov 08, 2010 at 05:20 UTC (#870027, perlquestion)
georgeh has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use XML::XPath to extract interesting information from the Tomcat URL:

   /manager/status?XML=true

The interesting part of the output is:

<status>
  <connector name="http-0">
    <threadInfo currentThreadCount="0" currentThreadsBusy="0" maxThreads="200"/>
    <requestInfo bytesReceived="0" bytesSent="0" errorCount="0" maxTime="0" processingTime="0" requestCount="0"/>
    <workers></workers>
  </connector>
  <connector name="http-8080">
    <threadInfo currentThreadCount="158" currentThreadsBusy="10" maxThreads="200"/>
    <requestInfo bytesReceived="297" bytesSent="19350704517" errorCount="192504" maxTime="249349" processingTime="2242513592" requestCount="983650"/>
    <workers>
    </workers>
  </connector>
</status>

XML::XPath lets me find 'currentThreadsBusy' using an XPath construct like this:

   /status/connector/threadInfo/@currentThreadsBusy

The 'problem' is that the get_nodelist() method returns a list of nodes, which is fine, but I would really like to know the XPath to each specific node it finds. In this case there are two 'connector' nodes, but there is no obvious way to tell which 'connector' a given node belongs to, or even that it was the 'connector' nodes that were repeated (e.g. it might have been 'threadInfo').

What I would really like is a way to query each node returned by get_nodelist() and get back an XPath like this:

   /status[1]/connector[2]/threadInfo[1]/@currentThreadsBusy

This would uniquely identify the node where the data came from.

Q1. Is there some way XML::XPath will give me this information?

Q2. Is there another package (XML::Smart?) that will do this?

Sample code for testing:

#!/usr/bin/perl
use strict;
use XML::XPath;

my $xml = '
<status>
  <connector name="http-0">
    <threadInfo currentThreadCount="0" currentThreadsBusy="0" maxThreads="200"/>
    <requestInfo bytesReceived="0" bytesSent="0" errorCount="0" maxTime="0" processingTime="0" requestCount="0"/>
    <workers></workers>
  </connector>
  <connector name="http-8080">
    <threadInfo currentThreadCount="158" currentThreadsBusy="10" maxThreads="200"/>
    <requestInfo bytesReceived="297" bytesSent="19350704517" errorCount="192504" maxTime="249349" processingTime="2242513592" requestCount="983650"/>
    <workers>
    </workers>
  </connector>
</status>
';

my $path = '/status/connector/threadInfo/@currentThreadsBusy';
my $xpath = XML::XPath->new( xml => $xml );
my $nodeset = $xpath->find($path);

foreach my $node ($nodeset->get_nodelist) {
  print "FOUND: ", $xpath->getNodeText($node), "\n";
}

Re: XML::XPath - node-to-xpath reverse lookup
by aquarium (Curate) on Nov 08, 2010 at 05:46 UTC
    You can supply a context node to XML::XPath, which restricts the XPath expression so that it is evaluated at the desired level of the tree.
    the hardest line to type correctly is: stty erase ^H

      Yes, I noticed that either of:

        /status/connector[attribute::name="http-8080"]/threadInfo/@maxThreads
        /status/connector[2]/threadInfo/@maxThreads

      will work, but I was hoping the XPath library would give me at least the second form. Otherwise, it is difficult to determine where the data originated.

      Thanks anyway.
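
The context suggestion above can be sketched roughly as follows: select the connector elements first, keep a running index, and evaluate a relative expression against each one. This is only an illustration. It assumes the optional context argument of find() and findvalue(), and the helper name busy_by_connector and the trimmed sample XML are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Trimmed copy of the Tomcat status output from the question.
my $xml = <<'XML';
<status>
  <connector name="http-0">
    <threadInfo currentThreadCount="0" currentThreadsBusy="0" maxThreads="200"/>
  </connector>
  <connector name="http-8080">
    <threadInfo currentThreadCount="158" currentThreadsBusy="10" maxThreads="200"/>
  </connector>
</status>
XML

# Hypothetical helper: returns one "path (name) = value" line per connector.
sub busy_by_connector {
    my ($xp) = @_;
    my @lines;
    my $index = 0;
    foreach my $connector ($xp->find('/status/connector')->get_nodelist) {
        $index++;
        # Relative expressions are evaluated with $connector as the context node.
        my $name = $xp->findvalue('@name', $connector);
        my $busy = $xp->findvalue('threadInfo/@currentThreadsBusy', $connector);
        push @lines, sprintf('/status/connector[%d]/threadInfo/@currentThreadsBusy (%s) = %s',
                             $index, $name, $busy);
    }
    return @lines;
}

my $xp = XML::XPath->new( xml => $xml );
print "$_\n" for busy_by_connector($xp);
```

This only works when you already know which element repeats, so it does not answer the general reverse-lookup question; it just shows where the index can come from.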

Re: XML::XPath - node-to-xpath reverse lookup
by pajout (Curate) on Nov 08, 2010 at 17:03 UTC
    Do not beat me, this is my first experience with that module :>)

    #!/usr/bin/perl
    use strict;
    use XML::XPath;

    my $xml = '
    <status>
      <some_node/>
      <connector name="http-0">
        <threadInfo currentThreadsBusy="0"/>
        <requestInfo bytesReceived="0" bytesSent="0" errorCount="0" maxTime="0" processingTime="0" requestCount="0"/>
        <workers></workers>
      </connector>
      <connector name="http-8080">
        <threadInfo currentThreadCount="158" currentThreadsBusy="10" maxThreads="200"/>
        <requestInfo bytesReceived="297" bytesSent="19350704517" errorCount="192504" maxTime="249349" processingTime="2242513592" requestCount="983650"/>
        <workers>
        </workers>
      </connector>
    </status>
    ';

    my $path = '/status/connector/threadInfo/@currentThreadsBusy';
    my $xpath = XML::XPath->new( xml => $xml );
    my $nodeset = $xpath->find($path);

    foreach my $node ($nodeset->get_nodelist) {
      print "FOUND: ", $xpath->getNodeText($node), "\n";
      my $ret = '/@currentThreadsBusy';
      my $parent = $node->getParentNode();
      while ($parent and $parent->getParentNode()) {
        #$ret = '['.$xpath->find('position()',$parent)->value.']' . $ret;
        $ret = '['.($xpath->find('preceding-sibling::*[name()="'.$parent->getName().'"]',$parent)->size+1).']' . $ret;
        $ret = '/'.$parent->getName() . $ret;
        $parent = $parent->getParentNode();
      }
      print $ret."\n";
    }

    I thought the commented-out line, which uses position(), would work, but it does not...

      Thank you very much. This is exactly what I was looking for.

      My own XML::XPath knowledge is limited, and I could not have come up with the $ret expression you've given here.

      Thanks again.
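
For reuse, the loop from the reply above could be wrapped in a subroutine that does not hard-code the attribute name. The following is a minimal sketch, not part of XML::XPath itself: node_to_xpath is a hypothetical helper name, and it assumes the node methods isAttributeNode, getName and getParentNode behave as they do in the code above. It handles only element and attribute nodes, and ignores namespaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical helper: build a fully indexed XPath for an element or
# attribute node by walking up through its ancestors.
sub node_to_xpath {
    my ($xp, $node) = @_;
    my $path = '';

    # An attribute has no position among siblings, so emit just "/@name"
    # and continue from the element that owns it.
    if ($node->isAttributeNode) {
        $path = '/@' . $node->getName;
        $node = $node->getParentNode;
    }

    # Stop at the root node, which is the only node without a parent.
    while ($node and $node->getParentNode) {
        my $name = $node->getName;
        # Count earlier siblings with the same name; the index is that count plus one.
        my $before = $xp->find(qq{preceding-sibling::*[name()="$name"]}, $node)->size;
        $path = sprintf('/%s[%d]%s', $name, $before + 1, $path);
        $node = $node->getParentNode;
    }
    return $path;
}

# Trimmed copy of the sample document from the question.
my $xml = <<'XML';
<status>
  <connector name="http-0">
    <threadInfo currentThreadCount="0" currentThreadsBusy="0" maxThreads="200"/>
  </connector>
  <connector name="http-8080">
    <threadInfo currentThreadCount="158" currentThreadsBusy="10" maxThreads="200"/>
  </connector>
</status>
XML

my $xp = XML::XPath->new( xml => $xml );
foreach my $node ($xp->find('/status/connector/threadInfo/@currentThreadsBusy')->get_nodelist) {
    print node_to_xpath($xp, $node), " = ", $xp->getNodeText($node), "\n";
}
```

Running one preceding-sibling query per ancestor is simple but not fast; for large documents it may be worth counting siblings directly instead.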

Re: XML::XPath - node-to-xpath reverse lookup
by Anonymous Monk on Nov 08, 2010 at 17:38 UTC
Re: XML::XPath - node-to-xpath reverse lookup
by choroba (Abbot) on Nov 11, 2010 at 15:42 UTC
    Or, using XML::XSH2:
    open 870027.xml ;
    for /status/connector/threadInfo/@currentThreadsBusy {
      echo (.) ;
      for (ancestor::*|.) {
        $n = concat(xsh:if(.=../@*,'@',''),name()) ;
        echo :s :n '/' $n ;
        if .=self::* echo :n :s '[' 1+count(./preceding-sibling::*[name()=$n]) ']' ;
      }
      echo ;
    }
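
On Q2 from the original question, XML::LibXML may also be worth a look: its nodes have a nodePath() method that returns a structural XPath for the node. The sketch below assumes XML::LibXML is installed and uses a trimmed copy of the sample document. As far as I know, the path it returns carries a positional index only where same-named siblings exist, so it will not necessarily match the fully indexed form asked for above, but it should still identify the node uniquely.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the sample document from the question.
my $xml = <<'XML';
<status>
  <connector name="http-0">
    <threadInfo currentThreadCount="0" currentThreadsBusy="0" maxThreads="200"/>
  </connector>
  <connector name="http-8080">
    <threadInfo currentThreadCount="158" currentThreadsBusy="10" maxThreads="200"/>
  </connector>
</status>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

foreach my $attr ($doc->findnodes('/status/connector/threadInfo/@currentThreadsBusy')) {
    # nodePath() returns an XPath that locates this node within $doc.
    my $path = $attr->nodePath;
    print "$path = ", $attr->getValue, "\n";
}
```

Feeding the returned path back into findnodes() should give you the same node again, which is a quick way to check that it is unique.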
