http://www.perlmonks.org?node_id=531313


in reply to Re: Namespaced XML::LibXML XPath query
in thread Namespaced XML::LibXML XPath query

The behaviour you saw is absolutely correct and not a bug at all. To quote the author of libxml2 from a message aptly titled Re: [xml] XPath and default namespaces (bet you're sick of this by now :) ):

You cannot define a default namespace for XPath, period, don't try you can't, the XPath spec does not allow it. This can't work and trying to add it to libxml2 would simply make it non conformant to the spec.

In a nutshell forget about using default namespace within XPath expressions, this will *never* work, you *can't* !

Google [daniel veillard default namespace xpath] if you want more.

As he says, XPath has no notion of a default namespace. //lastName in an XPath expression always matches that element in the null namespace, not the default namespace. According to the spec:

A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded).
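To see that rule in action, here is a minimal sketch. It assumes a document shaped like the default-namespaced sdnList sample discussed in this thread (the XML string is reconstructed for illustration), and the prefix sdn is an arbitrary name picked for this example.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Sample document reconstructed for illustration; the element names and
# namespace URI are the ones discussed in this thread.
my $xml = <<'XML';
<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <lastName>Hello world!</lastName>
  </sdnEntry>
</sdnList>
XML

my $doc = XML::LibXML->load_xml( string => $xml );
my $xc  = XML::LibXML::XPathContext->new( $doc->documentElement() );

# No prefix: XPath looks for lastName in the null namespace, so nothing matches.
my $unprefixed = $xc->findvalue('//lastName');

# 'sdn' is an arbitrary prefix chosen for this sketch; only the URI matters.
$xc->registerNs( sdn => 'http://tempuri.org/sdnList.xsd' );
my $prefixed = $xc->findvalue('//sdn:lastName');

print "unprefixed: '$unprefixed'\n";
print "prefixed:   '$prefixed'\n";
```

The unprefixed query comes back empty even though the element is plainly there, which is exactly the behaviour the original question mistook for a bug.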

In //sdnList:lastName, sdnList is not a namespace. Only URIs can be namespaces. The stuff in front of the colon is the prefix, and is merely a stand-in for the URI. <sdnList xmlns="http://tempuri.org/sdnList.xsd"> puts the sdnList element (and all its prefix-less descendants) in the http://tempuri.org/sdnList.xsd namespace. You have to associate this URI with a prefix, then use the prefix in your expression. This is exactly the approach lestrrat posted:

my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
$xc->registerNs( foobar => 'http://tempuri.org/sdnList.xsd' );
my $result = $xc->findvalue( '//foobar:lastName' );

I wrote about this a while ago.

Note that the prefix is arbitrary and has nothing to do with what appears in your document. This is as it should be, because the following document means exactly the same as the one you have:

<camel:sdnList xmlns:camel="http://tempuri.org/sdnList.xsd">
  <camel:sdnEntry>
    <camel:lastName>Hello world!</camel:lastName>
  </camel:sdnEntry>
</camel:sdnList>

For that matter, even this means the same:

<camel:sdnList xmlns:camel="http://tempuri.org/sdnList.xsd">
  <penguin:sdnEntry xmlns:penguin="http://tempuri.org/sdnList.xsd">
    <camel:lastName>Hello world!</camel:lastName>
  </penguin:sdnEntry>
</camel:sdnList>

Or this:

<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <penguin:sdnEntry xmlns:penguin="http://tempuri.org/sdnList.xsd">
    <lastName>Hello world!</lastName>
  </penguin:sdnEntry>
</sdnList>

You get the idea.
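If you want to convince yourself, the following sketch runs one and the same XPath expression against all three variants. The prefix whatever and the labels are made up for this example; the documents are the ones shown above, squeezed onto single lines.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $uri = 'http://tempuri.org/sdnList.xsd';

# The three equivalent documents from above; labels are for this sketch only.
my @variants = (
    [ 'camel only'      => qq{<camel:sdnList xmlns:camel="$uri"><camel:sdnEntry><camel:lastName>Hello world!</camel:lastName></camel:sdnEntry></camel:sdnList>} ],
    [ 'camel + penguin' => qq{<camel:sdnList xmlns:camel="$uri"><penguin:sdnEntry xmlns:penguin="$uri"><camel:lastName>Hello world!</camel:lastName></penguin:sdnEntry></camel:sdnList>} ],
    [ 'default + penguin' => qq{<sdnList xmlns="$uri"><penguin:sdnEntry xmlns:penguin="$uri"><lastName>Hello world!</lastName></penguin:sdnEntry></sdnList>} ],
);

my @results;
for my $variant (@variants) {
    my ( $label, $xml ) = @$variant;

    my $doc = XML::LibXML->load_xml( string => $xml );
    my $xc  = XML::LibXML::XPathContext->new( $doc->documentElement() );

    # The prefix registered here need not match anything in the document.
    $xc->registerNs( whatever => $uri );

    my $value = $xc->findvalue(
        '/whatever:sdnList/whatever:sdnEntry/whatever:lastName'
    );
    push @results, $value;
    print "$label: '$value'\n";
}
```

Each variant should yield the same text, because the query is resolved against the namespace URI and never against the prefixes the document happens to use.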

Makeshifts last the longest.

Re^3: Namespaced XML::LibXML XPath query (not a bug)
by jbfamilly (Initiate) on Oct 23, 2008 at 10:42 UTC
    Hi Monks, I must be missing something simple. Could you please help me grasp this concept...
    Take the following example xml:
    <aaa xmlns="xmlapi_1.0">
      <bbb>
        <ccc>
          <d1>blah</d1>
          <d2>blah</d2>
          <d3>blah</d3>
        </ccc>
        <ccc>
          <d1>blah</d1>
          <d2>blah</d2>
          <d3>blah</d3>
        </ccc>
      </bbb>
    </aaa>
    I need to iterate through each <ccc>. I worked out how to get the list of <ccc> nodes and this thread confirms what I did as correct. But now that I have the <ccc> node, how do I get the <dx> properties? I've tried with and without the namespace already defined but still no love. It gets worse, the xml I receive could have <e> nested in <d>.

      OK, here's a standalone example that might help:

      #!/usr/bin/perl

      use strict;
      use warnings;

      use XML::LibXML;
      use XML::LibXML::XPathContext;

      my $parser = XML::LibXML->new();
      my $doc    = $parser->parse_fh(\*DATA);

      my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
      $xc->registerNs( xapi => 'xmlapi_1.0' );

      foreach my $ccc ($xc->findnodes('//xapi:ccc')) {
          print "Found a ccc\n";
          foreach my $d2 ( $xc->findnodes('./xapi:d2', $ccc) ) {
              print " d2 element contained: '" . $d2->to_literal . "'\n";
          }
          if(my $animal = $xc->findvalue('./xapi:zoo/xapi:critter', $ccc) ) {
              print " The mystery animal is '$animal'\n";
          }
      }

      exit;

      __DATA__
      <aaa xmlns="xmlapi_1.0">
        <bbb>
          <ccc>
            <d1>blah d1a</d1>
            <d2>blah d2a</d2>
            <d3>blah d3a</d3>
            <zoo>
              <critter>Monkey</critter>
            </zoo>
          </ccc>
          <ccc>
            <d1>blah d1b</d1>
            <d2>blah d2b</d2>
            <d3>blah d3b</d3>
            <zoo>
              <critter>Giraffe</critter>
            </zoo>
          </ccc>
        </bbb>
      </aaa>

      So the key point is that if you want to match an element that is in a namespace (whether it is declared with an explicit prefix or inherited from a default xmlns declaration on an ancestor element), then you must use a prefix bound to that namespace when you refer to the element in your XPath expression.

      When matching a namespace, the only thing that matters is the URI. The prefix used in the source document (if there was one) is irrelevant. The prefix used in your code when you register the namespace URI is also irrelevant. What matters is that your XPath query includes a prefix that has been registered to associate it with the same URI as the namespace declared in the source document.

        Thanks.
        I had tried:
        foreach my $d2 ( $ccc->findnodes('./xapi:d2') ) {
        instead of:
        foreach my $d2 ( $xc->findnodes('./xapi:d2', $ccc) ) {
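The difference between those two calls can be sketched as follows. The registration made with registerNs lives on the XPathContext object, so a query issued directly on the node knows nothing about the xapi prefix. As far as I understand XML::LibXML, the node-level call raises an "undefined namespace prefix" style error rather than returning an empty list, so this sketch wraps it in eval and simply counts what comes back either way. The trimmed-down XML is an assumption based on the example earlier in the thread.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed-down version of the sample document from this thread.
my $xml = <<'XML';
<aaa xmlns="xmlapi_1.0">
  <bbb>
    <ccc><d2>blah d2a</d2></ccc>
    <ccc><d2>blah d2b</d2></ccc>
  </bbb>
</aaa>
XML

my $doc = XML::LibXML->load_xml( string => $xml );
my $xc  = XML::LibXML::XPathContext->new( $doc->documentElement() );
$xc->registerNs( xapi => 'xmlapi_1.0' );

my ( @via_node, @via_context );
for my $ccc ( $xc->findnodes('//xapi:ccc') ) {
    # Node-level query: the 'xapi' prefix was registered on $xc, not on the
    # node, so it cannot be resolved here. eval traps the error if one is thrown.
    my @found = eval { $ccc->findnodes('./xapi:d2') };
    warn "node-level query failed: $@" if $@;
    push @via_node, @found;

    # Context-level query: same expression, with $ccc passed as the context node.
    push @via_context, $xc->findnodes( './xapi:d2', $ccc );
}

printf "via \$ccc->findnodes: %d node(s)\n", scalar @via_node;
printf "via \$xc->findnodes:  %d node(s)\n", scalar @via_context;
```

So once you have registered a prefix, keep routing every query through $xc and pass the node you are standing on as the second argument.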