PerlMonks  

Re: Namespaced XML::LibXML XPath query

by diotalevi (Canon)
on Feb 15, 2006 at 22:09 UTC


in reply to Namespaced XML::LibXML XPath query

I believe I have solved this. It is a bug in either the xml parser that XML::LibXML uses or XML::LibXML. When a namespace declaration doesn't specify a prefix, the prefix used is the containing element name. For my example code, the prefix should be sdnList. XML::LibXML is of the incorrect opinion that sdnList isn't a valid namespace. My query should have been written as //sdnList:lastName.

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Replies are listed 'Best First'.
Re^2: Namespaced XML::LibXML XPath query (not a bug)
by Aristotle (Chancellor) on Feb 20, 2006 at 00:53 UTC

    The behaviour you saw is absolutely correct and not a bug at all. To quote the author of libxml2 from a message aptly titled Re: [xml] XPath and default namespaces (bet you're sick of this by now :) ):

    You cannot define a default namespace for XPath, period, don't try you can't, the XPath spec does not allow it. This can't work and trying to add it to libxml2 would simply make it non conformant to the spec.

    In a nutshell forget about using default namespace within XPath expressions, this will *never* work, you *can't* !

    Google [daniel veillard default namespace xpath] if you want more.

    As he says, XPath has no notion of a default namespace. //lastName in an XPath expression always matches that element in the null namespace, not the default namespace. According to the spec:

    A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with xmlns is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded).

    In //sdnList:lastName, sdnList is not a namespace. Only URIs can be namespaces. The stuff in front of the colon is the prefix, and is merely a stand-in for the URI. <sdnList xmlns="http://tempuri.org/sdnList.xsd"> puts the sdnList element (and all its prefix-less descendants) in the http://tempuri.org/sdnList.xsd namespace. You have to associate this URI with a prefix, then use the prefix in your expression. This is exactly the approach lestrrat posted:

    my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
    $xc->registerNs( foobar => 'http://tempuri.org/sdnList.xsd' );
    my $result = $xc->findvalue( '//foobar:lastName' );
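To make the difference concrete, here is a minimal sketch that runs the unprefixed and the prefixed query against the same document. The sample XML is an assumption modelled on the documents shown in this thread, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Assumed sample document: everything sits in the default namespace.
my $xml = q{<sdnList xmlns="http://tempuri.org/sdnList.xsd">
  <sdnEntry>
    <lastName>Hello world!</lastName>
  </sdnEntry>
</sdnList>};

my $doc = XML::LibXML->load_xml( string => $xml );

# Unprefixed name test: XPath looks in the null namespace, so nothing matches
# and findvalue returns an empty string.
my $unprefixed = $doc->findvalue('//lastName');

# Prefixed name test: the prefix is bound to the same URI the document uses.
my $xc = XML::LibXML::XPathContext->new( $doc->documentElement );
$xc->registerNs( foobar => 'http://tempuri.org/sdnList.xsd' );
my $prefixed = $xc->findvalue('//foobar:lastName');

print "unprefixed: '$unprefixed'\n";
print "prefixed:   '$prefixed'\n";
```

The first query comes back empty because of the spec rule quoted above, not because of anything specific to libxml2.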

    I wrote about this a while ago.

    Note that the prefix is arbitrary and has nothing to do with what appears in your document. This is as it should be, because the following document means exactly the same as the one you have:

    <camel:sdnList xmlns:camel="http://tempuri.org/sdnList.xsd">
      <camel:sdnEntry>
        <camel:lastName>Hello world!</camel:lastName>
      </camel:sdnEntry>
    </camel:sdnList>

    For that matter, even this means the same:

    <camel:sdnList xmlns:camel="http://tempuri.org/sdnList.xsd">
      <penguin:sdnEntry xmlns:penguin="http://tempuri.org/sdnList.xsd">
        <camel:lastName>Hello world!</camel:lastName>
      </penguin:sdnEntry>
    </camel:sdnList>

    Or this:

    <sdnList xmlns="http://tempuri.org/sdnList.xsd">
      <penguin:sdnEntry xmlns:penguin="http://tempuri.org/sdnList.xsd">
        <lastName>Hello world!</lastName>
      </penguin:sdnEntry>
    </sdnList>

    You get the idea.
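A quick way to convince yourself of this is to run one and the same query over several spellings of the document. The sketch below is only an illustration: the hash keys and the prefix s are made-up names, and the three XML strings are condensed versions of the variants discussed here.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $uri = 'http://tempuri.org/sdnList.xsd';

# Three spellings of the same document; only the prefixes differ.
my %variants = (
    default_ns => qq{<sdnList xmlns="$uri"><sdnEntry><lastName>Hello world!</lastName></sdnEntry></sdnList>},
    camel      => qq{<camel:sdnList xmlns:camel="$uri"><camel:sdnEntry><camel:lastName>Hello world!</camel:lastName></camel:sdnEntry></camel:sdnList>},
    mixed      => qq{<sdnList xmlns="$uri"><penguin:sdnEntry xmlns:penguin="$uri"><lastName>Hello world!</lastName></penguin:sdnEntry></sdnList>},
);

my %found;
for my $name ( sort keys %variants ) {
    my $doc = XML::LibXML->load_xml( string => $variants{$name} );
    my $xc  = XML::LibXML::XPathContext->new( $doc->documentElement );

    # The prefix "s" appears in none of the documents; only the URI matters.
    $xc->registerNs( s => $uri );
    $found{$name} = $xc->findvalue('/s:sdnList/s:sdnEntry/s:lastName');

    print "$name: '$found{$name}'\n";
}
```

Every variant should yield the same text, since the query is resolved against the namespace URI rather than against whatever prefix the document happens to use.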

    Makeshifts last the longest.

      Hi Monks, I must be missing something simple. Could you please help me grasp this concept...
      Take the following example xml:
      <aaa xmlns="xmlapi_1.0">
        <bbb>
          <ccc>
            <d1>blah</d1>
            <d2>blah</d2>
            <d3>blah</d3>
          </ccc>
          <ccc>
            <d1>blah</d1>
            <d2>blah</d2>
            <d3>blah</d3>
          </ccc>
        </bbb>
      </aaa>
      I need to iterate through each <ccc>. I worked out how to get the list of <ccc> nodes and this thread confirms what I did as correct. But now that I have the <ccc> node, how do I get the <dx> properties? I've tried with and without the namespace already defined but still no love. It gets worse, the xml I receive could have <e> nested in <d>.

        OK, here's a standalone example that might help:

        #!/usr/bin/perl

        use strict;
        use warnings;

        use XML::LibXML;
        use XML::LibXML::XPathContext;

        my $parser = XML::LibXML->new();
        my $doc    = $parser->parse_fh(\*DATA);

        my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
        $xc->registerNs( xapi => 'xmlapi_1.0' );

        foreach my $ccc ($xc->findnodes('//xapi:ccc')) {
            print "Found a ccc\n";
            foreach my $d2 ( $xc->findnodes('./xapi:d2', $ccc) ) {
                print " d2 element contained: '" . $d2->to_literal . "'\n";
            }
            if(my $animal = $xc->findvalue('./xapi:zoo/xapi:critter', $ccc) ) {
                print " The mystery animal is '$animal'\n";
            }
        }

        exit;

        __DATA__
        <aaa xmlns="xmlapi_1.0">
          <bbb>
            <ccc>
              <d1>blah d1a</d1>
              <d2>blah d2a</d2>
              <d3>blah d3a</d3>
              <zoo>
                <critter>Monkey</critter>
              </zoo>
            </ccc>
            <ccc>
              <d1>blah d1b</d1>
              <d2>blah d2b</d2>
              <d3>blah d3b</d3>
              <zoo>
                <critter>Giraffe</critter>
              </zoo>
            </ccc>
          </bbb>
        </aaa>

        So the key point is that if you want to match an element that is in a namespace (whether declared explicitly via a prefix or inherited from a default namespace declaration on an ancestor element), then every step in your XPath expression that names such an element must use a prefix bound to that namespace.

        When matching a namespace, the only thing that matters is the URI. The prefix used in the source document (if there was one) is irrelevant. The prefix used in your code when you register the namespace URI is also irrelevant. What matters is that your XPath query includes a prefix that has been registered to associate it with the same URI as the namespace declared in the source document.
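Both points can be checked with a short sketch. It assumes a cut-down version of the document with a hypothetical e element nested inside d1 (as mentioned in the question), and registers two arbitrary prefixes, xapi and zzz, for the same URI.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Assumed sample: <e> nested inside <d1>, all in the default namespace.
my $xml = q{<aaa xmlns="xmlapi_1.0"><bbb><ccc><d1>blah<e>nested</e></d1></ccc></bbb></aaa>};

my $doc = XML::LibXML->load_xml( string => $xml );
my $xc  = XML::LibXML::XPathContext->new( $doc->documentElement );

# Two unrelated prefixes bound to the same namespace URI.
$xc->registerNs( xapi => 'xmlapi_1.0' );
$xc->registerNs( zzz  => 'xmlapi_1.0' );

my ($ccc) = $xc->findnodes('//xapi:ccc');

# Relative queries from the <ccc> node; every element step carries a prefix.
my $with_xapi = $xc->findvalue( './xapi:d1/xapi:e', $ccc );
my $with_zzz  = $xc->findvalue( './zzz:d1/zzz:e',   $ccc );

# Dropping the prefix on the last step looks for <e> in the null namespace.
my $missing_prefix = $xc->findvalue( './xapi:d1/e', $ccc );

print "xapi prefix:    '$with_xapi'\n";
print "zzz prefix:     '$with_zzz'\n";
print "missing prefix: '$missing_prefix'\n";
```

Either prefix finds the nested element, while the query with an unprefixed final step finds nothing, which is the usual cause of "I have the node but can't reach its children".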

Re^2: Namespaced XML::LibXML XPath query
by acid06 (Friar) on Feb 16, 2006 at 14:33 UTC
    I've just checked the specifications of how it should be and it is, indeed, a bug. Although I don't know if it's a libxml2 bug or a bug in the Perl bindings to it (i.e. XML::LibXML).

    Either way, you should report it to the authors. But I don't know if it's still maintained, since the last update happened in 2004.


    acid06
    perl -e "print pack('h*', 16369646), scalar reverse $="

      I reported this to rt.cpan.org as soon as I found that it was a bug.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
