PerlMonks  

LibXML, XPath and Namespaces

by space_monk (Chaplain)
on Mar 21, 2013 at 14:42 UTC

space_monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellow Monks,

I have received an XML message. What is the best way to retrieve the xmlns (namespace) attribute from the root element of that message using XML::LibXML? If it helps, I am expecting GovTalkMessages of the format:

<?xml version="1.0"?>
<GovTalkMessage xmlns="http://www.govtalk.gov.uk/CM/envelope">
  <EnvelopeVersion>2.0</EnvelopeVersion>
  <Header>
    <MessageDetails>
      .....
    </MessageDetails>
  </Header>
  <GovTalkDetails>
    .....
  </GovTalkDetails>
  <Body>
    <!-- A valid Body payload with a namespace declaration on the first element -->
  </Body>
</GovTalkMessage>
For bonus points, please tell me if there is any way of avoiding XPathContext and adding the namespace prefix to every XPath query on the resulting DOM. For example, do I really have to do:
use XML::LibXML;

my $dom = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($dom);
# The URI must match the message's xmlns exactly, or nothing will be found
$xc->registerNs('gt', 'http://www.govtalk.gov.uk/CM/envelope');
my $envelopeVersion = $xc->findvalue('//gt:EnvelopeVersion');
A Monk aims to give answers to those who have none, and to learn from those who know more.

Re: LibXML, XPath and Namespaces
by choroba (Cardinal) on Mar 21, 2013 at 15:27 UTC
    You can retrieve the namespace with the namespaceURI method:

    my $doc   = XML::LibXML->load_xml(location => '1.xml');
    my $root  = $doc->documentElement;
    my $nsuri = $root->namespaceURI;

    You have to register a prefix for the namespace before you can use it in XPath. It is annoying, but it is how XPath works: if no namespace is specified in an XPath name test, it means the empty namespace, not the document's default one (imagine the main:: package in Perl having no name).
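The two steps can be combined so that the URI is read from the root element rather than retyped by hand. This is a minimal sketch, assuming an abbreviated in-memory copy of the sample message; the prefix gt is an arbitrary choice.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Abbreviated copy of the sample GovTalkMessage
my $xml = <<'XML';
<?xml version="1.0"?>
<GovTalkMessage xmlns="http://www.govtalk.gov.uk/CM/envelope">
  <EnvelopeVersion>2.0</EnvelopeVersion>
</GovTalkMessage>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Read the namespace URI from the root element instead of hard-coding it
my $nsuri = $dom->documentElement->namespaceURI;

# Bind an arbitrary prefix to that URI so XPath expressions can use it
my $xc = XML::LibXML::XPathContext->new($dom);
$xc->registerNs(gt => $nsuri);

my $version = $xc->findvalue('/gt:GovTalkMessage/gt:EnvelopeVersion');
print "Namespace: $nsuri\nEnvelopeVersion: $version\n";
```

Reading the URI at run time also means the code keeps working if a message arrives with a different envelope namespace, although you may prefer to check the URI explicitly and reject unexpected ones.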

    If you find XML::LibXML too verbose (as I do), you might like XML::XSH2, a simple wrapper around it that removes most of the incantations. Nevertheless, you still have to register the namespace:

    register-namespace gt http://www.govtalk.gov.uk/CM/envelope ;
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thanks very much,

      As you say, I was really trying to find a way of getting round the verbosity. Unfortunately, getting more libraries installed round here is awkward and quite time-consuming, so I have to live with LibXML.

Re: LibXML, XPath and Namespaces
by tobyink (Canon) on Mar 21, 2013 at 21:32 UTC

    Can I have my bonus points please?

    use v5.10;
    use strict;
    use warnings;
    use XML::LibXML;

    my $xml = XML::LibXML->load_xml(IO => \*DATA);
    say "The root element's namespace is: ", $xml->documentElement->namespaceURI;

    # Give that namespace a prefix so that we can reference it in XPath
    $xml->documentElement->setNamespaceDeclPrefix("", "gt");
    say "Look! The new prefix works! Found: ", $xml->findvalue('//gt:EnvelopeVersion');

    __DATA__
    <?xml version="1.0"?>
    <GovTalkMessage xmlns="http://www.govtalk.gov.uk/CM/envelope">
      <EnvelopeVersion>2.0</EnvelopeVersion>
      <Header>
        <MessageDetails>
          .....
        </MessageDetails>
      </Header>
      <GovTalkDetails>
        .....
      </GovTalkDetails>
      <Body>
        <!-- A valid Body payload with a namespace declaration on the first element -->
      </Body>
    </GovTalkMessage>
    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name

      What I really wanted to achieve was for the system to assume the default namespace was 'gt' so I didn't have to include it in the prefix in all XPath expressions.

      It's fine when it's just one level deep e.g. EnvelopeVersion, but when you want to pick up a number of nodes 3 or 4 levels deep and keep having to repeat that 'gt:' at every level its a PITA.

      I did mod your reply up for the effort though :-)


        "What I really wanted to achieve was for the system to assume the default namespace was 'gt' so I didn't have to include it in the prefix in all XPath expressions."

        Well, that would break XPath spec compliance. As per the XPath spec, node names with no colon always reference nodes with no namespace at all.

        Otherwise, if you could somehow set "gt" to be the default namespace for XPaths, you wouldn't be able to distinguish between the following two attributes:

        <gt:foo gt:bar="1" bar="2" />
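The behaviour can be checked with a short sketch. It uses the attribute example above, with an xmlns:gt declaration added (an assumption, since the snippet needs one to parse) and bound to the GovTalk envelope URI for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $uri = 'http://www.govtalk.gov.uk/CM/envelope';

# The attribute example, plus the namespace declaration it needs to parse
my $xml = qq{<gt:foo xmlns:gt="$uri" gt:bar="1" bar="2"/>};

my $doc = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(gt => $uri);

# An unprefixed name test only matches elements in no namespace
my $unprefixed = $xc->findnodes('/foo')->size;
# The prefixed name test matches the namespaced root element
my $prefixed = $xc->findnodes('/gt:foo')->size;

# Two distinct attributes; only the prefix tells them apart
my $ns_bar    = $xc->findvalue('/gt:foo/@gt:bar');
my $plain_bar = $xc->findvalue('/gt:foo/@bar');

print "unprefixed=$unprefixed prefixed=$prefixed gt:bar=$ns_bar bar=$plain_bar\n";
```

If unprefixed names silently meant "gt", the query /gt:foo/@bar could no longer select the second attribute.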

        "It's fine when it's just one level deep e.g. EnvelopeVersion, but when you want to pick up a number of nodes 3 or 4 levels deep and keep having to repeat that 'gt:' at every level its a PITA."

        I enjoy golf as much as the next man, but is three characters per name really so bad? (You could always bind the namespace to just "g" so it was two characters.) I saved you having to construct XML::LibXML::XPathContext objects, didn't I?
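If the repetition in deep paths is the real pain point, one workaround is to write simple paths without prefixes and add them programmatically. The ns_path helper below is hypothetical (it is not part of XML::LibXML) and only handles plain slash-separated element names: steps with predicates, axes, wildcards, functions or attributes are passed through unchanged. It binds the namespace to the short prefix g, as suggested above.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $uri = 'http://www.govtalk.gov.uk/CM/envelope';

# Hypothetical helper, not part of XML::LibXML: prefix each plain
# element-name step of a slash-separated path with "$prefix:".
# Empty steps (from a leading "/" or "//") and anything that is not
# a simple name are left as they are.
sub ns_path {
    my ($prefix, $path) = @_;
    my @steps = split m{/}, $path, -1;
    for my $step (@steps) {
        $step = "$prefix:$step" if $step =~ /^[A-Za-z_][\w.-]*$/;
    }
    return join '/', @steps;
}

# Abbreviated copy of the sample message
my $xml = <<'XML';
<?xml version="1.0"?>
<GovTalkMessage xmlns="http://www.govtalk.gov.uk/CM/envelope">
  <Header>
    <MessageDetails>...</MessageDetails>
  </Header>
</GovTalkMessage>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($doc);
$xc->registerNs(g => $uri);    # short prefix

my $path  = ns_path('g', '/GovTalkMessage/Header/MessageDetails');
my $count = $xc->findnodes($path)->size;
print "$path matched $count node(s)\n";
```

This keeps the queries spec-compliant, since the prefixes are still there when libxml2 evaluates them; it only saves typing, and anything beyond simple child paths still needs the prefixes written by hand.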

        If your XPaths are fairly simple, you could take a look at XML::LibXML::QuerySelector which allows you to select nodes using CSS selectors. I wrote it for use with (X)HTML, but I don't see any reason it shouldn't roughly work with arbitrary XML.

        package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name

        That wouldn't be a valid XPath. The XPath "foo" matches child elements named "foo" in the null namespace, and XPath 1.0 (the version libxml2 implements) has no way to specify a default namespace for node tests.

        Furthermore, gt is a prefix, not a namespace. http://www.govtalk.gov.uk/CM/envelope is the namespace in this case; gt is completely arbitrary and has no meaning of its own.
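A small sketch makes the point concrete: only the URI has to match, never the prefix. It assumes a hypothetical sender that writes the same envelope with the prefix env, and binds two unrelated prefixes, gt and x, to the same URI on the query side.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $uri = 'http://www.govtalk.gov.uk/CM/envelope';

# Same namespace, but this (hypothetical) document uses the prefix "env"
my $xml = <<'XML';
<env:GovTalkMessage xmlns:env="http://www.govtalk.gov.uk/CM/envelope">
  <env:EnvelopeVersion>2.0</env:EnvelopeVersion>
</env:GovTalkMessage>
XML

my $doc = XML::LibXML->load_xml(string => $xml);
my $xc  = XML::LibXML::XPathContext->new($doc);

# Two different prefixes bound to the same URI; neither needs to be "env"
$xc->registerNs(gt => $uri);
$xc->registerNs(x  => $uri);

my $via_gt = $xc->findvalue('/gt:GovTalkMessage/gt:EnvelopeVersion');
my $via_x  = $xc->findvalue('/x:GovTalkMessage/x:EnvelopeVersion');

# The element's namespace is the URI, whatever prefix the document chose
my $root_uri = $doc->documentElement->namespaceURI;

print "gt: $via_gt, x: $via_x, root namespace: $root_uri\n";
```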

Re: LibXML, XPath and Namespaces (name(), local-name())
by Anonymous Monk on Mar 21, 2013 at 22:23 UTC

      Both of the above actually increase the level of verbosity in the XPath queries. It's more a fault of XPath than of Perl, so I guess one has to live with its eccentricities.

      Thanks for the extra info
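For reference, the namespace-agnostic style named in the reply's title looks roughly like this. It is a sketch against an abbreviated copy of the sample message: local-name() ignores the namespace entirely, adding a namespace-uri() test restores strictness, and name() compares against the qualified name as written, so it only matches here because the document uses a default (unprefixed) namespace.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<?xml version="1.0"?>
<GovTalkMessage xmlns="http://www.govtalk.gov.uk/CM/envelope">
  <EnvelopeVersion>2.0</EnvelopeVersion>
</GovTalkMessage>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# No prefix registration: match on the local part of the name only
my $by_local = $doc->findvalue('//*[local-name()="EnvelopeVersion"]');

# Stricter: also check the namespace URI
my $by_local_and_ns = $doc->findvalue(
    '//*[local-name()="EnvelopeVersion"'
  . ' and namespace-uri()="http://www.govtalk.gov.uk/CM/envelope"]'
);

# name() returns the qualified name as written; with a default namespace
# there is no prefix, so this would stop matching if the sender used one
my $by_name = $doc->findvalue('//*[name()="EnvelopeVersion"]');

print "$by_local $by_local_and_ns $by_name\n";
```

As noted above, every step needs its own predicate, which is why this tends to be longer than simply registering a prefix.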


Node Type: perlquestion [id://1024765]
Approved by Corion