PerlMonks  

XML::LibXML and namespaces

by Anonymous Monk
on Nov 09, 2012 at 09:29 UTC (#1003089)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have a bunch of XML files from my GPS, and I'd like to extract data from them and play around with it, displaying it graphically for one. The format is done properly: it has its own schema and uses its own namespace. One such file can be found at www.dehulst.nl/Garmin/TCX/1832.tcx

So, in order to parse such an XML file, you have to register its namespace with XML::LibXML (using a variable $string as the prefix):

    my $parser = XML::LibXML->new->parse_file($file);
    my $xml = XML::LibXML::XPathContext->new($parser);
    $xml->registerNs($string, 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2');

Now, extracting the value of an attribute poses no problem:

    for my $key ($xml->findnodes('//x:Lap')) {
        $string = norm_date($key->findvalue("\@StartTime"));
    }

Just in case anyone is worried, there (usually) is only one Lap per file.

The problems start when I want to extract the timestamps from each recorded datapoint. My first try was

    for my $node ($xml->findnodes('//y:Trackpoint')) {
        $time = $node->findvalue("Time");
        push @X, $time;
    }

This fails. It does find the set of nodes, but fails to find the timestamp that's in the Time element. Experimenting with the examples I Googled, I found the following, which does give me the timestamps:

    for my $node ($xml->findnodes("//y:Trackpoint")) {
        for my $ch ($node->childNodes) {
            $time = norm_date($ch->textContent)-$epoch if $ch->nodeName =~ /Time/;
        }
        push @X, $time;
    }

That's nasty. It does work, but it's nasty.

So, the question is: why does the findvalue function fail to work with a non-default namespace? Or am I missing something?

Re: XML::LibXML and namespaces
by choroba (Abbot) on Nov 09, 2012 at 09:51 UTC
    See the documentation of XML::LibXML::Node under findnodes. If you follow the advice given there, your XPath will work:

        #!/usr/bin/perl
        use warnings;
        use strict;

        use XML::LibXML;

        my $file = '1832.tcx';

        my $parser = XML::LibXML->new->parse_file($file);
        my $xml = XML::LibXML::XPathContext->new;                    # No argument here!
        $xml->registerNs('x', 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2');

        for my $key ($xml->findnodes('//x:Lap', $parser)) {          # Provide the $parser here.
            my $string = $key->findvalue('@StartTime');              # No $parser needed, since attributes are namespaceless.
            print "1\t$string\n";
        }

        my @X;
        for my $node ($xml->findnodes('//x:Trackpoint', $parser)) {  # Again, $parser as argument.
            my $time = $xml->findvalue('x:Time', $node);             # Context specified as argument.
            push @X, $time;
        }
        print "@X\n";

    Update: comments.

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Ah! That's what I was missing. I did try to put a context in, but I tried $xml, not $node.

      Thanks!
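
The difference between the failing call and the working one can be reproduced without the GPS file. What follows is a minimal sketch, assuming a made-up two-trackpoint document in the Garmin namespace; the inline XML, the prefix 'g' and the variable names are illustrative and not taken from 1832.tcx. An unprefixed name in XPath 1.0 only matches elements in no namespace, and the prefix registered with registerNs is known only to the XPathContext object, so the lookup has to go through that object with the node passed as context.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for a .tcx file: every element is in the default namespace.
my $ns  = 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2';
my $doc = XML::LibXML->load_xml(string => <<"XML");
<TrainingCenterDatabase xmlns="$ns">
  <Trackpoint><Time>2012-11-09T09:29:00Z</Time></Trackpoint>
  <Trackpoint><Time>2012-11-09T09:29:05Z</Time></Trackpoint>
</TrainingCenterDatabase>
XML

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(g => $ns);    # 'g' is an arbitrary prefix chosen for this sketch

my (@plain, @prefixed);
for my $node ($xpc->findnodes('//g:Trackpoint')) {
    # Unprefixed 'Time' means "Time in no namespace", so nothing matches
    # and findvalue returns an empty string.
    push @plain, $node->findvalue('Time');

    # Ask the context object, which knows the prefix, and pass the node as context.
    push @prefixed, $xpc->findvalue('g:Time', $node);
}

print "plain:    [@plain]\n";
print "prefixed: [@prefixed]\n";
```

The prefix used in the XPath expression does not have to match anything in the file; only the namespace URI registered for it matters.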

Re: XML::LibXML and namespaces
by Anonymous Monk on Nov 09, 2012 at 11:30 UTC

    libxml can be quite a PITCT - Pain In The Carpal Tunnel :)

        #!/usr/bin/perl --
        use strict;
        use warnings;
        use XML::LibXML;

        my $doc = XML::LibXML->new()->parse_string(
            q{<?xml version='1.0' ?>
        <roshambo xmlns="http://example.com/roshambo">
          <sham>
            <bo name="40" />
            <bo name="2" />
          </sham>
          <sham>
            <bo name="forty" />
            <bo name="two" />
          </sham>
        </roshambo>
        }
        );

        for my $node ( $doc->F( '//x:sham' ) ) {
            print "@{[ $node->nodePath ]}\n";
            for my $name ( $node->F( 'x:bo/@name' ) ) {
                print "@{[ $name->nodePath ]} @{[$name->nodeValue]}\n";
            }
            print "\n\n";
        }

        $::xpc->registerNs( 'y', 'http://example.com/roshambo' );
        print $doc->F('y:roshambo');

        exit( 0 );

        BEGIN {
            $::xpc = XML::LibXML::XPathContext->new( );
            $::xpc->registerNs( 'x', 'http://example.com/roshambo' );

            sub XML::LibXML::Node::F {
                my( $self, $xpath, $context ) = @_;
                $::xpc->findnodes( $xpath, $context || $self );
            }
        }
        __END__

    Also if you're interested in a fancier nodePath, see XPATH DOM traverse html/xml

Re: XML::LibXML and namespaces
by Lotus1 (Chaplain) on Nov 09, 2012 at 13:29 UTC

    You can also give a relative path to the node in findvalue. The '.' means the current node, so if "Time" is a child of the node at $node:

        for my $node ($xml->findnodes('//y:Trackpoint')) {
            $time = $node->findvalue("./Time");
            push @X, $time;
        }

    If there is more than one "Time" node under $node, findvalue() concatenates the text of all of them and returns the result.
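
The concatenation behaviour is easy to check on a document without namespaces. This is a small sketch; the element names 'p' and 'Time' and the values 'a' and 'b' are made up for illustration. It relies on findvalue returning the literal value of the whole node list, and uses a positional predicate to pick a single node instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical namespace-free document with two Time children.
my $doc = XML::LibXML->load_xml(string => '<p><Time>a</Time><Time>b</Time></p>');
my ($p) = $doc->findnodes('/p');

# All matching nodes contribute to the returned string.
my $all = $p->findvalue('./Time');

# A positional predicate restricts the match to the first Time child.
my $first = $p->findvalue('./Time[1]');

print "all:   $all\n";
print "first: $first\n";
```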

      findvalue("./Time") won't work if namespaces are in play; you have to use XPathContext.

        Thanks, that's good to know. I haven't dealt with namespaces yet.

      Wouldn't you know it :) XPath allows ignoring namespaces by using the functions name() and local-name(), and the current node (.) comes in handy, heady even

          #!/usr/bin/perl --
          use strict;
          use warnings;
          use XML::LibXML;

          my $doc = XML::LibXML->new()->parse_string(
              q{<?xml version='1.0' ?>
          <roshambo xmlns="http://example.com/roshambo">
            <sham>
              <bo name="40" />
              <bo name="2" />
            </sham>
            <sham xmlns:ftt="http://example.com/roshambo">
              <ftt:bo name="forty" />
              <ftt:bo name="two" />
            </sham>
          </roshambo>
          }
          );

          for my $name ( $doc->findnodes( q{//*[local-name()="bo"]/@name} ) ) {
              printf "%-25s %s\n", $name->nodePath, $name->nodeValue;
          }
          print "\n\n";

          for my $node ( $doc->findnodes( q{//*[name()="sham"]} ) ) {
              print "@{[ $node->nodePath ]}\n";
              ## any children    ## ./*
              ## any descendants ## .//*
              ## anywhere        ## //*
              for my $name ( $node->findnodes( q{./*[local-name()="bo"]/@name} ) ) {
                  printf "%-25s %s\n", $name->nodePath, $name->nodeValue;
              }
              print "\n\n";
          }
          __END__
          /*/*[1]/*[1]/@name        40
          /*/*[1]/*[2]/@name        2
          /*/*[2]/ftt:bo[1]/@name   forty
          /*/*[2]/ftt:bo[2]/@name   two


          /*/*[1]
          /*/*[1]/*[1]/@name        40
          /*/*[1]/*[2]/@name        2


          /*/*[2]
          /*/*[2]/ftt:bo[1]/@name   forty
          /*/*[2]/ftt:bo[2]/@name   two

        Awesome! Exactly what I needed.
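
Applied to the original Trackpoint question, the local-name() approach might look like the sketch below. The inline XML is a hypothetical stand-in for a .tcx file, not the contents of 1832.tcx. No XPathContext and no registered prefix are involved, at the cost of matching elements with that local name in any namespace, which may matter if a document mixes vocabularies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for a .tcx file: every element is in the default namespace.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Trackpoint><Time>2012-11-09T09:29:00Z</Time></Trackpoint>
  <Trackpoint><Time>2012-11-09T09:29:05Z</Time></Trackpoint>
</TrainingCenterDatabase>
XML

my @times;
# Match on the local part of the name only, whatever the namespace is.
for my $node ($doc->findnodes('//*[local-name()="Trackpoint"]')) {
    # '.' anchors the search at the current Trackpoint; '*' is any child element.
    push @times, $node->findvalue('./*[local-name()="Time"]');
}

print "$_\n" for @times;
```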
