http://www.perlmonks.org?node_id=1003089

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have a bunch of XML files from my GPS, and I'd like to extract data from them, and play around with them, displaying them graphically for one. As it's done properly, it has its own schema, and uses its own namespace. One such file can be found at www.dehulst.nl/Garmin/TCX/1832.tcx

So, in order to query such an XML file with XPath, you have to register its namespace with XML::LibXML::XPathContext (using a variable $string as the prefix):

    my $parser = XML::LibXML->new->parse_file($file);
    my $xml = XML::LibXML::XPathContext->new($parser);
    $xml->registerNs($string, 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2');

Now, extracting the value of an attribute poses no problem:

    for my $key ($xml->findnodes('//x:Lap')) {
        $string = norm_date($key->findvalue("\@StartTime"));
    }

Just in case anyone is worried, there (usually) is only one Lap per file.

The problems start when I want to extract the timestamps from each recorded datapoint. My first try was

    for my $node ($xml->findnodes('//y:Trackpoint')) {
        $time = $node->findvalue("Time");
        push @X,$time;
    }

This fails. It does find the set of nodes, but fails to find the timestamp that's in the Time element. Experimenting with the examples I Googled, I found the following, which does give me the timestamps:

    for my $node ($xml->findnodes("//y:Trackpoint")) {
        for my $ch ($node->childNodes) {
            $time = norm_date($ch->textContent)-$epoch if $ch->nodeName =~ /Time/;
        }
        push @X,$time;
    }

That's nasty. It does work, but it's nasty.

So, the question is: why does the findvalue function fail to work with a non-default namespace? Or am I missing something?

Re: XML::LibXML and namespaces
by choroba (Cardinal) on Nov 09, 2012 at 09:51 UTC
    See the documentation of XML::LibXML::Node under findnodes. If you follow the advice given there, your XPath will work:

    #!/usr/bin/perl
    use warnings;
    use strict;

    use XML::LibXML;

    my $file = '1832.tcx';
    my $parser = XML::LibXML->new->parse_file($file);
    my $xml = XML::LibXML::XPathContext->new;   # No argument here!
    $xml->registerNs('x', 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2');

    for my $key ($xml->findnodes('//x:Lap', $parser)) {   # Provide the $parser here.
        my $string = $key->findvalue('@StartTime');       # No $parser needed, since attributes are namespaceless.
        print "1\t$string\n";
    }

    my @X;
    for my $node ($xml->findnodes('//x:Trackpoint', $parser)) {   # Again, $parser as argument.
        my $time = $xml->findvalue('x:Time', $node);              # Context specified as argument.
        push @X, $time;
    }
    print "@X\n";

    Update: comments.

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Ah! That's what I was missing. I did try to put a context in, but I tried $xml, not $node.

      Thanks!
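
For readers without the original file, here is a minimal self-contained sketch of the difference discussed above. The inline XML is a made-up miniature of a TCX file (only the namespace URI comes from the thread), and the variable names are illustrative. It contrasts calling findvalue on the node itself with calling it on the XPathContext and passing the node as context.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical miniature of a TCX file; real files hold far more data.
my $tcx = <<'XML';
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Trackpoint><Time>2012-11-09T09:51:00Z</Time></Trackpoint>
  <Trackpoint><Time>2012-11-09T09:51:05Z</Time></Trackpoint>
</TrainingCenterDatabase>
XML

my $doc = XML::LibXML->load_xml(string => $tcx);
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('x', 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2');

my (@plain, @with_context);
for my $node ($xpc->findnodes('//x:Trackpoint')) {
    # Called on the node itself: an unprefixed "Time" in XPath means
    # "Time in no namespace", so nothing matches and we get ''.
    push @plain, $node->findvalue('Time');

    # Called on the context object, with the node as context:
    # the registered prefix resolves and the element is found.
    push @with_context, $xpc->findvalue('x:Time', $node);
}

print "plain:   [@plain]\n";
print "context: [@with_context]\n";
```

The registered prefix lives in the XPathContext object, not in the nodes it returns, which is why the query has to go through $xpc with $node passed as the second argument.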

Re: XML::LibXML and namespaces
by Anonymous Monk on Nov 09, 2012 at 11:30 UTC

    libxml can be quite a PITCT - Pain In The Carpal Tunnel :)

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use XML::LibXML;

    my $doc = XML::LibXML->new()->parse_string(
        q{<?xml version='1.0' ?>
    <roshambo xmlns="http://example.com/roshambo">
      <sham>
        <bo name="40" />
        <bo name="2" />
      </sham>
      <sham>
        <bo name="forty" />
        <bo name="two" />
      </sham>
    </roshambo>
    }
    );

    for my $node ( $doc->F( '//x:sham' ) ) {
        print "@{[ $node->nodePath ]}\n";
        for my $name ( $node->F( 'x:bo/@name' ) ) {
            print "@{[ $name->nodePath ]} @{[$name->nodeValue]}\n";
        }
        print "\n\n";
    }

    $::xpc->registerNs( 'y', 'http://example.com/roshambo' );
    print $doc->F('y:roshambo');

    exit( 0 );

    BEGIN {
        $::xpc = XML::LibXML::XPathContext->new( );
        $::xpc->registerNs( 'x', 'http://example.com/roshambo' );

        sub XML::LibXML::Node::F {
            my( $self, $xpath, $context ) = @_;
            $::xpc->findnodes( $xpath, $context || $self );
        }
    }
    __END__

    Also if you're interested in a fancier nodePath, see XPATH DOM traverse html/xml

Re: XML::LibXML and namespaces
by Lotus1 (Vicar) on Nov 09, 2012 at 13:29 UTC

    You can also give a relative path to the node in findvalue. The '.' means the current node so if "Time" is a child of the node at $node :

    for my $node ($xml->findnodes('//y:Trackpoint')) {
        $time = $node->findvalue("./Time");
        push @X,$time;
    }

    If there is more than one "Time" node under $node, findvalue() will concatenate the text from all of them and return it.
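
A small sketch of that concatenation behaviour, assuming a made-up, namespace-free document (the element names and values are purely illustrative). It also shows two ways to avoid the joined string: a positional predicate, or findnodes with one value per node.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document with two Time children and no namespace.
my $doc = XML::LibXML->load_xml(
    string => '<point><Time>09:51</Time><Time>11:30</Time></point>'
);
my ($point) = $doc->findnodes('/point');

# findvalue joins the text of every matching node into one string.
my $all = $point->findvalue('./Time');

# A positional predicate restricts the match to the first Time child.
my $first = $point->findvalue('./Time[1]');

# findnodes keeps the matches separate, one value per node.
my @each = map { $_->textContent } $point->findnodes('./Time');

print "all:   $all\n";
print "first: $first\n";
print "each:  @each\n";
```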

      Wouldn't you know it :) XPath allows ignoring namespaces by using the functions name() and local-name(), and the current node (.) comes in handy, heady even

      #!/usr/bin/perl --
      use strict;
      use warnings;
      use XML::LibXML;

      my $doc = XML::LibXML->new()->parse_string(
          q{<?xml version='1.0' ?>
      <roshambo xmlns="http://example.com/roshambo">
        <sham>
          <bo name="40" />
          <bo name="2" />
        </sham>
        <sham xmlns:ftt="http://example.com/roshambo">
          <ftt:bo name="forty" />
          <ftt:bo name="two" />
        </sham>
      </roshambo>
      }
      );

      for my $name ( $doc->findnodes( q{//*[local-name()="bo"]/@name} ) ) {
          printf "%-25s %s\n", $name->nodePath, $name->nodeValue;
      }
      print "\n\n";

      for my $node ( $doc->findnodes( q{//*[name()="sham"]} ) ) {
          print "@{[ $node->nodePath ]}\n";
          ## any children
          ## ./*
          ## any descendants
          ## .//*
          ## anywhere
          ## //*
          for my $name ( $node->findnodes( q{./*[local-name()="bo"]/@name} ) ) {
              printf "%-25s %s\n", $name->nodePath, $name->nodeValue;
          }
          print "\n\n";
      }
      __END__
      /*/*[1]/*[1]/@name        40
      /*/*[1]/*[2]/@name        2
      /*/*[2]/ftt:bo[1]/@name   forty
      /*/*[2]/ftt:bo[2]/@name   two


      /*/*[1]
      /*/*[1]/*[1]/@name        40
      /*/*[1]/*[2]/@name        2


      /*/*[2]
      /*/*[2]/ftt:bo[1]/@name   forty
      /*/*[2]/ftt:bo[2]/@name   two

        Awesome! Exactly what I needed.
        I'm getting myself hopelessly confused with something similar. My XML looks like this:

        <user>
          <address name="1">
            <entry name="Address line 1">street</entry>
            <entry name="Address line 2">suburb</entry>
            <entry name="Postal code">code</entry>
          </address>
          <address name="2">
            <entry name="Address line 1">street2</entry>
            <entry name="Address line 2">suburb2</entry>
            <entry name="Postal code">code2</entry>
          </address>
        </user>

        How can I retrieve suburb2 from that?
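
One possible approach, offered as a sketch rather than the only answer: since this XML declares no namespace, attribute predicates on a plain XPath should be enough. The XML below is copied from the question; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string => <<'XML');
<user>
  <address name="1">
    <entry name="Address line 1">street</entry>
    <entry name="Address line 2">suburb</entry>
    <entry name="Postal code">code</entry>
  </address>
  <address name="2">
    <entry name="Address line 1">street2</entry>
    <entry name="Address line 2">suburb2</entry>
    <entry name="Postal code">code2</entry>
  </address>
</user>
XML

# Pick the address by its name attribute, then the entry by its name attribute.
my $suburb = $doc->findvalue(
    '/user/address[@name="2"]/entry[@name="Address line 2"]'
);

print "$suburb\n";
```

If the document did use a namespace, the same predicates would still apply, but the element names would need a prefix registered through XML::LibXML::XPathContext as shown earlier in the thread.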
      That won't work if namespaces are in play; you have to use XPathContext.

        Thanks, that's good to know. I haven't dealt with namespaces yet.