http://www.perlmonks.org?node_id=1067659

young_monk_love_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to parse an XML file and retrieve the values of specific nodes. Here's the XML structure:

<A >
  <B>B value</B>
  <C>
    <D d_attribute"d_attribute_value">D_Value</D>
  </C>
</A >
<A >
  <B>B value</B>
  <C>
    <D d_attribute"d_attribute_value">D_Value</D>
  </C>
</A >
. . .

Here's my code:

use strict;
use Switch;
use warnings;
use XML::Parser;
use XML::DOM::Lite qw(Parser :constants);

my $parser = Parser->new();
my $doc;
my $xml_path;
$doc = $parser->parseFile($xml_path, whitespace => 'strip');
my $A_list = $doc->getElementsByTagName('A');
if(defined($A_list)){
    my $length = scalar(@$A_list);
    for (my $cpt = 0; $cpt < $length; $cpt++) {
        my $B = $doc->selectNodes("A/B")->index($cpt)->nodeValue();
        my $D = $doc->selectNodes("A/C/D")->index($cpt)->nodeValue();
        print "- $B / $D\n";
    }
    print "\n\n";
}

But I get an error: Can't call method "index" on unblessed reference.

I'm probably doing it the wrong way. Could you please advise me? Additionally, I would like to retrieve the attribute value.

Thank you for your help, and please be patient; I'm just here to seek your wisdom. I'm still learning Perl...

Regards, Alex

Re: Retrieving an XML node value with XML::DOM::Lite and XML::Parser
by Eliya (Vicar) on Dec 18, 2013 at 18:43 UTC

    From a quick look at the source of XML/DOM/Lite/Document.pm, I'd say you need

    my $B = $doc->selectNodes("A/B")->[$cpt]->nodeValue();
    my $D = $doc->selectNodes("A/C/D")->[$cpt]->nodeValue();

    i.e. selectNodes returns an arrayref, not an object.
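    To see why ->index($cpt) dies while ->[$cpt] works, here is a minimal, self-contained sketch in plain Perl. FakeNode and select_nodes are hypothetical stand-ins, not part of XML::DOM::Lite; they only mimic the shape described above, a plain (unblessed) array reference whose elements are node objects.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a DOM node: just an object with a nodeValue() accessor.
package FakeNode;
sub new       { my ($class, $value) = @_; return bless { value => $value }, $class }
sub nodeValue { return $_[0]{value} }

package main;

# Hypothetical stand-in for selectNodes(): returns an UNBLESSED array reference.
sub select_nodes { return [ map { FakeNode->new($_) } @_ ] }

my $list = select_nodes('first', 'second');
print 'ref($list) is: ', ref($list), "\n";    # ARRAY, i.e. not an object

# Calling a method on the plain array reference dies, as in the original post.
my $ok  = eval { $list->index(1); 1 };
my $err = $@;
print "Method call failed: $err" unless $ok;

# Indexing the array reference directly works; the element itself IS an object.
my $value = $list->[1]->nodeValue();
print "Element 1 value: $value\n";
```

    The array reference has no methods of its own; only its elements do, so the subscript has to come before the method call.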

    Other than that, you could make your loop look more "perly" by writing

    if(defined($A_list)){
        for my $cpt (0..$#$A_list) {
            ...
        }
        print "\n\n";
    }

    That said, wouldn't it be easier to just iterate over the arrays returned by selectNodes and call ->nodeValue on each element, instead of selecting elements by index from the same result list that you retrieve over and over again? The XPath query yields the same result on every iteration of the loop.
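    A minimal sketch of that suggestion: run each query once, outside any loop, and walk the returned array reference directly. As before, FakeNode and select_nodes are hypothetical stand-ins that only mimic an arrayref of node objects; with XML::DOM::Lite you would call $doc->selectNodes(...) in their place.

```perl
use strict;
use warnings;

package FakeNode;    # hypothetical node class with a nodeValue() accessor
sub new       { my ($class, $value) = @_; return bless { value => $value }, $class }
sub nodeValue { return $_[0]{value} }

package main;

# Hypothetical stand-in for $doc->selectNodes(...): returns an array reference.
sub select_nodes { return [ map { FakeNode->new($_) } @_ ] }

# Run the query ONCE, outside any loop.
my $b_nodes = select_nodes('B value 1', 'B value 2');

# Iterate over the elements instead of re-querying and indexing by $cpt.
my @b_values;
for my $node (@$b_nodes) {
    push @b_values, $node->nodeValue();
}

# The same thing written with map:
my @same = map { $_->nodeValue() } @$b_nodes;

print "- $_\n" for @b_values;
```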

      As I replied to roboticus, I get an error when I try to print the value, but with the debug statements roboticus provided I can see that an array reference is returned.

Re: Retrieving an XML node value with XML::DOM::Lite and XML::Parser
by roboticus (Chancellor) on Dec 18, 2013 at 18:23 UTC

    young_monk_love_perl:

    That message means that one of your calls to selectNodes isn't returning an object. You may want to add a couple of debug statements like:

    my $TMP = $doc->selectNodes("A/B");
    print "TMP: <", ref($TMP), ">";
    $TMP = $doc->selectNodes("A/C/D");
    print "TMP2: <", ref($TMP), ">";

    That may shed some light on the issue.
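    A slightly more informative variant of these debug statements, offered as a sketch: the hypothetical helper describe uses the core Scalar::Util functions blessed and reftype to tell an object apart from a plain, unblessed reference, which is exactly the distinction behind the "unblessed reference" error. The sample values are made up; in the original code you would pass the result of $doc->selectNodes(...).

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical debug helper: report what kind of value a call actually returned.
sub describe {
    my ($value) = @_;
    return 'undef'        unless defined $value;
    return 'plain scalar' unless ref $value;
    return 'object of class ' . blessed($value) if blessed($value);
    my $type = reftype($value);
    return 'unblessed ARRAY ref with ' . scalar(@$value) . ' element(s)' if $type eq 'ARRAY';
    return "unblessed $type ref";
}

print 'TMP: <', describe([1, 2, 3]), ">\n";
print 'OBJ: <', describe(bless {}, 'Some::Class'), ">\n";
```

    Anything reported as an "unblessed ... ref" has to be dereferenced (for example with ->[$i] or @{...}) before you can call methods on what it contains.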

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      The debug statements you suggested output references like:

      ARRAY(0x94b3910)

      But when I try what Eliya suggested:

      my $B = $doc->selectNodes("A/B")->[$cpt]->nodeValue();
      my $D = $doc->selectNodes("A/C/D")->[$cpt]->nodeValue();
      print "$B / $D \n";

      I get: Use of uninitialized value $B in concatenation (.) or string at line xxx, and the same for $D, on each iteration of the loop.

        young_monk_love_perl:

        OK, that 'ARRAY(0xXXXXXXX)' means that the selectNodes function is returning an array reference. I'm guessing (based on a cursory glance at XML::DOM::Lite) that it's a reference to a list of node objects. Since your XPath expressions are different, there's no reason to expect both lists of nodes to have the same length, yet your code assumes they do. You'll get a similar error on every iteration that runs past the end of the shorter list.

        I don't know exactly what you're trying to do, but perhaps this will give you an idea on how to handle it:

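        # selectNodes returns an array reference, so dereference it into a list
        # (the "|| []" guards against an undef result)
        my @list1 = @{ $doc->selectNodes("A/B")   || [] };
        my @list2 = @{ $doc->selectNodes("A/C/D") || [] };
        while (@list1 or @list2) {
            my ($B, $D) = ('-EMPTY-', '-EMPTY-');
            if (@list1) { my $t = shift @list1; $B = $t->nodeValue(); }
            if (@list2) { my $t = shift @list2; $D = $t->nodeValue(); }
            print "$B / $D \n";
        }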

        Since the lists aren't necessarily the same length, it may be even better to simply process them independently:

        # Dereference the array references returned by selectNodes before looping
        print "B NODES:\n";
        print "\t", $_->nodeValue(), "\n" for @{ $doc->selectNodes("A/B") || [] };
        print "\nD NODES:\n";
        print "\t", $_->nodeValue(), "\n" for @{ $doc->selectNodes("A/C/D") || [] };

        The error in Eliya's version is another indication that your lists are of different lengths: when the B list ran out of values, $B was left undef because you ran off the end of the array (unless there actually was an undef in the list).

        In either case, you need to verify that the functions are returning what you expect to see, and when they don't, perform some suitable action.
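        If you do need to pair the two lists by position, here is a sketch (with made-up strings standing in for the extracted node values) that makes the length mismatch explicit: it loops up to the last index of the longer list and substitutes a placeholder via the defined-or operator //, which needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Made-up strings standing in for the extracted node values.
my @b_values = ('B1', 'B2', 'B3');
my @d_values = ('D1', 'D2');    # one element shorter

my @rows;
my $last = max($#b_values, $#d_values);    # last index of the longer list
for my $i (0 .. $last) {
    # Reading past the end of an array yields undef; // swaps in a placeholder.
    my $bv = $b_values[$i] // '-EMPTY-';
    my $dv = $d_values[$i] // '-EMPTY-';
    push @rows, "$bv / $dv";
}
print "$_\n" for @rows;
```

        The placeholder makes a missing node visible in the output instead of producing an "uninitialized value" warning.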

        ...roboticus


Re: Retrieving an XML node value with XML::DOM::Lite and XML::Parser
by tangent (Parson) on Dec 18, 2013 at 22:48 UTC
    XML::DOM::Lite is weird. As you can see from the code below, you can get the attribute of each found node normally, but to get the text value you have to call childNodes() on the node first and then call nodeValue() on each child. Heavy use of Data::Dumper was required to work this one out.
    my $B = $doc->selectNodes("A/B");
    if (defined($B)) {
        for my $node (@$B) {
            my $cnodes = $node->childNodes;
            for my $cnode (@$cnodes) {
                my $val = $cnode->nodeValue;
                print qq|Value of B is: $val\n|;
            }
        }
    }
    my $D = $doc->selectNodes("A/C/D");
    if (defined($D)) {
        for my $node (@$D) {
            my $attr = $node->getAttribute("d_attribute");
            print qq|Attribute 'd_attribute' of D is: $attr\n|;
            my $cnodes = $node->childNodes;
            for my $cnode (@$cnodes) {
                my $val = $cnode->nodeValue;
                print qq|Value of D is: $val\n|;
            }
        }
    }
    On your sample XML, after I fixed the attribute (d_attribute="d_attribute_value") this produces:
    Value of B is: B value
    Value of B is: B value
    Attribute 'd_attribute' of D is: d_attribute_value
    Value of D is: D_Value
    Attribute 'd_attribute' of D is: d_attribute_value
    Value of D is: D_Value
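    The childNodes-then-nodeValue step above can be wrapped in a small helper. This is a sketch: node_text is a hypothetical name, and it assumes only the two calls used in the code above (childNodes returning an array reference, nodeValue on each child). The FakeText and FakeElement stub classes exist only so the sketch runs without XML::DOM::Lite; they mimic the standard DOM arrangement in which an element's own nodeValue is undef and its text lives in child text nodes.

```perl
use strict;
use warnings;

# Hypothetical stubs mimicking the DOM shape: text lives in child text nodes.
package FakeText;
sub new       { my ($class, $text) = @_; return bless { text => $text }, $class }
sub nodeValue { return $_[0]{text} }

package FakeElement;
sub new        { my ($class, @kids) = @_; return bless { kids => [@kids] }, $class }
sub nodeValue  { return undef }          # elements carry no value themselves
sub childNodes { return $_[0]{kids} }    # array reference, as in the code above

package main;

# Hypothetical helper: join the values of the direct children that have one.
sub node_text {
    my ($node) = @_;
    return join '', grep { defined } map { $_->nodeValue } @{ $node->childNodes };
}

my $b_elem = FakeElement->new(FakeText->new('B value'));
print 'Value of B is: ', node_text($b_elem), "\n";
```

    Only direct children are visited, so text nested inside child elements is skipped; that matches the one-level loop in the code above.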
Re: Retrieving an XML node value with XML::DOM::Lite and XML::Parser
by tangent (Parson) on Dec 18, 2013 at 23:31 UTC
    I would recommend that you switch to XML::LibXML. This is how you would do the same thing with that module. Note that I had to add a Root tag around the XML, because XML::LibXML gives an error otherwise (a well-formed XML document must have exactly one root element). Note also the different XPath expressions used as a result.
    use XML::LibXML;

    my $string = q|
    <Root>
      <A >
        <B>B value</B>
        <C>
          <D d_attribute="d_attribute_value">D_Value</D>
        </C>
      </A >
      <A>
        <B>B value</B>
        <C>
          <D d_attribute="d_attribute_value">D_Value</D>
        </C>
      </A>
    </Root>
    |;
    my $doc = XML::LibXML->load_xml(string => $string);
    my @B = $doc->findnodes("//A/B");
    if (@B) {
        for my $node (@B) {
            my $val = $node->textContent;
            print qq|Value of B is: $val\n|;
        }
    }
    my @D = $doc->findnodes("//A/C/D");
    if (@D) {
        for my $node (@D) {
            my $attr = $node->getAttribute("d_attribute");
            print qq|Attribute 'd_attribute' of D is: $attr\n|;
            my $val = $node->textContent;
            print qq|Value of D is: $val\n|;
        }
    }
Re: Retrieving an XML node value... switch to XML::Twig
by Discipulus (Canon) on Dec 19, 2013 at 09:25 UTC
    As TIMTOWTDI, I too suggest switching modules, in my case to XML::Twig. As noted by tangent, you need a Root node, and you need the equals sign between attributes and their values.

    I'm not an expert, but note how straightforward it is to use handlers (subs called during parsing whenever an XPath expression matches):
use warnings;
use strict;
use XML::Twig;

my $xml = <<'XML';
<Root>
  <A >
    <B>B value</B>
    <C>
      <D d_attribute="d_attribute_value">D_Value</D>
    </C>
  </A >
  <A>
    <B>B value</B>
    <C>
      <D d_attribute="d_attribute_value">D_Value</D>
    </C>
  </A>
</Root>
XML

my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        '/Root/A/B'   => \&field_B,
        '/Root/A/C/D' => \&field_D,
    },
);
$twig->parse($xml);

# Handlers receive the twig and the matched element
sub field_B {
    my ($twig, $field) = @_;
    print 'Value of B is: ' . $field->text . "\n";
}

sub field_D {
    my ($twig, $field) = @_;
    print 'Value of D is: ' . $field->text . "\n";
}
    hth
    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      Thank you very much, it works! I used XML::Twig, which seems to fit the XML file I'm parsing better.