http://www.perlmonks.org?node_id=784625

khalistoo has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am currently parsing an XML file with a script using XML::LibXML. So far I've been able to get the data output as I wanted, but the script stops when it reaches an empty tag. I am a pure noob when it comes to Perl (and quite a lot of other stuff lol), but I've tried to read a few articles, posts, forums... and as far as I can tell I need to test whether a node contains data, which I am supposed to do with the hasChildNodes function. Fair enough, I just don't understand where to use it, or how... Here is a piece of code and an XML file:
Parsing.pl

sub parse{
    my $parser = XML::LibXML->new();
    my $tree = $parser->parse_file($file);
    open (my $FhResultat, '>', $FichierResultat );
    my $root = $tree->getDocumentElement;
    my @productname = $root->getElementsByTagName('product');
    foreach my $child (@productname){
        print {$FhResultat}
            $child->getElementsByTagName('name')->[0]->getFirstChild->getData, "\t",
            $child->getAttribute('category_id'), "\t",
            $child->getAttribute('id'), "\t",
            $child->getElementsByTagName('desc_short')->[0]->getFirstChild->getData, "\n";
    }
products.xml

<product category_id="13296" id="675936193" catalog="false" row="1">
    <name>Children's Hand Rake</name>
    <imageURL_med></imageURL_med>
    <desc_short>Mini gardeners can dig, rake and scoop out their own plot with this children's hand rake, complete with contoured handles and durable metal heads.</desc_short>
</product>
Since a normal product will have information in <imageURL_med></imageURL_med>, I would like to know how I can code my script to fetch the data inside this tag, return null if there is no data, and not stop when it encounters an empty tag. Thanks a lot in advance.

Replies are listed 'Best First'.
Re: XML::LibXML- Escape Empty Tags
by Your Mother (Archbishop) on Jul 30, 2009 at 16:36 UTC

    This might help get you going. I took out the file stuff, you'll have to adjust. If this is something you actually need for work, you might consider posting it as a one-off job to jobs.perl.org or something.

    use strict;   # Don't leave out!
    use warnings; # Don't leave out!
    use XML::LibXML;

    my $parser = XML::LibXML->new();
    my $doc = $parser->parse_fh(\*DATA);

    my @product = $doc->getElementsByTagName('product');

    for my $kid ( @product ){
        print join("\t",
                   $kid->getElementsByTagName('name')->[0]->textContent,
                   $kid->getElementsByTagName('imageURL_med')->[0]->textContent,
                   $kid->getAttribute('category_id'),
                   $kid->getAttribute('id'),
                   $kid->getElementsByTagName('desc_short')->[0]->textContent,
                  ), "\n";
    }

    # print $doc->serialize();

    __END__
    <root>
    <product category_id="13296" id="675936193" catalog="false" row="1">
      <name>Children's Hand Rake</name>
      <imageURL_med></imageURL_med>
      <desc_short>Mini gardeners can dig, rake and scoop out their own plot with this children's hand rake, complete with contoured handles and durable metal heads.</desc_short>
    </product>
    <product category_id="13296" id="675936193" catalog="false" row="1">
      <name>Bag of Broken Glass</name>
      <imageURL_med>http://moocow.co.uk.jp/something/something/bg.jpg</imageURL_med>
      <desc_short>Fun for all ages!</desc_short>
    </product>
    </root>
      Thanks a lot, this seems to work. However, can you explain two things to me? First, the line
      # print $doc->serialize();
      as I've got no idea what it is supposed to do. Second, the use of my $doc = $parser->parse_fh(\*DATA);. I guess this is to work with a filehandle, but I was under the impression that parse_file was much better for big file manipulation (since I am in fact using it to parse some 600 MB of XML...). Then again, thanks a lot, it's going very well. Cheers everyone for the help.

        The serialize is there to uncomment if you want it to dump the doc to check. And you're right, doing the file directly (no filehandle) is probably faster. The *DATA handle is just easy to test/demo because it lets you put the data into the test script. Good luck. It is worth the effort to continue to pick up some Perl. It's not that hard, you'll get great help here and on many lists, and it can boost productivity in a menagerie of tasks.
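To make the serialize point concrete, here is a minimal sketch, assuming XML::LibXML is installed. The inline sample string and the variable names are made up for the demo; a real script would keep calling parse_file on the actual file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical inline sample, standing in for the real products file.
my $xml = '<root><product id="1"><name>Rake</name><imageURL_med></imageURL_med></product></root>';

# parse_string is convenient for demos; parse_file($path) reads from disk.
my $doc = XML::LibXML->new->parse_string($xml);

# serialize() returns the whole parsed document as a string, which is
# useful for checking what the parser actually saw.
my $dump = $doc->serialize();
print $dump;
```

Uncommenting the serialize line in the reply above would print the parsed document in the same way.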

        It is a comment, it does nothing :D
Re: XML::LibXML- Escape Empty Tags
by ramrod (Curate) on Jul 30, 2009 at 14:35 UTC
    What all have you tried? I would start with
    $node->textContent;
    or
    $tree->findnodes("//imageURL_med/text()");
      I don't think the second will work, because I don't think there's a text node to match.
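A small sketch of why that is, assuming XML::LibXML is installed (the two-element sample document is invented for the demo): an empty element has no child text node, so the text() path skips it, while textContent on the element itself still returns an empty string.

```perl
use strict;
use warnings;
use XML::LibXML;

# Invented sample: one empty element and one with text.
my $doc = XML::LibXML->new->parse_string(
    '<root><imageURL_med></imageURL_med><imageURL_med>bg.jpg</imageURL_med></root>'
);

my @elements   = $doc->findnodes('//imageURL_med');
my @text_nodes = $doc->findnodes('//imageURL_med/text()');

# The empty element is matched by the first path but contributes
# nothing to the second one.
printf "elements: %d, text nodes: %d\n", scalar @elements, scalar @text_nodes;

for my $el (@elements) {
    my $text = $el->textContent;   # '' for the empty element, never undef
    print $el->hasChildNodes ? "has text: $text\n" : "empty element\n";
}
```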
      I have no idea where I need to put this line. Can you expand on this or explain it a bit more? I would try it if I at least knew where to try :) Sorry, I am completely lost.
        So I guess you didn't try anything?
        It seemed from the code you posted that you were somewhat familiar with XML::LibXML.
        Anyway, if you want to access text try (in your loop):
        $child->textContent;
        Or if you want to get the text node try (anywhere after you parse the file):
        $tree->findnodes("//imageURL_med/text()");
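For context, the original loop stops because getFirstChild returns undef for an empty element, and calling getData on undef is a fatal error. One way to apply the textContent suggestion inside the loop is sketched below, assuming XML::LibXML is installed. The helper name child_text is hypothetical, not part of the library, and the sample is trimmed from the posted XML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: text of the first <$tag> descendant of $node,
# or '' when the element is missing or empty.
sub child_text {
    my ($node, $tag) = @_;
    my ($el) = $node->getElementsByTagName($tag);   # list context: first match
    return defined $el ? $el->textContent : '';
}

my $doc = XML::LibXML->new->parse_string(<<'XML');
<root>
  <product category_id="13296" id="675936193">
    <name>Children's Hand Rake</name>
    <imageURL_med></imageURL_med>
  </product>
</root>
XML

for my $product ($doc->getElementsByTagName('product')) {
    print join("\t",
        child_text($product, 'name'),
        child_text($product, 'imageURL_med'),   # '' instead of a fatal error
        $product->getAttribute('category_id'),
        $product->getAttribute('id'),
    ), "\n";
}
```

If an explicit check is preferred, `$el->hasChildNodes` can be tested before reading the first child, but textContent already covers the empty case.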
Re: XML::LibXML- Escape Empty Tags
by Anonymous Monk on Jul 31, 2009 at 14:42 UTC
    I have used the XML::Simple module to do this. It can take an XML file name as input and will create a data structure out of it. Just thought it might be useful to you.
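A rough sketch of that approach, assuming XML::Simple is installed. The option values are one reasonable choice, not the only one, and the sample is trimmed from the posted XML. By default XML::Simple represents an empty element as an empty hash, so SuppressEmpty is set here to get an empty string instead.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = <<'XML';
<root>
  <product category_id="13296" id="675936193">
    <name>Children's Hand Rake</name>
    <imageURL_med></imageURL_med>
  </product>
</root>
XML

my $data = XMLin(
    $xml,
    ForceArray    => ['product'],  # always an array ref, even for one product
    KeyAttr       => [],           # do not fold products into a hash keyed on id/name
    SuppressEmpty => '',           # represent empty elements as ''
);

for my $product (@{ $data->{product} }) {
    print join("\t",
        $product->{name},
        $product->{imageURL_med},
        $product->{category_id},
        $product->{id},
    ), "\n";
}
```

XMLin also accepts a file name in place of the XML string.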