PerlMonks  

XML::LibXML- Escape Empty Tags

by khalistoo (Initiate)
on Jul 30, 2009 at 14:23 UTC (#784625)
khalistoo has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I am currently parsing an XML file with a script using XML::LibXML. So far I've been able to get the data output the way I wanted, but the script stops when it reaches an empty tag. Now, I am a pure noob when it comes to Perl (and quite a lot of other stuff lol), but I've tried to read a few articles, posts, and forums, and as far as I can tell I need to test whether a node contains data, which I am supposed to do with the hasChildNodes method. Fair enough, I just don't understand where to use it, or how. Here is a piece of code and an XML file:
Parsing.pl

    sub parse {
        my $parser = XML::LibXML->new();
        my $tree   = $parser->parse_file($file);
        open( my $FhResultat, '>', $FichierResultat );
        my $root        = $tree->getDocumentElement;
        my @productname = $root->getElementsByTagName('product');
        foreach my $child (@productname) {
            print {$FhResultat}
                $child->getElementsByTagName('name')->[0]->getFirstChild->getData, "\t",
                $child->getAttribute('category_id'), "\t",
                $child->getAttribute('id'), "\t",
                $child->getElementsByTagName('desc_short')->[0]->getFirstChild->getData, "\n";
        }
    }
products.xml

    <product category_id="13296" id="675936193" catalog="false" row="1">
        <name>Children's Hand Rake</name>
        <imageURL_med></imageURL_med>
        <desc_short>Mini gardeners can dig, rake and scoop out their own plot with this children's hand rake, complete with contoured handles and durable metal heads.</desc_short>
    </product>
Since a normal product will have information in <imageURL_med></imageURL_med>, I would like to know how I can code my script to fetch the data inside this tag, return null if there is no data, and keep running when it encounters an empty tag. Thanks a lot in advance.

Re: XML::LibXML- Escape Empty Tags
by ramrod (Hermit) on Jul 30, 2009 at 14:35 UTC
    What all have you tried? I would start with
    $node->textContent;
    or
    $tree->findnodes("//imageURL_med/text()");
      I have no idea where I need to put this line. Can you expand on or explain this approach a bit more? I would try it if I at least knew where to try :) Sorry, I am completely lost.
        So I guess you didn't try anything?
        It seemed from the code you posted that you were somewhat familiar with XML::LibXML.
        Anyway, if you want to access text try (in your loop):
        $child->textContent;
        Or if you want to get the text node try (anywhere after you parse the file):
        $tree->findnodes("//imageURL_med/text()");
      I don't think the second will work, because I don't think there's a text node to match.
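To make the difference between the two suggestions concrete, here is a minimal sketch, not taken from anyone's post above. It counts the text nodes under each imageURL_med element and prints either the text or a placeholder. The helper name text_or_null, the 'NULL' string, and the example.com URL are made up for illustration. Note that hasChildNodes is also true for an element containing only whitespace, so this check assumes truly empty tags like the one in the sample.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample data modelled on the thread; the URL is a placeholder.
my $xml = <<'XML';
<root>
  <product id="1">
    <name>Children's Hand Rake</name>
    <imageURL_med></imageURL_med>
  </product>
  <product id="2">
    <name>Bag of Broken Glass</name>
    <imageURL_med>http://example.com/bg.jpg</imageURL_med>
  </product>
</root>
XML

my $doc      = XML::LibXML->new->parse_string($xml);
my @products = $doc->getElementsByTagName('product');

# Hypothetical helper (not from the thread): text of the first <$tag>
# under $node, or 'NULL' if the element is missing or has no children.
sub text_or_null {
    my ($node, $tag) = @_;
    my ($elem) = $node->getElementsByTagName($tag);
    return 'NULL' unless defined $elem && $elem->hasChildNodes;
    return $elem->textContent;
}

for my $product (@products) {
    my ($img) = $product->getElementsByTagName('imageURL_med');

    # An empty element has no text node, so this list is empty for it.
    my @text_nodes = $img->findnodes('text()');

    print join("\t",
        $product->getAttribute('id'),
        scalar(@text_nodes),
        text_or_null($product, 'imageURL_med'),
    ), "\n";
}
```

Calling getFirstChild on the empty element returns undef, which is why chaining ->getData onto it stops the original script. Checking hasChildNodes first, or using textContent, avoids that.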
Re: XML::LibXML- Escape Empty Tags
by Your Mother (Canon) on Jul 30, 2009 at 16:36 UTC

    This might help get you going. I took out the file stuff, you'll have to adjust. If this is something you actually need for work, you might consider posting it as a one-off job to jobs.perl.org or something.

    use strict;   # Don't leave out!
    use warnings; # Don't leave out!
    use XML::LibXML;

    my $parser = XML::LibXML->new();
    my $doc    = $parser->parse_fh(\*DATA);

    my @product = $doc->getElementsByTagName('product');

    for my $kid ( @product ) {
        print join("\t",
                   $kid->getElementsByTagName('name')->[0]->textContent,
                   $kid->getElementsByTagName('imageURL_med')->[0]->textContent,
                   $kid->getAttribute('category_id'),
                   $kid->getAttribute('id'),
                   $kid->getElementsByTagName('desc_short')->[0]->textContent,
                   ), "\n";
    }

    # print $doc->serialize();

    __END__
    <root>
      <product category_id="13296" id="675936193" catalog="false" row="1">
        <name>Children's Hand Rake</name>
        <imageURL_med></imageURL_med>
        <desc_short>Mini gardeners can dig, rake and scoop out their own plot with this children's hand rake, complete with contoured handles and durable metal heads.</desc_short>
      </product>
      <product category_id="13296" id="675936193" catalog="false" row="1">
        <name>Bag of Broken Glass</name>
        <imageURL_med>http://moocow.co.uk.jp/something/something/bg.jpg</imageURL_med>
        <desc_short>Fun for all ages!</desc_short>
      </product>
    </root>
      Thanks a lot, this seems to work. However, can you explain two things to me? First, the line
      # print $doc->serialize();
      as I've got no idea what it is supposed to do. Second, the use of
      my $doc = $parser->parse_fh(\*DATA);
      I guess this is for working with a filehandle, but I was under the impression that parse_file was much better for handling big files (I am in fact using it to parse some 600 MB of XML). But then again, thanks a lot, it's going very well. Cheers everyone for the help.
        It is a comment, it does nothing :D

        The serialize is there to uncomment if you want it to dump the doc to check. And you're right, doing the file directly (no filehandle) is probably faster. The *DATA handle is just easy to test/demo because it lets you put the data into the test script. Good luck. It is worth the effort to continue to pick up some Perl. It's not that hard, you'll get great help here and on many lists, and it can boost productivity in a menagerie of tasks.
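As a rough sketch of the points above: the same document can be parsed by file name with parse_file or from an open filehandle with parse_fh, and serialize dumps the parsed document back out as a string for checking. The temporary file below stands in for the real products file and is only there to keep the example self-contained. The variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

# Write a small sample to a temporary file so the sketch is self-contained;
# in the real script this would be the existing products file.
my ($tmp_fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$tmp_fh} <<'XML';
<root>
  <product category_id="13296" id="675936193">
    <name>Children's Hand Rake</name>
    <imageURL_med></imageURL_med>
  </product>
</root>
XML
close $tmp_fh or die "Cannot close $path: $!";

my $parser = XML::LibXML->new();

# 1. Parse by file name, as in the original question.
my $doc_from_file = $parser->parse_file($path);

# 2. Parse from an already-open filehandle, as the \*DATA demo does.
open my $in, '<', $path or die "Cannot open $path: $!";
binmode $in;
my $doc_from_fh = $parser->parse_fh($in);
close $in;

# serialize() returns the whole document as a string, handy for a quick check.
print $doc_from_file->serialize();
```

Either way the whole document ends up in memory as a tree, so the choice between the two calls is about convenience more than about how much memory a large file needs.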

Re: XML::LibXML- Escape Empty Tags
by Anonymous Monk on Jul 31, 2009 at 14:42 UTC
    I have used the XML::Simple module to do this. It can take an XML file name as input and will create a data structure out of it. Just thought it could be useful to you.
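For anyone wanting to try that route, here is a minimal sketch of the XML::Simple approach. The option choices are assumptions for this data, not something from the post above: ForceArray keeps product as a list, KeyAttr => [] stops the module from folding on the id and name keys, and SuppressEmpty => '' turns empty elements into empty strings instead of empty hashes. XMLin accepts either a file name or, as here, a string of XML. The example.com URL is a placeholder.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Sample data modelled on the thread; the URL is a placeholder.
my $xml = <<'XML';
<root>
  <product category_id="13296" id="675936193">
    <name>Children's Hand Rake</name>
    <imageURL_med></imageURL_med>
    <desc_short>Mini gardeners can dig, rake and scoop.</desc_short>
  </product>
  <product category_id="13296" id="675936194">
    <name>Bag of Broken Glass</name>
    <imageURL_med>http://example.com/bg.jpg</imageURL_med>
    <desc_short>Fun for all ages!</desc_short>
  </product>
</root>
XML

my $data = XMLin(
    $xml,
    ForceArray    => ['product'],  # always an array ref, even for one product
    KeyAttr       => [],           # do not fold the list into a hash keyed on id/name
    SuppressEmpty => '',           # empty elements become '' rather than {}
);

for my $p ( @{ $data->{product} } ) {
    print join("\t",
        $p->{name},
        $p->{imageURL_med},
        $p->{category_id},
        $p->{id},
        $p->{desc_short},
    ), "\n";
}
```

XML::Simple builds the entire structure in memory as nested Perl hashes and arrays, so it may not be a comfortable fit for input in the hundreds of megabytes.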
