PerlMonks  

xml::libxml open, add and save not formatting properly

by itsscott (Acolyte)
on Mar 23, 2010 at 22:21 UTC (#830411=perlquestion)
itsscott has asked for the wisdom of the Perl Monks concerning the following question:

Good day Monks!

I have a config xml file that I actually created with the xml lib so I know it's valid ;-) It is nicely formatted with nice indents etc as toString(1) does so well.
I wanted to add an entry to it, so I parsed it in, added to it and saved it again with toString(1). Oddly, the new elements were all on one line and not pretty like the rest of the document. Here is a brief example.
    my ($configuri) = $basepath."config.xml";
    my $cparser = XML::LibXML->new();
    # load config file
    my $config = $cparser->parse_file($configuri);
    # then an xpath to get to where I want to be
    foreach my $xsites ($lrconfig->findnodes("//linkrabbitconfig/sites")) {
        $xnewsite = $config->createElement('site');
        $xsites->appendChild($xnewsite);
        $xsitename = $config->createElement('sitename');
        $xnewsite->addChild($xsitename);
        $xsitename->addChild( $config->createCDATASection($args{'site'}) );
        # add a bunch more elements
    }
    # output xml file
    open (XMLfile,">".$configuri);
    binmode(XMLfile,":utf8");
    autoflush XMLfile 1;
    chmod 0664, $outfile;
    # dump the xml document to file
    print XMLfile $config->toString(1);
    close(XMLfile);
Any help would be very much appreciated. I am thinking that it might have something to do with an XML object and a DOM object? Am I close?

Thank you in advance!
Scott

Re: xml::libxml open, add and save not formatting properly
by ikegami (Pope) on Mar 23, 2010 at 22:53 UTC
    Don't you want ->toString(2)?

    chmod 0664, $outfile;

    That's not the right variable name. Aren't you using use strict; use warnings;?

    autoflush XMLfile 1;

    Useless, since closing a file handle flushes it.

    binmode(XMLfile,":utf8");

    That's a bug. "on document nodes [toString] returns the XML as a byte string in the original encoding of the document". You're double encoding. You want

        # Switch to UTF-8 if it's not already.
        $config->setEncoding('UTF-8');
        open(my $config_fh, ">", $configuri) or die $!;
        binmode($config_fh);
        print $config_fh $config->toString(2);
        close($config_fh);
        chmod 0664, $configuri;

    or better yet:

        # Switch to UTF-8 if it's not already.
        $config->setEncoding('UTF-8');
        $config->toFile($configuri, 2);
        chmod 0664, $configuri;
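The double-encoding point above can be checked with a small sketch. It assumes a hypothetical one-element document containing a single non-ASCII character, and it uses utf8::encode to stand in for what a ":utf8" output layer would do to a string that is already UTF-8 bytes. It is an illustration, not the original poster's code.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document: "cafe" with an e-acute, given as raw UTF-8 bytes.
my $input = qq{<?xml version="1.0" encoding="UTF-8"?><name>caf\xC3\xA9</name>};
my $doc   = XML::LibXML->new->parse_string($input);

# On a document node, toString returns bytes in the document's encoding.
my $bytes = $doc->toString(1);

# Simulate printing those bytes through a ":utf8" layer:
# every byte above 0x7F gets encoded a second time.
my $double = $bytes;
utf8::encode($double);

printf "as returned: %d bytes, after a second encoding: %d bytes\n",
    length($bytes), length($double);
```

If the two lengths differ, the second encoding has corrupted the non-ASCII character, which is why the file handle should be left in binary mode (or toFile used instead).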
      Thanks for the quick response and information. I did make all the changes you recommended and it did not make a difference. (Please forgive any 'code' errors in the example; I had to extract it from our code and re-create it for the question due to a non-disclosure agreement.)
      As you can see, the first 'site' is nice, and the one I just added in my test is not (in fact, the </sites> has also lost its linefeed in the process).
          <?xml version="1.0" encoding="UTF-8"?>
          <config>
            <sites>
              <site>
                <sitename><![CDATA[www.example.com]]></sitename>
                <active><![CDATA[1]]></active>
                <rooturl><![CDATA[http://www.example.com.com/]]></rooturl>
                <name><![CDATA[Example]]></name>
              </site>
              <site><sitename>Test entry</sitename><name></name><rooturl><![CDATA[http://www.test.com.com/]]></rooturl><reportname><![CDATA[test report name]]></reportname></site></sites>
          </config>
      Again, this is just a small section of many entries in this file.

        The catch is that what you're asking to do involves changing the logical structure of the XML document by adding significant spaces, and XML::LibXML sees toString as a serialization function.

        and it did not make a difference

        I just tried it. It makes a huge difference. Not for the good, though. While it pretties up the part that isn't prettied up, it pretties up the part that's already been prettied up too.

            use strict;
            use warnings;
            use XML::LibXML;

            print XML::LibXML->new->parse_fh(*DATA)->toString(2);

            __DATA__
            <?xml version="1.0" encoding="UTF-8"?>
            <config>
              <sites>
                <site>
                  <sitename><![CDATA[www.example.com]]></sitename>
                  <active><![CDATA[1]]></active>
                  <rooturl><![CDATA[http://www.example.com.com/]]></rooturl>
                  <name><![CDATA[Example]]></name>
                </site>
                <site><sitename>Test entry</sitename><name></name><rooturl><![CDATA[http://www.test.com.com/]]></rooturl><reportname><![CDATA[test report name]]></reportname></site></sites>
            </config>

            <?xml version="1.0" encoding="UTF-8"?>
            <config>
              <sites>
                <site>
                  <sitename>
                    <![CDATA[www.example.com]]>
                  </sitename>
                  <active>
                    <![CDATA[1]]>
                  </active>
                  <rooturl>
                    <![CDATA[http://www.example.com.com/]]>
                  </rooturl>
                  <name>
                    <![CDATA[Example]]>
                  </name>
                </site>
                <site>
                  <sitename>
                    Test entry
                  </sitename>
                  <name/>
                  <rooturl>
                    <![CDATA[http://www.test.com.com/]]>
                  </rooturl>
                  <reportname>
                    <![CDATA[test report name]]>
                  </reportname>
                </site>
              </sites>
            </config>
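One way to see why the serializer treats the existing indentation as content: with the parser's default settings, the whitespace between elements is kept as text nodes in the tree. The sketch below counts those whitespace-only children under a hypothetical, trimmed-down `<sites>` element, once with the default setting and once with keep_blanks(0). The helper name count_blank_children is made up for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, trimmed-down version of the config file from the thread.
my $xml = <<'XML';
<config>
  <sites>
    <site/>
  </sites>
</config>
XML

# Count whitespace-only text nodes directly under <sites>.
sub count_blank_children {
    my ($keep_blanks) = @_;
    my $parser = XML::LibXML->new();
    $parser->keep_blanks($keep_blanks);
    my $doc = $parser->parse_string($xml);
    my ($sites) = $doc->findnodes('/config/sites');
    return scalar grep {
        $_->isa('XML::LibXML::Text') && $_->data !~ /\S/
    } $sites->childNodes;
}

my $kept     = count_blank_children(1);   # parser default
my $stripped = count_blank_children(0);   # blanks discarded at parse time

print "whitespace nodes with keep_blanks(1): $kept\n";
print "whitespace nodes with keep_blanks(0): $stripped\n";
```

When those whitespace nodes are present, the serializer has to preserve them, so it cannot freely re-indent around them.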
Re: xml::libxml open, add and save not formatting properly
by gam3 (Curate) on Mar 23, 2010 at 23:10 UTC
    You might look at XML::Bare or XML::ED as they are very simple.
    -- gam3
    A picture is worth a thousand words, but takes 200K.
      Looks interesting. Unfortunately, this project's requirements are to use only XML::LibXML. Thanks for the feedback!
        Too bad that you can't use something like XML::Tidy. I tried it, and it returned this:
            <?xml version="1.0" encoding="utf-8"?>
            <config>
              <sites>
                <site>
                  <sitename>www.example.com</sitename>
                  <active>1</active>
                  <rooturl>http://www.example.com.com/</rooturl>
                  <name>Example</name>
                </site>
                <site>
                  <sitename>Test entry</sitename>
                  <name />
                  <rooturl>http://www.test.com.com/</rooturl>
                  <reportname>test report name</reportname>
                </site>
              </sites>
            </config>
        The code that I used:
            #!/usr/bin/perl
            use strict;
            use warnings;
            use XML::Tidy;

            my $tidy_obj = XML::Tidy->new( 'filename' => '/path/to/xmlfile');
            $tidy_obj->tidy();
            $tidy_obj->write();
Re: xml::libxml open, add and save not formatting properly (pretty printing with libxml)
by ikegami (Pope) on Mar 24, 2010 at 00:54 UTC
    aha!!
        use strict;
        use warnings;
        use XML::LibXML qw( );

        my $parser = XML::LibXML->new();
        $parser->keep_blanks(0);
        print $parser->parse_fh(*DATA)->toString(@ARGV ? $ARGV[0] : 1);

        __DATA__
        <?xml version="1.0" encoding="UTF-8"?>
        <config>
          <sites>
            <site>
              <sitename><![CDATA[www.example.com]]></sitename>
              <active><![CDATA[1]]></active>
              <rooturl><![CDATA[http://www.example.com.com/]]></rooturl>
              <name><![CDATA[Example]]></name>
            </site>
            <site><sitename>Test entry</sitename><name></name><rooturl><![CDATA[http://www.test.com.com/]]></rooturl><reportname><![CDATA[tes$
        </config>

        <?xml version="1.0" encoding="UTF-8"?>
        <config>
          <sites>
            <site>
              <sitename><![CDATA[www.example.com]]></sitename>
              <active><![CDATA[1]]></active>
              <rooturl><![CDATA[http://www.example.com.com/]]></rooturl>
              <name><![CDATA[Example]]></name>
            </site>
            <site>
              <sitename>Test entry</sitename>
              <name/>
              <rooturl><![CDATA[http://www.test.com.com/]]></rooturl>
              <reportname><![CDATA[test report name]]></reportname>
            </site>
          </sites>
        </config>
      Bingo! (bowing humbly) The $parser->keep_blanks(0); fixed the problem perfectly. Thank you so much for your input! (dancing)
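Putting the pieces of the thread together, the whole round trip might look like the sketch below: parse with keep_blanks(0), append a new entry, and serialize with toString(1). The XML string is a hypothetical, shortened config, and the sketch prints the result rather than writing a file. In a real script, $doc->toFile($path, 1) would take the place of the print.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, shortened config file that is already pretty-printed.
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <sites>
    <site>
      <sitename>www.example.com</sitename>
    </site>
  </sites>
</config>
XML

# Discard the existing indentation at parse time so that the
# serializer is free to re-indent the whole tree.
my $parser = XML::LibXML->new();
$parser->keep_blanks(0);
my $doc = $parser->parse_string($xml);

# Append a new <site> with one child element.
my ($sites) = $doc->findnodes('/config/sites');
my $site = $doc->createElement('site');
$site->appendTextChild('sitename', 'Test entry');
$sites->appendChild($site);

# Format level 1 indents every element, old and new alike.
my $out = $doc->toString(1);
print $out;
```

The new `<site>` should come out on its own lines, indented like the existing one, because no leftover whitespace nodes remain in the tree to block re-indentation.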
Re: xml::libxml open, add and save not formatting properly
by dHarry (Abbot) on Mar 24, 2010 at 11:01 UTC

    As a side note, not related to your question.

    I have a config xml file that I actually created with the xml lib so I know it's valid ;-)

    You probably mean "well-formed"; "valid" means something else in an XML context :P

    Out of curiosity, why do you use CDATA? If I go through your example data I see no reason to use it. The CDATA mechanism was thought up to let you quote fragments of text containing markup characters. But it doesn't really work that well. One of the biggest strengths of XML is the data validation capability (I'm thinking XMLSchema here). Putting stuff in like CDATA, by definition ignored by the parser, doesn't help in that respect.

    Cheers

    Harry

      Ok, well-formed it is! I'm entirely self-taught in all of this, so thanks for the correct term to use in this situation.

      As for the CDATA, this tool is for crawling and analysis of our clients' web sites. I'm sure we have all experienced that a large portion of sites are, at best, poorly built on the technical side, and ampersands and other markup characters often appear in links, titles and other elements that we collect. I suppose I could check each entry to see if it contains a markup character and use CDATA only for the ones that need it.

      Thanks for the input. I find that I often get confused by documentation; I was never one for being able to understand it. I am a much more hands-on kind of learner. That costs its own time and frustration, but if my mind doesn't grok it, I have to code and try it until I eventually do get it!
        Instead of
            sub text_to_xml {
                my $s = shift;
                # Split any "]]>" in the text so it cannot end the CDATA section early.
                $s =~ s/]]>/]]>]]&gt;<![CDATA[/g;
                return "<![CDATA[$s]]>";
            }
        you could use
            use HTML::Entities qw( encode_entities );

            sub text_to_xml {
                my ($text) = @_;
                # Escape only the characters that are special in XML text content.
                return encode_entities($text, '<&');
            }
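When the document is built through the DOM, as in the original question, it may not be necessary to escape by hand at all: text added with appendText is escaped by the serializer. The sketch below assumes a hypothetical `<title>` element and a made-up string with markup characters in it, and it checks that the text survives a round trip.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical scraped value containing markup characters.
my $text = 'Tom & Jerry <live>';

my $doc   = XML::LibXML::Document->new('1.0', 'UTF-8');
my $title = $doc->createElement('title');
$title->appendText($text);           # plain text node, no CDATA section
$doc->setDocumentElement($title);

# The serializer escapes "&" and "<" on output.
my $serialized = $title->toString;
print "$serialized\n";

# Parsing the serialized form gives back the original text.
my $round_trip = XML::LibXML->new->parse_string($serialized)
                             ->documentElement->textContent;
print "round trip ok\n" if $round_trip eq $text;
```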

      Putting stuff in like CDATA, by definition ignored by the parser, doesn't help in that respect.

      Doesn't hurt either. The following three lines are identical from the point of view of an XML parser:

          <![CDATA[http://www.example.com.com/]]>
          &#104;ttp://www.example.com.com/
          http://www.example.com.com/
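The equivalence claimed above is easy to check. The sketch below wraps the three spellings in a hypothetical `<urls>` document (the element names are made up for the illustration) and compares the text content that the parser reports for each.

```perl
use strict;
use warnings;
use XML::LibXML;

# The three spellings from the post, each wrapped in a hypothetical <u> element.
my $doc = XML::LibXML->new->parse_string(<<'XML');
<urls>
  <u><![CDATA[http://www.example.com.com/]]></u>
  <u>&#104;ttp://www.example.com.com/</u>
  <u>http://www.example.com.com/</u>
</urls>
XML

# textContent returns the character data regardless of how it was written.
my @values   = map { $_->textContent } $doc->findnodes('/urls/u');
my %distinct = map { $_ => 1 } @values;

print scalar(@values), " elements, ", scalar(keys %distinct), " distinct value(s)\n";
```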
