PerlMonks

XML::Twig problem

by BenHopkins (Acolyte)
on May 31, 2007 at 00:11 UTC (#618360=perlquestion)
BenHopkins has asked for the wisdom of the Perl Monks concerning the following question:

I have a little program based on the "Building an XML Filter" section of the XML::Twig docs. It sets up twig roots for the elements I need to process, and uses twig_print_outside_roots. However, the output is NOT valid XML (the input is). Here's how the input starts:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE nitf SYSTEM "../CCI-DTD/nitf-3-1.dtd" [
<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "../CCI-DTD/xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "../CCI-DTD/xhtml-symbol.ent">
%HTMLsymbol;
<!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "../CCI-DTD/xhtml-special.ent">
%HTMLspecial;
]>

Here's the output:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE nitf SYSTEM "../CCI-DTD/nitf-3-1.dtd">
<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "../CCI-DTD/xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLsymbol PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "../CCI-DTD/xhtml-symbol.ent">
%HTMLsymbol;
<!ENTITY % HTMLspecial PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "../CCI-DTD/xhtml-special.ent">
%HTMLspecial;
>

The square brackets surrounding the three !ENTITY declarations are missing. Here's the program's XML::Twig->new call:

my $t = XML::Twig->new(
    twig_roots => {
        "$nitf_root/body/body.head/hedline/hl1" => \&fix_hl1,
        "$nitf_root/body/body.head/hedline/hl2" => \&fix_hl2,
    },
    twig_print_outside_roots => 1,
    keep_encoding            => 1,
);

At first I didn't have keep_encoding, and then besides the missing square brackets, the first !ENTITY was also missing. keep_encoding restored the first !ENTITY, but not the brackets.

Any ideas?

(I do a flush after the parse, so it's not that. I don't see how it could affect anything, but I saw something about it.)
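
For readers following along, the filter setup described in the question can be sketched as below. This is a minimal sketch, not the poster's actual program: the sample document, the element names, and the uppercasing done in the handler are all invented for illustration, and the output is collected through an in-memory filehandle so it can be inspected. It follows the pattern from the XML::Twig docs, where twig_print_outside_roots prints everything outside the roots and each root handler prints its own element.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input; element names are invented for illustration.
my $xml = <<'XML';
<?xml version="1.0"?>
<doc><head><hl1>first headline</hl1></head><body>unchanged text</body></doc>
XML

# Collect the filtered document in memory instead of printing to STDOUT.
my $out = '';
open my $fh, '>', \$out or die "cannot open in-memory handle: $!";

my $t = XML::Twig->new(
    # Only hl1 elements are built as twigs and passed to the handler.
    twig_roots => {
        'hl1' => sub {
            my ( $twig, $elt ) = @_;
            $elt->set_text( uc $elt->text );    # stand-in for the real fix-up
            $elt->print($fh);                   # the handler prints its own element
        },
    },
    # Everything outside the roots is copied through to the same handle.
    twig_print_outside_roots => $fh,
);

$t->parse($xml);    # note: no flush() call after the parse
close $fh;

print $out;
```

Passing 1 instead of a filehandle to twig_print_outside_roots, as in the question, sends the pass-through text to the currently selected filehandle; the handlers would then call print without an argument.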

Re: XML::Twig problem
by mirod (Canon) on May 31, 2007 at 06:34 UTC

    The development version of XML::Twig (at http://xmltwig.com/xmltwig) nearly fixes this: the brackets are there, but the first entity declaration is not output properly (it comes out as <!ENTITY HTMLlat1 SYSTEM "../CCI-DTD/xhtml-lat1.ent" PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" >). The other 2 are output properly, which is quite puzzling. This occurs whether keep_encoding is used or not. I'll fix it and report back.

    I have refactored the code that outputs the internal DTD in the new version, so tests are welcome. Apparently my test suite did not cover this case, so I will add a test too.

    Thanks for the info.

Re: XML::Twig problem
by mirod (Canon) on May 31, 2007 at 11:23 UTC

    OK, it looks like I did not process parameter entities properly.

    It's fixed in the development version, let me know if it works for you.

      It works. Thanks. But (you knew there would be a but, didn't you?), in the output, there was some trailing text after the final tag, which went away when I took out the flush() call.

      Also, when I tried to verify the soundness of the output XML with xml_pp (mine, not yours), I got this error: Undefined subroutine &Text::Wrap::wrap called at /usr/local/perl/5.8.2/lib/site_perl/5.8.2/XML/Twig.pm line 7476. When I replaced indented_c with nice, it worked.


        Indeed, the flush messes things up when you're using twig_print_outside_roots. I should check for that; I'll see what I can do. In fact, with recent versions of the module, the flush after the end of the parsing is no longer needed. The module assumes that if you started flushing, then you want to keep on doing it (or you would most likely get XML that is not well-formed), so at the end of the parse, if flush has been used on the twig, it performs a last flush, using the filehandle that was used for the first flush. It DWIMs better than it reads ;--(.

        As for the xml_pp problem, I don't know. Maybe you redefined the constants and ended up using the one for 'wrapped' or 'cvs' instead of the one for 'indented_c'? What value are you using for the style? BTW, I usually use xmlwf, xmllint or perl -MXML::Parser -e'XML::Parser->new( ErrorContext => 1)->parsefile( shift())' file.xml to check the well-formedness of the XML.
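
The XML::Parser one-liner above can also be wrapped in a small script that reports the error instead of dying. This is only a sketch: the helper name wellformedness_error and both sample strings are made up for illustration, and the second sample merely imitates the kind of damage seen in this thread (an entity declaration stranded outside the DOCTYPE).

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns '' if $xml is well-formed,
# otherwise the error message XML::Parser died with.
sub wellformedness_error {
    my ($xml) = @_;
    my $parser = XML::Parser->new( ErrorContext => 1 );
    my $ok = eval { $parser->parse($xml); 1 };
    return $ok ? '' : ( $@ || 'unknown parse error' );
}

# Invented samples: the second has an entity declaration after the
# DOCTYPE has already been closed, which is not well-formed.
my $good = '<nitf><body/></nitf>';
my $bad  = qq{<!DOCTYPE nitf>\n<!ENTITY % e SYSTEM "e.ent">\n<nitf/>};

for my $sample ( $good, $bad ) {
    my $err = wellformedness_error($sample);
    print $err eq '' ? "well-formed\n" : "NOT well-formed: $err";
}
```

Like the one-liner, this checks well-formedness only; it does not validate the document against its DTD.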

